
Re: Profiling, diff and change tracking best practices?

  • From: Lech Rzedzicki <xchaotic@gmail.com>
  • To: xml-dev@lists.xml.org
  • Date: Thu, 1 Oct 2009 17:09:35 +0100

Re:  Profiling
On Thu, Oct 1, 2009 at 4:40 PM, michael odling-smee
<mike.odlingsmee@gmail.com> wrote:
>
> Funnily enough I have just started thinking about this for my own project
> with a similar use-case - i.e. understanding the changes between two
> different baselines of an XML document or XML document set.

Great to hear that; I was expecting just that. It is a common failing
in the computer world that developers reinvent the wheel, when all it
takes is a bit of google-fu and some creative discussion.
>
> My high-level thoughts so far are:
>
> 1.] Add suitable meta-data attributes (e.g. version/create and modify
> date/author) to fairly coarse grained components within the XML data model.

On a slightly lower level, have you already thought about what would
be a complete-enough set of metadata for your requirements? I have
tried to follow the Dublin Core model, but it might be overly complex
for your purposes...
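To make the question concrete, here is a minimal sketch of step 1 using Python's standard xml.etree.ElementTree. The attribute names (version, created, modified, author) and the stamp() helper are hypothetical, taken loosely from the version/date/author list in your message rather than from Dublin Core, so treat it as an illustration, not a recommended vocabulary.

```python
import xml.etree.ElementTree as ET
from datetime import date


# Hypothetical attribute names based on "version/create and modify date/author";
# a Dublin Core mapping would use different names (e.g. creator, modified).
def stamp(element, author, today=None):
    """Create or update change-tracking metadata on one coarse-grained component."""
    day = (today or date.today()).isoformat()
    if "version" not in element.attrib:
        # First time this component is tracked: start at version 1.
        element.set("version", "1")
        element.set("created", day)
    else:
        # Existing component: bump the version, keep the creation date.
        element.set("version", str(int(element.get("version")) + 1))
    element.set("modified", day)
    element.set("author", author)


doc = ET.fromstring('<manual><section id="intro"><p>Hello</p></section></manual>')
section = doc.find("section")
stamp(section, "lech", date(2009, 10, 1))  # initial edit
stamp(section, "mike", date(2009, 10, 2))  # later edit bumps the version
print(ET.tostring(section, encoding="unicode"))
```

Stamping only coarse components (sections rather than paragraphs) keeps the markup that people edit by hand relatively uncluttered.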
> 2.] Create a baseline of the document or set of XML documents by:
> 2.1] Creating a fairly lightweight XML file (perhaps using XSLT) that only
> contains this meta-data. Save this to disk (i.e. create a memento of the
> meta-data)
> 2.2] Saving a copy of the original XML in a version control system/file
> system where it will not be edited further.
> 3.] Later on when trying to do a diff. between the original baseline and
> current:
> 3.1] Using the same mechanism as in step 2.1 create a new memento of the
> current XML document or set of XML documents
> 4.] Compare the two mementos reporting on changes - if required the baseline
> copy of the XML can be used to compute exactly what content has changed (I
> think you need add/delete and update) between the two versions.
>
> I am still undecided whether both the memento and document copy are required
> - logically the memento is not actually required. However the lightweight
> memento may prove useful if:
>
> - The XML document or set of documents is very large, such that it would
>   not be desirable to store a complete copy of the document(s).
> - To aid with deep differencing optimisation (especially relevant where
>   there is a set of XML documents that you are comparing, so you only have
>   to parse files where differences occur).
> - The diff. report is only meant to identify where differences are, not
>   what they are.
>
> Anyway I have only had early thoughts on the subject so would gladly listen
> to any other suggestions that the community has to offer.

Sounds like a neat approach, but just like you, my initial feeling is
that separating the metadata is awkward and might make processing too
complex: to create a simple delta document, you would need to compare
the two mementos and then go back to the original files to locate the
changes. I agree that it might be necessary when dealing with large
documents, but in such cases I suppose you could apply stream
processing such as SAX instead, especially for the comparison...
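For what it's worth, the memento idea combined with streaming could look roughly like this sketch (Python, standard library xml.sax). It assumes every coarse-grained component carries an id attribute plus the hypothetical version/modified/author attributes; MementoHandler, build_memento() and diff_mementos() are illustrative names, not an existing tool. It only reports where components were added, deleted or updated; working out what changed inside them would still mean going back to the baseline copy.

```python
import xml.sax

# Hypothetical metadata attributes; adjust to whatever vocabulary you settle on.
TRACKED = ("version", "modified", "author")


class MementoHandler(xml.sax.ContentHandler):
    """Stream a document with SAX and keep only id -> metadata, never the content."""

    def __init__(self):
        super().__init__()
        self.memento = {}

    def startElement(self, name, attrs):
        # Assumption: tracked components are elements carrying both id and version.
        if "id" in attrs and "version" in attrs:
            self.memento[attrs["id"]] = {a: attrs.get(a) for a in TRACKED}


def build_memento(xml_bytes):
    """Build the lightweight memento (steps 2.1 and 3.1) in one streaming pass."""
    handler = MementoHandler()
    xml.sax.parseString(xml_bytes, handler)
    return handler.memento


def diff_mementos(old, new):
    """Step 4: report where components changed, not what changed inside them."""
    return {
        "added": sorted(new.keys() - old.keys()),
        "deleted": sorted(old.keys() - new.keys()),
        "updated": sorted(k for k in old.keys() & new.keys() if old[k] != new[k]),
    }


baseline = build_memento(b"""<manual>
  <section id="intro" version="1" modified="2009-09-01" author="mike"/>
  <section id="setup" version="3" modified="2009-09-15" author="lech"/>
</manual>""")
current = build_memento(b"""<manual>
  <section id="intro" version="2" modified="2009-10-01" author="lech"/>
  <section id="faq" version="1" modified="2009-10-01" author="lech"/>
</manual>""")
print(diff_mementos(baseline, current))
```

For documents on disk, xml.sax.parse(filename, handler) streams the file the same way, so neither the document nor its memento has to be held as a full tree in memory.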

I don't know if that's the case in your environment, but in my
scenario the raw XML is going to be maintained by people, so I am
striving for simplicity. Separating the metadata as you propose might
mean somewhat more complex processing, but the XML that people see
could in effect be more manageable, so I'll certainly have a think
about it...

Lech

