Re: Profiling, diff and change tracking best practices?
On Thu, Oct 1, 2009 at 4:40 PM, michael odling-smee <mike.odlingsmee@gmail.com> wrote:
>
> Funnily enough I have just started thinking about this for my own project
> with a similar use-case - i.e. understanding the changes between two
> different baselines of an XML document or XML document set.

Great to hear that - I was expecting just that. It is a common failing in the computer world that developers reinvent the wheel, when all you need is a bit of google-fu and some creative discussion.

>
> My high-level thoughts so far are:
>
> 1.] Add suitable meta-data attributes (e.g. version/create and modify
> date/author) to fairly coarse grained components within the XML data model.

On a slightly lower level, have you already thought about what would be a complete-enough set of metadata for your requirements? I have tried to follow the Dublin Core model, but it might be overly complex for your purposes...

> 2.] Create a baseline of the document or set of XML documents by:
> 2.1] Creating a fairly light weight XML file (perhaps using XSLT) that only
> contains this meta-data. Save this to disk (i.e. create a memento of the
> meta-data)
> 2.2] Saving a copy of the original XML in a version control system/file
> system where it will not be edited further.
> 3.] Later on when trying to do a diff. between the original baseline and
> current:
> 3.1] Using the same mechanism as in step 2.1 create a new memento of the
> current XML document or set of XML documents
> 4.] Compare the two mementos reporting on changes - if required the baseline
> copy of the XML can be used to compute exactly what content has changed (I
> think you need add/delete and update) between the two versions.
>
> I am still undecided whether both the memento and document copy are required
> - logically the memento is not actually required.
> However the lightweight memento may prove useful if:
>
> - The XML document or set of documents is very large such that it would not be
> desirable to store a complete copy of the document(s).
> - To aid with deep differencing optimisation (especially relevant where there
> is a set of XML documents that you are comparing so you only have to parse
> files where differences occur).
> - The diff. report is only meant to identify where differences are not what
> they are.
>
> Anyway I have only had early thoughts on the subject so would gladly listen
> to any other suggestions that the community has to offer.

Sounds like a neat approach, but like you, my initial feeling is that separating the metadata is awkward and might make processing too complex: to create a simple delta document, you would need to compare the two mementos, then go back to the original files and locate the actual changes. I agree that it might be necessary when dealing with large documents, but in such cases I suppose you could apply stream processing such as SAX instead, especially for the comparison itself...

I don't know if that's the case in your environment, but in my scenario the raw XML is going to be maintained by people, so I am striving for simplicity. Separating the metadata as you propose might mean somewhat more complex processing, but the XML that people see could in effect be more manageable, so I'll certainly have a think about it...

Lech
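P.S. To make steps 3.1 and 4 a bit more concrete, here is a minimal sketch that builds the memento with Python's built-in SAX parser (instead of XSLT), so the document is streamed rather than loaded as a tree. It assumes, purely for illustration, that each coarse-grained component carries an `id` attribute plus hypothetical `version`, `modified` and `author` attributes; a real metadata set (Dublin Core or otherwise) would differ. Like the memento in your proposal, it only reports where things changed (added, deleted or updated components), not what changed, so the baseline copy would still be needed for the content-level diff.

```python
import xml.sax

# Hypothetical metadata attribute names (step 1); adjust to the real model.
META_ATTRS = ("version", "modified", "author")


class MementoHandler(xml.sax.ContentHandler):
    """Streams a document and collects {component id: metadata tuple}."""

    def __init__(self):
        super().__init__()
        self.memento = {}

    def startElement(self, name, attrs):
        # Only elements with an id and at least one metadata attribute
        # are treated as tracked components.
        comp_id = attrs.get("id")
        if comp_id is None:
            return
        meta = tuple(attrs.get(a) for a in META_ATTRS)
        if any(value is not None for value in meta):
            self.memento[comp_id] = meta


def build_memento(xml_bytes):
    """Create a memento (steps 2.1 / 3.1) without building a DOM."""
    handler = MementoHandler()
    xml.sax.parseString(xml_bytes, handler)
    return handler.memento


def diff_mementos(baseline, current):
    """Compare two mementos (step 4): report where changes occurred."""
    added = sorted(current.keys() - baseline.keys())
    deleted = sorted(baseline.keys() - current.keys())
    updated = sorted(
        k for k in baseline.keys() & current.keys() if baseline[k] != current[k]
    )
    return {"added": added, "deleted": deleted, "updated": updated}


baseline = build_memento(b"""<doc>
  <section id="s1" version="1" author="mike"/>
  <section id="s2" version="1" author="lech"/>
</doc>""")
current = build_memento(b"""<doc>
  <section id="s1" version="2" author="mike"/>
  <section id="s3" version="1" author="lech"/>
</doc>""")
print(diff_mementos(baseline, current))
# {'added': ['s3'], 'deleted': ['s2'], 'updated': ['s1']}
```

Keying the memento on a stable id is what makes the comparison cheap; without such ids you would be back to positional or structural matching of the full documents.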