XML-appropriate editing data structures
Recent criticisms of some Eclipse-based XML editors (including mine), made in part because they use a lot of memory relative to file size, underline the fairly obvious fact that XML files are often much larger than programming language files. When the techniques used successfully for programming languages are applied to XML, they can break down.

The first person I ever saw address this issue directly was Bryan Ford, in his packrat parsing paper (http://www.brynosaurus.com/pub/lang/packrat-icfp02.pdf). Packrat parsing requires an O(n) data structure, where n is the document size, with a rather large constant factor. Ford observes "For example, for parsing XML streams, which have a fairly simple structure but often encode large amounts of relatively flat, machine-generated data, the power and flexibility of packrat parsing is not needed and its storage cost would not be justified."

However, the expectations of a modern XML editor are set by the features of modern programming language editors:

1) Syntax coloring. Coloring implies context (the string 'abc' is colored differently if it is an attribute name vs. attribute value vs. element name vs. PI name, etc.); context implies parsing. Coloring is particularly demanding in that it must be done in real time, in the foreground, while the user is editing: after each user action and before characters are echoed to the display.

2) Outline view. Every practical XML editor offers both a text and an outline view; some allow editing of both views and most allow the views to be seen simultaneously, which in practice means one view must catch up to the other after a relatively brief delay. For XML, the outline view is essentially a DOM view with some node types possibly elided.

3) Content assist. Most commercial-quality XML editors derive content assist for element names, attribute names, element and attribute contents, entities, etc. from DTDs and/or schemas. This means that a) the DTD or schema must be parsed before any assistance is available, and b) the DTD or schema must be resolved to an in-memory data structure that drives assistance. This data structure is inherently O(g), where g is the grammar size; I have seen a number of them and I have yet to see one designed to be compact.

4) Validation. Much the same considerations apply as for content assist, with the additional constraint that validation is expected to be of very high quality. It is easy to come up with a data structure that could drive both validation and content assist, but it is very hard to write a decent validator (esp. for XML Schema), and it is a different kind of problem to reuse, for code assist, the data structures of existing decent validators, most of which were not designed for external use.

5) Graphical view. If the document under edit is a DTD or schema, a graphical view is often provided that shows the logical structure of the grammar (as opposed to that of the document). Editing the graphical view is often allowed, resulting in the need to update other open views (text or outline) of the same document. (Though, in fact, the graphical view is inherently a multi-document editor.)

6) Open definition, show references, refactor/rename. These are actions applied to a document, e.g., to an element name or definition, that suggest the need for a multi-document data structure that, at a minimum, exposes the knowable dependency relationships between documents. (One could brute-force search all known documents on demand, but performance is likely to suffer.) These relationships are often not manifest in a document under edit.

Each of these requirements can be addressed by a data structure, and each of the data structures has an analog used by programming language editors. But if you poke under the covers of programming language editors, you often find that memory overhead was not a major design factor, because most programming language files are fairly small.
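As an aside, the multi-document dependency structure mentioned in item 6 might be sketched roughly as follows. This is only an illustration under stated assumptions, not how any particular editor does it: the class name `DependencyIndex`, its methods, and the file names are all hypothetical. It keeps forward and reverse edges so that "show references" does not require a brute-force search of every known document, at the cost of memory proportional to the number of references.

```python
from collections import defaultdict


class DependencyIndex:
    """Hypothetical index of 'document A references document B' edges."""

    def __init__(self):
        self._uses = defaultdict(set)     # document -> documents it references
        self._used_by = defaultdict(set)  # document -> documents referencing it

    def set_references(self, doc, targets):
        """Replace the references recorded for doc, e.g. after a re-parse."""
        # Drop the stale reverse edges left over from the previous parse.
        for old in self._uses.pop(doc, set()):
            self._used_by[old].discard(doc)
        for target in targets:
            self._uses[doc].add(target)
            self._used_by[target].add(doc)

    def dependents(self, doc):
        """Return every document that directly or transitively references doc."""
        seen, stack = set(), [doc]
        while stack:
            for user in self._used_by.get(stack.pop(), ()):
                if user not in seen:
                    seen.add(user)
                    stack.append(user)
        return seen


# Hypothetical workspace: an instance document, its schema, and a shared schema.
index = DependencyIndex()
index.set_references("order.xml", ["order.xsd"])
index.set_references("order.xsd", ["common.xsd"])
index.set_references("invoice.xml", ["invoice.dtd"])
affected = index.dependents("common.xsd")  # documents to revisit if common.xsd changes
```

A rename in the hypothetical `common.xsd` would then only need to visit the documents in `affected`, rather than every document the editor knows about.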
Consequently, an XML editor that uses the same techniques to address the requirements above will be judged 'not ready for prime time' when it is applied to extra-large (or exceptionally squirrely) documents, DTDs or schemas.

If you think addressing these needs with no memory overhead is a trivial weekend project, feel free to show us your editor. In the meantime, I'd be happy to discuss implementation techniques that might make some or all of this faster/smaller all day long, on or off the list.

Bob Foster
http://xmlbuddy.com/