XML-DEV Mailing List Archive
Re: XML-appropriate editing data structures
I think there are some easy wins to be had. As others have said, the nature of the document will drive a lot of the approaches, but more importantly the nature of the grammar will. For instance, working with DTDs in an editor makes offering up the allowed items at a given position child's play. Because DTDs have a limited notion of context (i.e., one global context), determining allowed entities, elements, attributes, and attribute values is a snap.

(1) Parse the grammar. In my editor I cache a variety of (configurable) common DTDs and store the results in arrays/lists for quick access. This parsing is done with a modified SAX parser and validator, but only barely modified.

(2) On Invoke of the completion tool I reverse parse to get:
    a) the current desired context (e.g., attribute, attribute value, start element, end element)
    b) the current document context (e.g., previous siblings, previous attributes in the element, parent element)

(3) Because DTDs offer a single global context for element declarations, knowing the parent element solves the problem: determining the next allowable element or elements is fairly simple and very localized.

Now, with that said, I am not selling my editor, so there were some shortcuts I took. For instance, the backwards parsing algorithm isn't perfect by any stretch of the imagination. Also, I only parse backward on Invoke, so attributes used after the current cursor position still appear in the list. Additionally, there are pathological cases, e.g., an XML document with one root node and thousands upon thousands of child nodes that would be treated as siblings. If I were to handle this, I would probably do a better job of predetermining the structural possibilities in the grammar and create an internal list of "break" cases. An example of this would be an element foo with the content model (BarA, BarB*, BarC).
If it is possible to determine that BarB is only referenced in this declaration, then the parent context is not needed: any time a preceding sibling is BarB, it can be inferred immediately that BarB | BarC is allowed.

Unfortunately, with modern schema systems there are a number of new problems, notably namespaces. Additional problems are created by the introduction of multiple contexts for element names, in which case a single parent context is not sufficient. For XML Schema, a list of predefined "break" cases is imperative, but the range of pathological documents increases.

In terms of trees, I have one that can be turned off. It is always off for me. It is not that good : )

In terms of parsing and validation, I haven't spent much time here. I simply have a delay built in: if the user doesn't move the cursor for X seconds, it does a SAX-based parse and validation if the features are turned on. All of this runs in a separate thread so that it can be interrupted immediately. A far better solution would be to combine this with detection of changes, e.g., if they typed comment data that didn't contain "--", there is no reason to re-validate. I imagine, maybe wrongly, that there is some elegant solution of a merged validation based on diffs. But I suspect most editors simplify this a little and validate a local context, determining that context in a way similar to the completion proposal strategy above.

Of course, like everyone else, I am eager to hear new solutions or critiques. Also, I am willing to share my reverse parsing algorithm (in Pascal, no less), with the caveat that it would probably need work before it was ready for prime time.

Cheers,
Jeff Rafter
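
For illustration only, here is a rough Python sketch of the DTD completion lookup and the "break" case idea described in the message. It is not the Pascal code mentioned above. The grammar table, the function names (allowed_next, unique_parents), and the restriction to flat sequence content models (no choice groups or nesting) are all assumptions made to keep the example short.

```python
from typing import Dict, List, Set, Tuple

Particle = Tuple[str, str]  # (element name, occurrence: "", "?", "*" or "+")
State = Tuple[int, bool]    # (particle index, has it matched at least once?)

# Hypothetical grammar standing in for a parsed and cached DTD.
# Assumption: every content model is a flat sequence of particles.
GRAMMAR: Dict[str, List[Particle]] = {
    "doc": [("foo", "+")],
    "foo": [("BarA", ""), ("BarB", "*"), ("BarC", "")],
}


def _closure(model: List[Particle], states: Set[State]) -> Set[State]:
    """Add states reached by moving past optional or already-matched particles."""
    todo, seen = list(states), set(states)
    while todo:
        i, matched = todo.pop()
        if i < len(model) and (matched or model[i][1] in ("?", "*")):
            nxt = (i + 1, False)
            if nxt not in seen:
                seen.add(nxt)
                todo.append(nxt)
    return seen


def _can_match(model: List[Particle], state: State, name: str) -> bool:
    """True if the particle at this state may consume an element called name."""
    i, matched = state
    if i >= len(model):
        return False
    pname, occ = model[i]
    return pname == name and (not matched or occ in ("*", "+"))


def allowed_next(parent: str, previous_siblings: List[str],
                 grammar: Dict[str, List[Particle]] = GRAMMAR) -> List[str]:
    """Elements allowed after previous_siblings inside parent."""
    model = grammar[parent]
    states: Set[State] = {(0, False)}
    for name in previous_siblings:
        states = {(i, True) for (i, m) in _closure(model, states)
                  if _can_match(model, (i, m), name)}
        if not states:  # the existing siblings are already invalid
            return []
    names: List[str] = []
    for i, m in sorted(_closure(model, states)):
        if i < len(model) and _can_match(model, (i, m), model[i][0]):
            if model[i][0] not in names:
                names.append(model[i][0])
    return names


def unique_parents(grammar: Dict[str, List[Particle]] = GRAMMAR) -> Dict[str, str]:
    """'Break' cases: children referenced in exactly one declaration."""
    parents: Dict[str, Set[str]] = {}
    for parent, model in grammar.items():
        for name, _ in model:
            parents.setdefault(name, set()).add(parent)
    return {child: next(iter(ps)) for child, ps in parents.items() if len(ps) == 1}


if __name__ == "__main__":
    print(allowed_next("foo", ["BarA", "BarB"]))
    print(unique_parents().get("BarB"))
```

With a table like unique_parents() built once at grammar-parse time, a reverse parse that hits a BarB sibling could stop there and call allowed_next("foo", ...) without walking further back to find the parent. A real DTD would also need choice groups, nested groups, and mixed content, which this sketch leaves out.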