Re: XML processing experiments
James Clark wrote:
>> Given XML's requirements that entity references in the instance are
>> synchronous, I would have thought that the overhead of an entity stack
>> could be avoided for parsing the instance. The parser passes the
>> application an entity reference event, and the application can then, if
>> it chooses, recursively invoke the parser to parse the referenced
>> entity.

Richard Tobin wrote:
>Entity references are expanded, and a bit may end in a different
>entity from the one it started in (suppose foo is defined as "a<b/>c";
>then the first bit returned from "x&foo;y" is "xa" - as far as I can
>tell this is quite legal XML).

I don't think this is legal. The working draft (sec. 4.1) says:

"The logical and physical structures (elements and entities) in an XML document must be synchronous. Tags and elements must each begin and end in the same entity, but may refer to other entities internally; comments, processing instructions, character references, and entity references must each be contained entirely within a single entity"

It seems to me that with the current whitespace handling, one could nearly (?) parse each entity locally, and build a subtree of it if the tree is wanted. (This could maybe result in easier error reporting, and would probably have a positive impact on parsing speed, but could mean a bit more complexity in the implementation?)

As Mr. Clark indicates, a parser doesn't need to take much of a performance hit when entities are not present: the entity stack has no influence (is kept constant) when parsing, for instance, a start-tag. (If entity references are present in the attribute values, these can be expanded afterwards if wanted. Authoring tools etc. often don't want this expansion to happen.)

I (currently!) think it is possible to design a 'real' parser that locally looks much the same as Mr. Clark's "quick and dirty" parser.
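The idea in the quoted paragraph (the parser reports an entity reference event, and the application decides whether to invoke the parser again on the referenced entity) can be sketched roughly as follows. This is only an illustration under strong assumptions: it handles a toy subset (tags without attributes, character data, general entity references), and the names `TOKEN`, `parse_entity` and `expand` are hypothetical, not taken from any parser discussed in this thread. It does not guard against recursive entity definitions.

```python
import re

# Toy subset only: tags without attributes, character data, and general
# entity references. All names here are hypothetical, for illustration.
TOKEN = re.compile(r"<(/?)(\w+)\s*(/?)>|&(\w+);|([^<&]+)")


def parse_entity(text):
    """Yield (kind, value) events for one entity, using only a local element stack."""
    open_elements = []
    pos = 0
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise ValueError(f"cannot parse at offset {pos}")
        pos = m.end()
        closing, name, empty, ref, data = m.groups()
        if data is not None:
            yield ("text", data)
        elif ref is not None:
            # Report the reference and move on; the parser does not expand it.
            yield ("entity-ref", ref)
        elif closing:
            if not open_elements or open_elements.pop() != name:
                raise ValueError(f"unexpected end-tag </{name}>")
            yield ("end", name)
        else:
            yield ("start", name)
            if empty:
                yield ("end", name)
            else:
                open_elements.append(name)
    if open_elements:
        # Synchronicity: an element must end in the entity it began in.
        raise ValueError(f"<{open_elements[-1]}> is not closed in this entity")


def expand(text, entities):
    """Application side: recursively parse each referenced entity."""
    for kind, value in parse_entity(text):
        if kind == "entity-ref":
            yield from expand(entities[value], entities)
        else:
            yield (kind, value)


if __name__ == "__main__":
    # The example from the quoted message: foo is defined as "a<b/>c".
    for event in expand("x&foo;y", {"foo": "a<b/>c"}):
        print(event)
```

In this sketch each entity is checked on its own, so an entity whose text is just "<b>" is rejected when it is parsed. Note also that the character data around a reference arrives as separate events ("x", then "a") rather than as a single merged "xa".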
(I'm in the start-up phase of implementing one.)

BTW: Does anyone have an example where the immediate expansion of character references within internal entities actually comes in handy? To me this seems to make the parser use more memory and perhaps run slower, but more importantly, it ruins the copy-paste semantics of entity expansion.

What will "normal" people think about such things as the example from the draft:

<!ENTITY example "<p>An ampersand (&#38;) may be escaped numerically (&#38;#38;) or with a general entity (&amp;).</p>" >

I think most people will regard this as a bug/design flaw. I would feel better if I knew an example where this behaviour actually comes in handy... :-)

Cheers,
Jarle Stabell

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@i...
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To (un)subscribe, mailto:majordomo@i... the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo@i... the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa@i...)
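The copy-paste concern raised in the message above can be observed with a small experiment. The following is a sketch using Python's standard `xml.etree.ElementTree` (which is built on expat); the entity names `direct` and `escaped` and the element names are made up for the illustration. It assumes the parser follows the rule the message describes, namely that character references in an entity's literal value are expanded when the declaration is read.

```python
import xml.etree.ElementTree as ET

# Hypothetical entity names. Character references in an entity's literal
# value are expanded when the declaration is read, so:
#   direct  -> replacement text "<b/>"      (parsed as markup when referenced)
#   escaped -> replacement text "&#60;b/>"  (parsed as the characters "<b/>")
DOC = """<!DOCTYPE doc [
<!ENTITY direct "&#60;b/>">
<!ENTITY escaped "&#38;#60;b/>">
]>
<doc><one>&direct;</one><two>&escaped;</two><three>&#60;b/></three></doc>"""

root = ET.fromstring(DOC)
one, two, three = root.find("one"), root.find("two"), root.find("three")

# Child element tags and text content of each container element.
for elem in (one, two, three):
    print(elem.tag, [child.tag for child in elem], repr(elem.text))
```

The same characters, `&#60;b/>`, yield character data when typed directly in content (element `three`) but yield a `b` element when they pass through the entity `direct`. Getting the literal text back through an entity takes the doubly escaped form used in `escaped`, which is the kind of surprise the message is asking about.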








