Re: SAX needs from our point of view
Michael Amster wrote:
> Quoting Ray Cromwell:
>
> >Ok, now that I've started a flame war and gotten that off my chest :),
> >I'd like to nominate the three biggest features I'd like in SAX Level 2
> >(or SAX2.0), in order of importance.
> >1) access to DTD information
> >2) comments, CDATA, and location information for Attributes
> >3) sax.util classes that take an ElementFactory (which return DOM
> >interfaces), and build a tree. (maybe Don Park would like to contribute
> >this). IBM's XML for Java is a starting point, but it has the fatal flaw
> >that the return values of the ElementFactory are not the DOM interfaces
> >(such as Element or PI) but IBM base classes, like TXElement or PI,
> >which means you are forced to inherit from TXElement instead of just
> >implementing Element.
>
> In our case, having embedded XML languages with our own language
> controlling flow of execution, we have a real need for an accurate
> reproduction of the XML elements parsed so they can be rewritten correctly.
> Specifically, the issue is important in distinguishing between text and
> CDATA. Let me illustrate with a simple example:
>
> <WEIF COND="true">
>   <WETHEN>
>     <ARBITRARYXML/>
>     <![CDATA[
>       This is data with &references; which should not be parsed!
>     ]]>
>     <MOREXML>
>       This is just text
>     </MOREXML>
>   </WETHEN>
> </WEIF>
>
> When this is reported up from a SAX parser, we do not differentiate between
> text and the CDATA, but let's say that we want to output the subset of
> arbitrary XML back out from our DOM or other object structure:
>
> <ARBITRARYXML/>
> This is data with &references; which should not be parsed!
> <MOREXML>
>   This is just text
> </MOREXML>
>
> Now you see that the CDATA will have all references made when it is
> reparsed. We really do want to preserve CDATA as different from text in
> SAX. I can live without comments and to some degree, I can even reduce the
> amount of DTD info available to me, but I hope that CDATA and text are
> reported differently through the interface. It should not substantially
> complicate things for parser writers or application developers if it is
> just a Document handler event.
>
> -MA

The solution I have found for the XMLReader (formatter) I have been working
on is to scan each string of character content for any characters that need
to be escaped with a CDATA section and embed that content in a CDATA section.
This operation algorithmically is sort of expensive, but for the content I
have had to format, the formatting process is still 5-10 times faster than
the parsing process.

Tyler
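For readers who want to see the scan-and-wrap idea from the reply above in concrete form, here is a minimal Java sketch. It is not Tyler's code: the class name `CdataFormatter` and the methods `needsEscaping` and `format` are hypothetical, and the choice of `<`, `&`, and the sequence `]]>` as the trigger set is an assumption. Splitting an embedded `]]>` across two CDATA sections is also an addition that the post does not mention.

```java
/**
 * Hypothetical helper illustrating the scan-and-wrap approach.
 * Names and the exact trigger characters are assumptions, not from the post.
 */
final class CdataFormatter {
    private CdataFormatter() {}

    /** True if the text holds anything that cannot appear literally in character content. */
    static boolean needsEscaping(String text) {
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '<' || c == '&') {
                return true;
            }
        }
        // "]]>" is also not allowed literally in character content.
        return text.contains("]]>");
    }

    /** Returns the text unchanged if it is safe, otherwise wrapped in CDATA. */
    static String format(String text) {
        if (!needsEscaping(text)) {
            return text;
        }
        // A CDATA section cannot contain "]]>", so close the section after
        // "]]" and reopen a new one before ">".
        String safe = text.replace("]]>", "]]]]><![CDATA[>");
        return "<![CDATA[" + safe + "]]>";
    }
}
```

With this sketch, `format("This is just text")` returns the string unchanged, while `format("data with &references;")` returns the same text wrapped in a CDATA section. Note that this only makes the output safe to reparse; it does not recover which content was a CDATA section in the original document, which is the distinction Michael is asking SAX to report.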