Re: Why the Infoset?
>John Cowan wrote:
>> Character references are lost, it is true.
>> If you want them back, shout now.
>
At 21:56 01/08/00 +0800, Rick JELLIFFE wrote:
>Can I shout the opposite: "the fact that a character was entered
>directly or by reference should not be information available for any
>other specification or general-purpose application: it should not be
>part of the infoset."
>
>This is because the use of character references should be determined by
>its availability in the encoding used (and any user-supplied "kernel"
>encoding within that). XML should be defined using Unicode characters,
>not the markup that achieved the character.
>

Can I shout the opposite to this opposite!

This is a good case in point where the in/not-in dualism of the OTI (One True Infoset) approach falls down.

If character references are not in the infoset then it is impossible to write an XML parser based app that processes them. The only way to process them would be to do so *lexically*. In shifting to a lexically based algorithm you would need to basically *re-write* an XML parser in order to be sure that you were identifying character entity references correctly every time.

Oh, sure, you can write a regexp that will work "most of the time", but try telling that to the client of the m-commerce/healthcare/rocket launching XML application you are building.

regards,
Sean
http://www.pyxie.org - an Open Source XML Processing library for Python
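
To make the "most of the time" point concrete, here is a minimal Python sketch. It is not taken from the original message: the sample document, the `CHAR_REF` pattern and both helper names are hypothetical, chosen only for illustration. A naive regexp reports three character references in the sample, but only the one in `<a>` is a reference as far as a parser is concerned; the other two sit inside a comment and a CDATA section, where the same characters are literal text.

```python
import re
import xml.etree.ElementTree as ET

# Naive lexical pattern for decimal and hexadecimal character references.
# Assumption: this is roughly the regexp one would reach for first.
CHAR_REF = re.compile(r"&#(?:x[0-9A-Fa-f]+|[0-9]+);")

# Hypothetical sample: one real reference, plus two look-alikes
# (inside a comment and inside a CDATA section).
DOC = "<doc><!-- &#66; --><a>&#65;</a><b><![CDATA[&#65;]]></b></doc>"


def naive_char_refs(xml_text):
    """Return every substring that merely looks like a character reference."""
    return CHAR_REF.findall(xml_text)


def parsed_text(xml_text):
    """Return the text of each child element as reported by a real parser."""
    root = ET.fromstring(xml_text)
    return {child.tag: child.text for child in root}


naive_hits = naive_char_refs(DOC)
texts = parsed_text(DOC)

print(naive_hits)  # the regexp also matches inside the comment and the CDATA section
print(texts)       # the parser expands only the reference in <a>
```

Note the other half of the argument as well: the parser output gives no indication that the "A" in `<a>` was entered by reference, so an application that needs that fact has to fall back on the lexical approach, and getting that approach right means tracking comments, CDATA sections and the other contexts a parser already understands.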