RE: Saxon and Sun Serializer problems?
> I would be interested in the "exact reason" why this has
> happened in the specs. As well, I would like to be
> instructed as to why my concern is obscure or misdirected.

You'll find an essay on the subject in the section on data model in my XSLT reference book.

The core reason is that the base XML spec (very deliberately, I believe, but mistakenly in my view) confined itself almost entirely to saying what constituted valid XML syntax, and said almost nothing about the information payload of an XML document. Apart from a few hints (for example that the order of attributes is insignificant, or that a minimised tag <a/> is informationally equivalent to <a></a>) it gives no definitive statement about which distinctions are meaningful and which aren't. You can make guesses, and in some cases everyone would agree with you (for example, that the spaces around the equals sign in an attribute are immaterial), and in other cases not everyone would agree (for example, CDATA sections). So it was left to other people to make their own definitions, and they came out different.

You can then ask questions about why individual decisions were made, for example why the XSLT/XPath data model ignores DOCTYPE. I suspect a good part of the answer to that particular question is that at the time, DTDs were perceived as obsolescent - they were a stop-gap measure provided until XML Schema was available. (That also explains, I think, why DTDs were never made namespace-aware.)

> In the meantime, on a practical level the Trax API in java is
> based on SAX and handles both the LexicalHandler and the
> ContentHandler.

Actually, the identity transformer recognizes three representations: lexical XML, SAX events, and DOM trees, and allows any of these three to be transformed into any other. There's no special recognition of any one of them, and no guidance as to what should be preserved (e.g. entity references) in the conversion.
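As a rough illustration of that point, here is a minimal sketch that uses the JAXP identity transformer to go from lexical XML to a DOM tree and back again. The class name `IdentityRoundTrip` and its two helper methods are hypothetical names chosen for this example; the SAX representation (`SAXSource`/`SAXResult`) is left out for brevity. The exact serialized text depends on whichever JAXP implementation is on the classpath, so no particular output is claimed here.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;

// Hypothetical helper class: converts between two of the three
// representations using a stylesheet-less (identity) Transformer.
final class IdentityRoundTrip {

    // Lexical XML -> DOM tree.
    static Document lexicalToDom(String xml) {
        try {
            Document target = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            // newTransformer() with no stylesheet yields the identity transformer.
            Transformer identity = TransformerFactory.newInstance().newTransformer();
            identity.transform(new StreamSource(new StringReader(xml)),
                               new DOMResult(target));
            return target;
        } catch (Exception e) {
            throw new IllegalStateException("identity transform failed", e);
        }
    }

    // DOM tree -> lexical XML.
    static String domToLexical(Document doc) {
        try {
            Transformer identity = TransformerFactory.newInstance().newTransformer();
            identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            identity.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("identity transform failed", e);
        }
    }

    public static void main(String[] args) {
        // Input with spaces around '=' and a CDATA section: details on which
        // the XML spec gives no definitive statement of significance.
        String input = "<a  x = '1'><![CDATA[<hi>]]></a>";
        System.out.println(domToLexical(lexicalToDom(input)));
    }
}
```

The attribute value and the character content survive the round trip, but lexical details such as the quote style or the whitespace around the equals sign are not expected to. Whether the CDATA section comes back as CDATA or as escaped text is up to the implementation.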
Since the Saxon engine is essentially based on the XDM model, it converts between these three representations via the XDM model, and there's certainly nothing in the JAXP spec that says that isn't a reasonable interpretation. Another equally valid interpretation would be to go via canonical XML, but the spec doesn't mandate one or the other.

> Since XSLT is a "pass through" kind of technology I don't see
> the sense of being lossy.

From that you could argue that it should preserve the order of attributes and the whitespace around the equals signs. The word "lossy" can only be interpreted relative to some information model.

Regards,

Michael Kay
http://www.saxonica.com/
http://twitter.com/michaelhkay