RE: carriage return handling in XML parsers
> I am trying to understand the implications of what the
> XML 1.0 spec says about End-of-Line Handling and would
> appreciate some clarification from more experienced
> shoulders.
>
> It would appear that given this section, it is never possible
> to get unaccompanied carriage return "characters" in the
> stream of information provided by an XML parser, be it
> SAX or DOM, unless I encode these as character references
> in the input file to the parser. Is this correct?

Yes.

> On a related note, assuming simple ascii files, if I now
> encode the carriage return as a character reference, and
> round trip the file through an XSLT identity transform,
> will the output file be identical or will the carriage
> return now be represented as a single <CR> byte?

A very good question, and I don't think the spec gives a very clear answer.

Probably when serializing a text or attribute node containing a CR character, the serializer should output "&#xD;", because that is the only way of meeting the requirement that the sequence parse(serialize(X)) should give a tree identical to X. But Saxon today doesn't do that; it outputs a CR directly, which will turn into NL on re-parsing.

Michael Kay
Software AG
home: Michael.H.Kay@n...
work: Michael.Kay@s...
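The end-of-line behaviour discussed in this message can be tried out with a short sketch. It uses Python's standard xml.etree.ElementTree parser, not Saxon, SAX, or DOM. The helpers parse_text and serialize_text are hypothetical, written only for this illustration. serialize_text is a hand-rolled one-element serializer, so the result does not depend on how any particular library chooses to write a CR.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def parse_text(doc: bytes) -> str:
    """Return the text of the root element as reported by the parser."""
    return ET.fromstring(doc).text


def serialize_text(text: str, escape_cr: bool) -> bytes:
    """Hypothetical one-element serializer, for illustration only."""
    body = escape(text)  # escapes &, < and >
    if escape_cr:
        # Write CR as a character reference so it survives re-parsing.
        body = body.replace("\r", "&#xD;")
    return ("<a>" + body + "</a>").encode("ascii")


# A literal CR or CRLF in the input is normalized to a single LF.
literal_cr = parse_text(b"<a>x\ry</a>")
literal_crlf = parse_text(b"<a>x\r\ny</a>")

# A CR written as a character reference reaches the application intact.
referenced_cr = parse_text(b"<a>x&#xD;y</a>")

# Round trip: writing the CR directly loses it; writing &#xD; keeps it.
lossy = parse_text(serialize_text(referenced_cr, escape_cr=False))
faithful = parse_text(serialize_text(referenced_cr, escape_cr=True))

print(repr(literal_cr), repr(literal_crlf), repr(referenced_cr))
print(repr(lossy), repr(faithful))
```

With escape_cr=False the sketch imitates a serializer that outputs the CR directly, so the re-parsed text differs from the original tree. With escape_cr=True, parse(serialize(X)) gives back the same text.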