XML-DEV Mailing List Archive
Re: How to handle "newline" characters in an XML parser.
On Tue, Dec 05, 2006 at 11:24:55AM -0800, Redefined Horizons wrote:
> I'm nearing the completion of an open source XML parser in Java. (It's
> an event-based, pull parser.)

Why? Do we need more parsers? :-)

[...]
> I'm having some trouble figuring out how to handle "newline"
> characters in XML text files on different platforms. I typically
> ignore all whitespace in the parser, but I wanted to count newline
> characters to aid in error reporting.

You can't ignore whitespace; you have to return it to the application,
except when it's explicitly ignorable because a DTD says so, or when
it's e.g. inside a tag matching the S production.

> I've taken a look at the XML specs, but didn't completely understand
> what they had to say about newline characters.

Can you ask a more specific question? Are you asking when normalization
happens? By newline do you mean the character at Unicode code point 10?

Remember that the spaces inside the desc element in:

    <desc>his socks were <em>very</em> <pattern>argyle</pattern>.</desc>

are all important, including the one between </em> and <pattern>.

For error reporting, line counting depends on the platform, and should
probably correspond to using a native text editor on that platform -- as
that's what users will have to use when they get an error.

Liam

--
Liam Quin, W3C XML Activity Lead, http://www.w3.org/People/Quin/
http://www.holoweb.net/~liam/ * http://www.fromoldbooks.org/
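
As a rough illustration of the line counting and normalization discussed above, here is a minimal Java sketch. It assumes XML 1.0 end-of-line handling, where a CR LF pair and a lone CR are each passed to the application as a single LF (code point 10), and it counts any of the three conventions as one line break. That counting rule is an assumption, chosen so that the count matches what most editors show; it does not cover the extra line-end characters of XML 1.1. The class name `LineTracker` and its methods are hypothetical and not taken from any parser mentioned in this thread.

```java
/** Hypothetical helper: normalizes line ends and tracks line/column for error messages. */
public class LineTracker {
    private int line = 1;              // 1-based line number of the current position
    private int column = 0;            // characters consumed so far on the current line
    private boolean lastWasCR = false; // true if the previous raw character was CR

    /**
     * Feed one raw input character.
     * Returns the normalized character, or -1 when the character is the LF
     * half of a CR LF pair and must be dropped (the CR was already reported as LF).
     */
    public int feed(char c) {
        if (c == '\n' && lastWasCR) {
            lastWasCR = false;
            return -1;
        }
        lastWasCR = (c == '\r');
        if (c == '\r' || c == '\n') {
            line++;
            column = 0;
            return '\n';
        }
        column++;
        return c;
    }

    public int getLine() { return line; }

    public int getColumn() { return column; }

    /** Convenience wrapper: normalize a whole string in one call. */
    public static String normalize(String raw) {
        LineTracker tracker = new LineTracker();
        StringBuilder out = new StringBuilder(raw.length());
        for (int i = 0; i < raw.length(); i++) {
            int c = tracker.feed(raw.charAt(i));
            if (c != -1) {
                out.append((char) c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        LineTracker tracker = new LineTracker();
        // Mixed Windows, classic Mac and Unix line ends in one input.
        for (char c : "a\r\nb\rc\nd".toCharArray()) {
            tracker.feed(c);
        }
        System.out.println("line " + tracker.getLine() + ", column " + tracker.getColumn());
    }
}
```

Note that the tracker only drops the second half of a CR LF pair; every other whitespace character is returned unchanged, so the application still sees it.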