
Re: Line ending normalization

  • From: Bob Kline <bkline@r...>
  • To: xml-dev@l...
  • Date: Mon, 04 May 2009 19:05:33 -0400

G. Ken Holman wrote:
> At 2009-05-04 12:14 -0400, Bob Kline wrote:
>
>> Could someone point to the language which would address this timing 
>> question?
>
> Here:
>
>   http://www.w3.org/TR/2008/REC-xml-20081126/#sec-line-ends ....
>
> Note that the "#xA" and "#xD" bits of text are *not* parsed numeric 
> character references, they are only prose character references.  It is 
> an unambiguous way of referring to the characters, but it is the naked 
> characters that are being referred to.

Understood.

>
> Note the bit "before parsing" ... so the naked characters get replaced 
> by a naked #xA and *then* the parsed numeric character references of 
> your example would be parsed as content.

Right.  The spec used the term "include" to describe the step at which 
the character references are replaced with the corresponding characters.

    [An entity is *included* when its replacement text
    <http://www.w3.org/TR/2008/REC-xml-20081126/#dt-repltext> is
    retrieved and processed, in place of the reference itself, as though
    it were part of the document at the location the reference was
    recognized.] The replacement text may contain both character data
    <http://www.w3.org/TR/2008/REC-xml-20081126/#dt-chardata> and
    (except for parameter entities) markup
    <http://www.w3.org/TR/2008/REC-xml-20081126/#dt-markup>, which
    /MUST/ be recognized in the usual way. (The string " |AT&amp;T;| "
    expands to " |AT&T;| " and the remaining ampersand is not recognized
    as an entity-reference delimiter.) A character reference is
    *included* when the indicated character is processed in place of the
    reference itself.

Not seeing any indication in that passage that this step took place /as 
part of parsing/, I wasn't 100% confident that this wasn't part of some 
pre-parsing step (for example, a regular-expression substitution applied 
before the stage at which the structure of the document is extracted).
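
To make the ordering concrete, here is a minimal sketch in Python of the
line-end step from section 2.11 of the spec, written as a plain string
substitution. The function name normalize_line_ends and the sample
document are assumptions made up for illustration; this is not how any
particular parser is implemented. It only shows that a pass over the raw
input leaves character references alone, because at that stage they are
still ordinary ASCII text.

```python
import re


def normalize_line_ends(raw: str) -> str:
    """Hypothetical helper: translate CR LF pairs and lone CRs to a
    single LF, as section 2.11 of XML 1.0 describes for raw input."""
    return re.sub(r"\r\n?", "\n", raw)


# One literal CR LF pair and one pair written as character references.
raw = "<doc>one\r\ntwo&#x000d;&#x000a;three</doc>"
normalized = normalize_line_ends(raw)

# The literal pair is now a single LF; the references are untouched
# because they are just the characters '&', '#', 'x', ... at this point.
print(repr(normalized))
```

The references only turn into CR and LF later, when the parser
"includes" them, which is why they survive the normalization.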


>
>> And do the major XML parser implementations handle this issue 
>> consistently?
>
> I haven't tripped over a problem with this with various 
> implementations ... have you recognized inconsistent behaviour?

Not exactly, but I did notice that parts of Microsoft's .NET libraries 
for XML processing (the newer XLINQ classes) implement the line 
termination normalization correctly, whereas other parts of those 
libraries (the DOM classes) don't, so I got a little nervous about 
whether we'd run into parsers which introduced other inconsistencies.  A 
team implementing software with which one of our services interacts was 
proposing replacement of "\r\n" sequences in the input with 
"&#x000d;&#x000a;" in order to sidestep the normalization described in 
the spec, and I wanted to make sure that wasn't too fragile an approach.
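
A quick way to check a given parser is to feed it both forms and compare
what comes back. The sketch below uses Python's xml.etree.ElementTree
(backed by expat) purely as an example; the variable names are
illustrative, and other parsers would need the same check run against
them before relying on the behaviour.

```python
import xml.etree.ElementTree as ET

# Literal CR LF in the content: subject to line-end normalization.
literal = ET.fromstring(b"<doc>one\r\ntwo</doc>")

# The same pair written as character references: resolved during
# parsing, after normalization has already been applied to the input.
escaped = ET.fromstring(b"<doc>one&#x000d;&#x000a;two</doc>")

literal_text = literal.text
escaped_text = escaped.text

print(repr(literal_text))
print(repr(escaped_text))
```

With a conforming parser the first value should contain only a line
feed, while the second should keep both the carriage return and the
line feed.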

> Certainly the specification seems unambiguous.
>
> I hope this helps.
>

Yes, very much.  Thanks!

-- 
Bob Kline
http://www.rksystems.com
mailto:bkline@r...


