Re: Line ending normalization

  • From: "G. Ken Holman" <gkholman@C...>
  • To: xml-dev@l...
  • Date: Mon, 04 May 2009 15:13:33 -0400

At 2009-05-04 12:14 -0400, Bob Kline wrote:
>I'm having a hard time finding the language in the 1.0 spec [1] 
>which would make it clear whether the line ending normalization 
>which XML processors must perform (more precisely, "must behave as 
>if it normalized all line breaks ...") happens before or after the 
>replacement of character entities.

A line-end sequence consists only of naked characters, not of 
characters produced by parsed numeric character references.

>In other words, for the following document:
>
><a>x&#x000d;&#x000a;y</a>
>
>is the value returned by the XML parser for the text content of 
>element a "x\r\ny" or "x\ny"?

"x\r\ny" because that is what is in the element ... there are no line 
end sequences in the element.

>Could someone point to the language which would address this timing 
>question?

Here:

   http://www.w3.org/TR/2008/REC-xml-20081126/#sec-line-ends


   XML parsed entities are often stored in computer files which,
   for editing convenience, are organized into lines. These lines
   are typically separated by some combination of the characters
   CARRIAGE RETURN (#xD) and LINE FEED (#xA).

   To simplify the tasks of applications, the XML processor MUST
   behave as if it normalized all line breaks in external parsed
   entities (including the document entity) on input, before
   parsing, by translating both the two-character sequence #xD #xA
   and any #xD that is not followed by #xA to a single #xA character.
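
To make the rule concrete, here is a minimal sketch of that translation in Python. The function name normalize_line_breaks is my own, not something from the specification, and a real processor only has to behave *as if* it did this to the raw entity text before parsing:

```python
import re


def normalize_line_breaks(raw: str) -> str:
    """Translate the pair #xD #xA, and any lone #xD, to a single #xA.

    Hypothetical helper: it operates on the raw text of an external
    parsed entity, i.e. before any markup or references are parsed.
    """
    # "\r\n?" matches a carriage return optionally followed by a line feed.
    return re.sub("\r\n?", "\n", raw)


# Naked CR LF characters in the entity text are normalized ...
literal = "<a>x\r\ny</a>"
normalized_literal = normalize_line_breaks(literal)

# ... but the reference text "&#x000d;&#x000a;" contains no naked CR or LF,
# so this step leaves it untouched.
referenced = "<a>x&#x000d;&#x000a;y</a>"
normalized_referenced = normalize_line_breaks(referenced)

print(repr(normalized_literal))
print(repr(normalized_referenced))
```

At this stage the character references are still just the characters "&", "#", "x" and so on, which is why the rule cannot see them as line ends.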

Note that the "#xA" and "#xD" bits of text are *not* parsed numeric 
character references; they are only prose references to characters.  
They are an unambiguous way of referring to the characters, but it is 
the naked characters that are being referred to.

Note the bit "before parsing" ... so the naked line-end characters get 
replaced by a naked #xA and only *then* are the numeric character 
references of your example parsed as content, untouched by the 
normalization that has already happened.
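
One quick way to check this ordering against an actual parser is a few lines of Python using the standard library's xml.etree.ElementTree (which sits on top of expat). This is only a sketch of a single implementation's behaviour, not a survey, and the variable names are mine:

```python
import xml.etree.ElementTree as ET

# Character references: there are no naked line-end characters in the
# input, so the referenced CR and LF should both arrive in the content.
from_references = ET.fromstring(b"<a>x&#x000d;&#x000a;y</a>").text

# Naked CR LF in the entity text: normalized to a single LF before parsing.
from_literal_crlf = ET.fromstring(b"<a>x\r\ny</a>").text

# A naked CR not followed by LF: also normalized to a single LF.
from_literal_cr = ET.fromstring(b"<a>x\ry</a>").text

print(repr(from_references))    # the spec calls for 'x\r\ny'
print(repr(from_literal_crlf))  # the spec calls for 'x\ny'
print(repr(from_literal_cr))    # the spec calls for 'x\ny'
```

Passing bytes rather than a str keeps Python's own newline handling out of the picture, so what the parser sees is exactly what is written above.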

>And do the major XML parser implementations handle this issue consistently?

I haven't tripped over a problem with this with various 
implementations ... have you recognized inconsistent 
behaviour?  Certainly the specification seems unambiguous.

I hope this helps.

. . . . . . . . . . Ken

--
XQuery/XSLT/XSL-FO hands-on training - Los Angeles, USA 2009-06-08
Crane Softwrights Ltd.          http://www.CraneSoftwrights.com/x/
Training tools: Comprehensive interactive XSLT/XPath 1.0/2.0 video
Video lesson:    http://www.youtube.com/watch?v=PrNjJCh7Ppg&fmt=18
Video overview:  http://www.youtube.com/watch?v=VTiodiij6gE&fmt=18
G. Ken Holman                 mailto:gkholman@C...
Male Cancer Awareness Nov'07  http://www.CraneSoftwrights.com/x/bc
Legal business disclaimers:  http://www.CraneSoftwrights.com/legal
