
Re: What to escape when serializing XML

  • From: "Pete Cordell" <petexmldev@t...>
  • To: "Frans Englich" <frans.englich@t...>,<xml-dev@l...>
  • Date: Tue, 2 Jan 2007 15:11:56 -0000

Purely from an XML / object serialization point of view, your main concern is
to escape the characters that correspond to the five predefined entities:
&amp;, &lt;, &gt;, &quot; and &apos;. Even these are not required in all
contexts. & and < must always be escaped, both in content and in attribute
values, but > only needs escaping where it would form the sequence ]]> in
content, and within attribute values delimited by " characters you need not
escape ' characters (and vice versa).
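A minimal Python sketch of this context-dependent escaping, assuming attribute values are always written between double quotes. The function names are hypothetical, and the sketch covers only the five predefined entities, not line endings or control characters. & is replaced first so that the & introduced by the other references is not escaped a second time.

```python
def escape_text(s: str) -> str:
    """Escape character data (element content).

    & and < must always be escaped; > is escaped as well, which is
    harmless and guarantees that ]]> can never appear literally.
    """
    return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")


def escape_attr(s: str) -> str:
    """Escape an attribute value that will be delimited by " characters.

    ' needs no escaping inside a "-delimited value, so it is left alone.
    """
    return s.replace("&", "&amp;").replace("<", "&lt;").replace('"', "&quot;")


if __name__ == "__main__":
    print(escape_text("a < b && c > d"))  # a &lt; b &amp;&amp; c &gt; d
    print('<e title="' + escape_attr('say "it\'s"') + '"/>')
    # <e title="say &quot;it's&quot;"/>
```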

In terms of end-of-line encoding, the approach seems to be to output whatever
is convenient (CR, LF, or CRLF) and let the receiving application sort it out.
Conceptually, the receiving XML processor is required to normalize end-of-line
markers to 0x0A (LF), and the application then converts that to whichever of
CR, LF, or CRLF is appropriate. However, it's quite possible to do this in one
step. See section 2.11 of the XML spec for more.
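A minimal Python sketch of the end-of-line handling in section 2.11 of XML 1.0: the two-character sequence CR LF, and any CR not followed by LF, both become a single LF. The helper names are hypothetical, and to_local_eol is only one reading of "doing this in one step": it maps every line ending straight to whatever convention the application wants. XML 1.1 additionally normalizes NEL and LINE SEPARATOR, which this sketch does not handle.

```python
import re

# XML 1.0 section 2.11: CR LF, or a CR not followed by LF, becomes one LF.
_XML10_EOL = re.compile(r"\r\n?")
# Any line ending: CR LF, lone CR, or LF.
_ANY_EOL = re.compile(r"\r\n?|\n")


def normalize_eol(text: str) -> str:
    """End-of-line handling an XML 1.0 processor applies to its input."""
    return _XML10_EOL.sub("\n", text)


def to_local_eol(text: str, eol: str = "\r\n") -> str:
    """One-step variant: map every CR LF, lone CR, or LF directly to eol."""
    # A function replacement stops re.sub from interpreting backslashes in eol.
    return _ANY_EOL.sub(lambda _m: eol, text)


if __name__ == "__main__":
    raw = "one\r\ntwo\rthree\nfour"
    print(repr(normalize_eol(raw)))         # 'one\ntwo\nthree\nfour'
    print(repr(to_local_eol(raw, "\r\n")))  # 'one\r\ntwo\r\nthree\r\nfour'
```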

HTH,

Pete.
--
=============================================
Pete Cordell
Tech-Know-Ware Ltd
for XML to C++ data binding visit
http://www.tech-know-ware.com/lmx
(or http://www.xml2cpp.com)
=============================================

----- Original Message ----- 
From: "Frans Englich" <frans.englich@t...>
To: <xml-dev@l...>
Sent: Tuesday, January 02, 2007 1:13 PM
Subject:  What to escape when serializing XML


>
> Hi all,
>
> I'm doing some bug fixing in a piece of code that does XML serialization
> (sort of), and could use some help in determining which characters need to
> be escaped with character references. It's all in the realm of XML 1.0.
>
> The code in question is not intended to conform to XSLT 2.0 and XQuery 1.0
> Serialization, but that spec is nevertheless informative. For example,
> section 5, XML Output Method, reads:
>
> <quote>
> A consequence of this rule is that certain characters MUST be output as
> character references, to ensure that they survive the round trip through
> serialization and parsing. Specifically, CR, NEL and LINE SEPARATOR
> characters in text nodes MUST be output respectively as "&#xD;", "&#x85;",
> and "&#x2028;", or their equivalents; while CR, NL, TAB, NEL and LINE
> SEPARATOR characters in attribute nodes MUST be output respectively as
> "&#xD;", "&#xA;", "&#x9;", "&#x85;", and "&#x2028;", or their equivalents.
> In addition, the non-whitespace control characters #x1 through #x1F and
> #x7F through #x9F in text nodes and attribute nodes MUST be output as
> character references.
>
> XML 1.0 did not permit an XML processor to normalize NEL or LINE SEPARATOR
> characters to a LINE FEED character. However, if a document entity that
> specifies version 1.1 invokes an external general parsed entity with no
> text declaration or a text declaration that specifies version 1.0, the
> external parsed entity is processed according to the rules of XML 1.1. For
> this reason, NEL and LINE SEPARATOR characters in text and attribute nodes
> must always be escaped using character references, regardless of the value
> of the version parameter.
>
> XML 1.0 permitted control characters in the range #x7F through #x9F to
> appear as literal characters in an XML document, but XML 1.1 requires such
> characters, other than NEL, to be escaped as character references. An
> external general parsed entity with no text declaration or a text
> declaration that specifies a version pseudo-attribute with value 1.0 that
> is invoked by an XML 1.1 document entity must follow the rules of XML 1.1.
> Therefore, the non-whitespace control characters in the ranges #x1 through
> #x1F and #x7F through #x9F must always be escaped, regardless of the value
> of the version parameter.
> </quote>
>
> These paragraphs give good hints about the complexity involved, but they
> are not very exact ("Specifically, CR, NEL ...").
>
> Does anyone know, or know how to determine, exactly which characters need
> to be escaped? I could set my brain to work and read the XML spec from
> start to finish, but I could easily get something wrong.
>
>
> Cheers,
>
> Frans
>
> _______________________________________________________________________
>
> XML-DEV is a publicly archived, unmoderated list hosted by OASIS
> to support XML implementation and development. To minimize
> spam in the archives, you must subscribe before posting.
>
> [Un]Subscribe/change address: http://www.oasis-open.org/mlmanage/
> Or unsubscribe: xml-dev-unsubscribe@l...
> subscribe: xml-dev-subscribe@l...
> List archive: http://lists.xml.org/archives/xml-dev/
> List Guidelines: http://www.oasis-open.org/maillists/guidelines.php
>
> 
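A minimal Python sketch that applies the rules quoted above from the XSLT 2.0 and XQuery 1.0 Serialization spec, assuming attribute values are delimited by " characters; serialize_text and serialize_attr are hypothetical names. It also applies the Char production of the chosen XML version, because the well-formedness constraint "Legal Character" applies to character references too: in XML 1.0, #x1 to #x8, #xB, #xC and #xE to #x1F cannot appear at all, not even as references, so the sketch raises ValueError for them when version is "1.0" and writes references for them under "1.1". #x0, lone surrogates, #xFFFE and #xFFFF are rejected in both versions. The usage block parses the output with xml.etree.ElementTree to check that the value survives end-of-line and attribute-value normalization unchanged.

```python
import xml.etree.ElementTree as ET

# Predefined entity references per context (attributes delimited by ").
_TEXT_ENTITIES = {"&": "&amp;", "<": "&lt;", ">": "&gt;"}
_ATTR_ENTITIES = {"&": "&amp;", "<": "&lt;", '"': "&quot;"}

# Characters the Serialization spec says MUST become character references:
# CR, NEL, LS in text nodes; CR, LF, TAB, NEL, LS in attribute nodes.
_TEXT_CHARREFS = {0xD, 0x85, 0x2028}
_ATTR_CHARREFS = {0xD, 0xA, 0x9, 0x85, 0x2028}


def _is_control(cp: int) -> bool:
    """Non-whitespace controls #x1-#x1F and #x7F-#x9F (TAB, LF, CR excluded)."""
    return (0x1 <= cp <= 0x1F and cp not in (0x9, 0xA, 0xD)) or 0x7F <= cp <= 0x9F


def _is_legal(cp: int, version: str) -> bool:
    """Char production of XML 1.0 or 1.1; it also limits character references."""
    if version == "1.0":
        low_ok = cp in (0x9, 0xA, 0xD) or 0x20 <= cp <= 0xD7FF
    else:
        low_ok = 0x1 <= cp <= 0xD7FF
    return low_ok or 0xE000 <= cp <= 0xFFFD or 0x10000 <= cp <= 0x10FFFF


def _escape(s: str, entities: dict, charrefs: set, version: str) -> str:
    out = []
    for ch in s:
        cp = ord(ch)
        if not _is_legal(cp, version):
            raise ValueError(f"U+{cp:04X} cannot be serialized in XML {version}")
        if ch in entities:
            out.append(entities[ch])
        elif cp in charrefs or _is_control(cp):
            out.append(f"&#x{cp:X};")
        else:
            out.append(ch)
    return "".join(out)


def serialize_text(s: str, version: str = "1.0") -> str:
    """Escape the string value of a text node."""
    return _escape(s, _TEXT_ENTITIES, _TEXT_CHARREFS, version)


def serialize_attr(s: str, version: str = "1.0") -> str:
    """Escape an attribute value that will be written between " characters."""
    return _escape(s, _ATTR_ENTITIES, _ATTR_CHARREFS, version)


if __name__ == "__main__":
    value = 'a & b < c "q"\r\nnext\tline\x85\u2028end\x7f'
    doc = f'<e a="{serialize_attr(value)}">{serialize_text(value)}</e>'
    root = ET.fromstring(doc)
    # Round trip: the parser's end-of-line and attribute-value
    # normalization must leave both copies of the value unchanged.
    assert root.get("a") == value
    assert root.text == value
    print(doc)
```

Escaping > in text is not strictly required, but it keeps the sketch simple and rules out a literal ]]>.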



