Re: CDATA sections in W3C XML Infoset

  • From: John Cowan <jcowan@r...>
  • To: Bob Kline <bkline@r...>, "xml-dev@x..." <xml-dev@x...>
  • Date: Fri, 30 Mar 2001 11:37:10 -0500

Bob Kline wrote:


> No?  We have quite a bit of code in our XML repository which uses XML
> commands over sockets for its client-server interface to the rest of the
> world.  Most of the commands embed an XML document being stored in or
> retrieved from the repository.  The embedded documents are wrapped in
> CDATA sections.

And when the embedded document already contains a CDATA section?  Bzzzzt,
not well-formed.
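A minimal sketch of that failure, assuming Python's standard xml.dom.minidom as the parser. The element names "store" and "doc" and the helper is_well_formed are hypothetical, chosen only for illustration. The outer CDATA section ends at the first "]]>", which belongs to the inner document, so the rest of the command no longer parses:

```python
from xml.dom.minidom import parseString
from xml.parsers.expat import ExpatError

# An embedded document that itself contains a CDATA section.
inner = "<doc><![CDATA[a < b]]></doc>"

# Naive wrapping: the outer CDATA section is closed by the inner "]]>".
command = "<store><![CDATA[" + inner + "]]></store>"

def is_well_formed(text):
    """Return True if the parser accepts the text, False otherwise."""
    try:
        parseString(text)
        return True
    except ExpatError:
        return False

print(is_well_formed(inner))    # the embedded document on its own
print(is_well_formed(command))  # the command with the nested CDATA
```

The embedded document is accepted on its own, while the wrapped command is rejected.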

>  The logic for extracting a document from an incoming
> client command is essentially:
> 
>    Find the element containing the CDATA section.
>    Find the CDATA child of the element.
>    Hand the value of the CDATA section to the parser.

I admit this is an easy DOM-based hack.  But it shouldn't be
*that* much harder to know what element you are looking for,
pull out a Text child (initially there should be only one,
or you can normalize), and do the conversion below.
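A rough sketch of the Text-child approach, again assuming Python's xml.dom.minidom. The element name "store", the sample document, and the helper extract are hypothetical. In this sketch the embedded document is escaped before it is placed in the command, and the parser expands the references while building the Text node, so the decoding step is not written out by hand:

```python
from xml.dom.minidom import parseString

# Hypothetical embedded document, including an already-escaped "<".
embedded = '<doc note="a &lt; b">x &amp; y</doc>'

# Escape "&" first, then "<", and place the result in the command.
escaped = embedded.replace("&", "&amp;").replace("<", "&lt;")
command = "<store>" + escaped + "</store>"

def extract(command_text, tag):
    """Return the text content of the first element named tag."""
    dom = parseString(command_text)
    element = dom.getElementsByTagName(tag)[0]
    element.normalize()            # merge adjacent Text nodes into one
    return element.firstChild.data

recovered = extract(command, "store")
inner_dom = parseString(recovered)  # hand the recovered text to the parser
print(recovered == embedded)
```

The recovered string can then be parsed as a document in its own right, much as the CDATA value was in the quoted logic.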

> Before you even think about suggesting how easy it would be to restore
> the angle brackets in the embedded document, let me point out that the
> &lt; and &gt; which are not delimiters for the element tags in the
> embedded document cannot be "restored" to < and >, and I submit that it
> is impossible in some cases to distinguish which those were.  Therefore
> information has been lost.

Not so if you encode properly.  By changing every "&" in the embedded
document to "&amp;" and every "<" to "&lt;" (conceptually in that order),
you get this result:

	Original	Embedding
	<		&lt;
	&		&amp;
	&lt;		&amp;lt;
	&amp;		&amp;amp;
	&amp;lt;	&amp;amp;lt;

Etc. etc.  No information is lost: change every "&lt;" to "<" and
every "&amp;" to "&" (conceptually in that order) and the exact
original is restored.  In this encoding, ">" characters need not
be changed.
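The mapping and its inverse can be sketched as two string substitutions. This is only an illustration in Python; the function names embed and restore are made up here, and the order of the replacements in each function is the part that matters:

```python
def embed(text):
    """Escape text for use as element content."""
    # "&" must go first; otherwise the "&" introduced by "&lt;"
    # would itself be escaped a second time.
    return text.replace("&", "&amp;").replace("<", "&lt;")

def restore(text):
    """Invert embed()."""
    # "&lt;" must go first; otherwise "&amp;lt;" would become "&lt;"
    # and then, wrongly, "<".
    return text.replace("&lt;", "<").replace("&amp;", "&")

samples = ["<", "&", "&lt;", "&amp;", "&amp;lt;", "<a>1 &lt; 2</a>"]
for original in samples:
    encoded = embed(original)
    print(original, "->", encoded, "->", restore(encoded))
```

Each sample comes back unchanged after the round trip, including the ones that already contained "&lt;" or "&amp;".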

> Before you suggest that the embedded document should not have been
> wrapped in a CDATA section in the first place, let me say that:

[points snipped]

These points basically say that your embedded documents are text,
not necessarily XML.  The safe way to encode text in an XML document
is to use the mapping above.

-- 
There is / one art             || John Cowan <jcowan@r...>
no more / no less              || http://www.reutershealth.com
to do / all things             || http://www.ccil.org/~cowan
with art- / lessness           \\ -- Piet Hein
