
Canonical Encoding for XML Elements

  • From: Chris Smith <smith@i...>
  • To: xml-dev@i...
  • Date: Thu, 8 Jan 1998 21:44:16 -0500 (EST)

Here, as mentioned, is our process for creating a canonical form of
XML elements. Comments are welcome.

In particular, do parsers keep CDATA sections distinct from character
data?

-------------------------------------------

Canonical Encoding Format for XML

The canonical format of an XML element is created by first
deriving the logical content and structure of the underlying
XML document by parsing it, and then generating the canonical
physical form of the element from that logical structure
using the process defined below.

For the XML element being generated or any of its child
elements:

*  convert all characters in the element to [UTF16] format

*  expand all external entities and resolve all character and
   entity references in the element, so that none remain

*  exclude comments and processing instructions (PIs)

*  reduce all attributes to their canonical form using the
   attribute type in the DTD. Replace all single and double
   quotes present in attribute values with &#39; and &#34;
   respectively, so that attribute values can always be enclosed
   in double quotes

*  create any attributes that are not present in the original
   but have default values in the DTD, using those default values

*  sort the original and generated attributes in ascending
   attribute name order according to the UTF-16 encoding of the
   attribute name (i.e. not the native character ordering)

*  for whitespace inside markup but not inside attribute
   values, generate it as minimally as possible. Specifically:
   -  remove non essential whitespace, and
   -  represent required whitespace by a single space character

*  generate the content of all start tags using only the
   element name and the attributes as described above. If the
   element is an "empty" element then generate it using the
   single empty tag format, with a trailing slash. Generate end
   tags using only the element name, with no added whitespace.

*  remove all whitespace in the element content

*  keep CDATA sections as CDATA sections. Also:
   -  do not convert CDATA sections to character data with
      character references
   -  convert all occurrences of the right angle bracket ">" to
      &#62;

*  character data that is not in CDATA sections must have all
   occurrences of "<", ">", and "&" converted to &#60;, &#62;, and
   &#38; respectively.

*  start tags, end tags, empty tags, CDATA sections, and text
   sections are assembled in the same order as the original
   document.
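
A few of the steps above can be sketched in Python. This is only an
illustration, not a complete canonicalizer, and it covers just the
attribute quoting and ordering, the start/empty/end tag layout, and the
escaping of character data outside CDATA sections. All function names
are hypothetical, and it assumes attribute values are already
normalized and defaulted. It follows the list literally, so only quotes
are escaped in attribute values; a real serializer would presumably
also need to handle "&" and "<" there.

```python
def utf16_sort_key(name: str) -> bytes:
    """Big-endian UTF-16 bytes of name; comparing them compares code units."""
    return name.encode("utf-16-be")


def escape_attribute(value: str) -> str:
    """Replace single and double quotes with numeric character references."""
    return value.replace("'", "&#39;").replace('"', "&#34;")


def escape_text(text: str) -> str:
    """Escape "&", "<" and ">" in character data outside CDATA sections."""
    # "&" goes first so the references generated next are not escaped again.
    return (text.replace("&", "&#38;")
                .replace("<", "&#60;")
                .replace(">", "&#62;"))


def start_tag(name: str, attributes: dict, empty: bool = False) -> str:
    """Build a start tag (or empty tag) with sorted, double-quoted attributes."""
    parts = [name]
    for attr in sorted(attributes, key=utf16_sort_key):
        parts.append(f'{attr}="{escape_attribute(attributes[attr])}"')
    # Single spaces between parts, no whitespace before ">" or "/>".
    return "<" + " ".join(parts) + ("/>" if empty else ">")


def end_tag(name: str) -> str:
    """Build an end tag from the element name only, with no whitespace."""
    return f"</{name}>"


print(start_tag("item", {"b": "2", "a": 'say "hi"'}))
print(start_tag("br", {}, empty=True))
print(escape_text("a < b & c") + end_tag("item"))
```

Sorting on the big-endian UTF-16 bytes is one way to get the "UTF-16
encoding" order: it differs from code point order only for names that
contain characters outside the Basic Multilingual Plane, which sort by
their surrogate code units.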



---------------------------------------------------------------------------
 Chris Smith                                          <smith@i...>




