
Re: Attribute value normalization

  • From: Richard Tobin <richard@c...>
  • To: MURATA Makoto <murata@a...>, xml-dev@i...
  • Date: Wed, 27 May 1998 12:35:14 +0100 (BST)

> While translating the XML specification, I find that I do not understand 
> the attribute normalization mechanism of XML.

The result produced by RXP and LT-XML is given at the end (except that
carriage return characters have been replaced by the sequence ^M for
ease of reading).  Here is my explanation for each case.  The relevant
section of the standard is of course 3.3.3.

> <test a="
> test
> test
> "/>

In this case, the linefeeds (or whatever record boundaries are in your
system) are replaced by spaces.  Then the leading and trailing spaces
are removed and the remaining runs of spaces compressed.  So the result is

  <test a="test test"/>

This is of course the intended way for NMTOKENS to work.
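
As a rough illustration of those two steps (a sketch only, not the code
used by RXP or LT-XML), here is one way to write them in Python.  The
function name and the is_cdata flag are made up for this example, and
it assumes the value contains no references at all.

```python
def normalize_literal(value: str, is_cdata: bool = False) -> str:
    """Normalise an attribute value that contains no references."""
    # Line-end handling (section 2.11): CR LF pairs and lone CRs become LF.
    value = value.replace("\r\n", "\n").replace("\r", "\n")
    # Every literal white space character is replaced by one space.
    value = "".join(" " if ch in "\t\n\r " else ch for ch in value)
    if not is_cdata:
        # Tokenized types such as NMTOKENS: drop leading and trailing
        # spaces and compress each run of spaces to a single space.
        value = " ".join(part for part in value.split(" ") if part)
    return value


print(repr(normalize_literal("\ntest\ntest\n")))                 # 'test test'
print(repr(normalize_literal("\ntest\ntest\n", is_cdata=True)))  # ' test test '
```

With is_cdata=True only the first step applies, so the spaces that
replaced the line breaks are kept.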

> <test a="&D;&A;&D;&A;test&D;&A;&D;&A;test&D;&A;&D;&A;"/>
> <test a="&DA;&DA;test&DA;&DA;test&DA;&DA;"/>

In these cases the character references were expanded (into carriage
returns and linefeeds) when the general entities were defined.  So
when the replacement text of the entities is "recursively processed",
they get turned into spaces.  The spaces then get stripped or
compressed, producing the same result as the first case.

[However, if the attribute were of type CDATA, the result would be
different from the first case: these would have 4 spaces instead of 2,
because the cr/lf pairs in the first case were reduced to linefeeds
(probably on input, see section 2.11), whereas in the second case they
are not part of the *literal* entity value of the internal entity.]
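
The recursive processing of entity references can be sketched the same
way.  This is only an illustration under some assumptions: the ENTITIES
table is a stand-in for the parser's entity declarations and holds
replacement text (character references already expanded), the helper
names are invented, and character references in the attribute value
itself are not handled here.

```python
import re

# Hypothetical entity table.  The values are replacement texts, i.e. the
# character references in the declarations have already been expanded.
ENTITIES = {"D": "\r", "A": "\n", "DA": "\r\n"}

_ENTITY_REF = re.compile(r"&([A-Za-z_:][\w.:-]*);")


def _to_spaces(text):
    # Each white space character in the text is replaced by one space.
    return "".join(" " if ch in "\t\n\r " else ch for ch in text)


def expand_entities(text, entities):
    # Replace each entity reference by its recursively processed
    # replacement text; white space found along the way becomes spaces.
    out, pos = [], 0
    for match in _ENTITY_REF.finditer(text):
        out.append(_to_spaces(text[pos:match.start()]))
        out.append(expand_entities(entities[match.group(1)], entities))
        pos = match.end()
    out.append(_to_spaces(text[pos:]))
    return "".join(out)


def normalize_with_entities(value, entities, is_cdata=False):
    result = expand_entities(value, entities)
    if not is_cdata:
        # Non-CDATA types: strip and compress the resulting spaces.
        result = " ".join(part for part in result.split(" ") if part)
    return result


value = "&DA;&DA;test&DA;&DA;test&DA;&DA;"
print(repr(normalize_with_entities(value, ENTITIES)))            # 'test test'
print(repr(normalize_with_entities("&DA;test", ENTITIES, True)))  # '  test'
```

The second call shows the CDATA difference: each &DA; contributes two
spaces, whereas a literal cr/lf pair would already have been reduced to
a single linefeed, and so to a single space.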

> <test a="&#xD;&#xA;&#xD;&#xA;test&#xD;&#xA;&#xD;&#xA;test&#xD;&#xA;&#xD;&#xA;"/>
> <test a="&#xD;&#xD;test&#xD;&#xD;test&#xD;&#xD;"/>
> <test a="&#xA;&#xA;test&#xA;&#xA;test&#xA;&#xA;"/>

In these cases, the character references are appended, but unlike the
case of general entity references the result is not recursively
processed.  So there are no space characters to normalise, and the
result is the same as if the attribute had had type CDATA - that is,
the carriage returns and linefeeds appear in the normalised value.
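
A small Python sketch of the character reference case, again only an
illustration with invented names and not what RXP actually does: the
referenced character is appended as-is and never looked at again, so
the final strip-and-compress step, which only touches real space
characters, leaves it alone.  Entity references are deliberately left
out of this example.

```python
import re

_CHAR_REF = re.compile(r"&#(?:x([0-9A-Fa-f]+)|([0-9]+));")


def _to_spaces(text):
    # Literal white space characters are replaced by spaces.
    return "".join(" " if ch in "\t\n\r " else ch for ch in text)


def normalize_with_char_refs(value, is_cdata=False):
    out, pos = [], 0
    for match in _CHAR_REF.finditer(value):
        out.append(_to_spaces(value[pos:match.start()]))
        hex_digits, dec_digits = match.groups()
        code = int(hex_digits, 16) if hex_digits is not None else int(dec_digits)
        # The referenced character is appended directly; it is not
        # processed again, so a CR or LF stays a CR or LF.
        out.append(chr(code))
        pos = match.end()
    out.append(_to_spaces(value[pos:]))
    result = "".join(out)
    if not is_cdata:
        # Only #x20 characters are stripped and compressed here.
        result = " ".join(part for part in result.split(" ") if part)
    return result


print(repr(normalize_with_char_refs("&#xD;&#xD;test&#xD;&#xD;test&#xD;&#xD;")))
# '\r\rtest\r\rtest\r\r'
```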

Here is the RXP/LT-XML output:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE test [
<!ELEMENT test (#PCDATA|test)*>
<!ATTLIST test 
<!ENTITY D "&#xD;"> 
<!ENTITY A "&#xA;">
<!ENTITY DA "&#xD;&#xA;">  ]>
<test a="test test"/>
<test a="test test"/>
<test a="test test"/>
<test a="^M
<test a="^M^Mtest^M^Mtest^M^M"/>
<test a="




-- Richard

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@i...
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To (un)subscribe, mailto:majordomo@i... the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo@i... the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa@i...)

