RE: UTF-8+names

To: "'David Carlisle'" <davidc@n...>
Subject: RE: UTF-8+names
From: "Alessandro Triglia" <sandro@m...>
Date: Sun, 19 Oct 2003 03:21:00 -0400
Cc: <xml-dev@l...>
Importance: Normal
In-reply-to: <200310182112.WAA26397@e3000>


David Carlisle wrote: 
> 
> 
> > As I understand, in UTF-8+name, an ampersand is represented as  &&;
> > which means that, if UTF-8+name is used for XML, "normal" entity
> > references will look like:
> > 
> > 	&&;myentity;
> 
> Not necessarily, &myentity; would also work so long as it 
> wasn't one of the predefined names. If the entity isn't 
> "known" then it expands to itself in the character encoding, 
> leaving the entity to be expanded by the XML parser in the usual way.

I agree, but please see what I wrote in my previous email about a program
that has to produce a UTF-8+names encoding from a string of Unicode
characters.  What do you think the recommended behavior of such a program
should be with respect to encoding AMPERSAND characters?
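
To make the question concrete, here is a minimal sketch of two possible
encoder policies.  It assumes only what is stated in this thread: a literal
ampersand may be written as  &&;  and an unknown  &name;  sequence passes
through unchanged.  The NAMES table and both function names are hypothetical,
not taken from the UTF-8+names document.

```python
# Hypothetical name table; a real one would come from the UTF-8+names draft.
NAMES = {"\u00e9": "eacute", "\u2192": "rightarrow"}


def encode_always_escape(text, names=NAMES):
    """Policy A: write every AMPERSAND as '&&;'."""
    out = []
    for ch in text:
        if ch == "&":
            out.append("&&;")
        elif ch in names:
            out.append("&" + names[ch] + ";")
        else:
            out.append(ch)
    return "".join(out).encode("utf-8")


def encode_escape_when_needed(text, names=NAMES):
    """Policy B: escape '&' only if it would otherwise start a known name."""
    # '&' itself counts as known, because '&&;' is the escaped ampersand.
    known = set(names.values()) | {"&"}
    out = []
    for i, ch in enumerate(text):
        if ch == "&":
            end = text.find(";", i + 1)
            ambiguous = end != -1 and text[i + 1:end] in known
            out.append("&&;" if ambiguous else "&")
        elif ch in names:
            out.append("&" + names[ch] + ";")
        else:
            out.append(ch)
    return "".join(out).encode("utf-8")
```

Policy A is simpler and does not depend on the name table, but it turns
every XML entity reference into the  &&;myentity;  form.  Policy B keeps
&myentity;  readable, at the price of making the output depend on exactly
which names the encoder knows.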

> 
> > and numeric character references will look like:
> > 
> > 	&&;#12345;
> 
> similarly only one & is needed here as well.
> 
> > 	&lt;
> > 
> > but this can be confusing because it would denote a **literal** < 
> > character,
> 
> No it's defined to have the definition in xhtml and mathml 
> which is the definition given in the xml spec, double 
> escaped, so it would expand to a character reference to a < 
> character, not a literal <.

Yes, I noticed that I had missed this.  Anyway, what you say above may mean
one of two different things:

1) &lt; is defined as a replacement name in UTF-8+names, which implies that
the bytes will be decoded into the characters  & # 6 0 ;  (following XML
1.0) and the XML processor will substitute the character  <  on parsing
those characters

2) &lt; is *not* defined as a replacement name in UTF-8+names, which implies
that the bytes will be decoded one by one into the characters  & l t ;   and
the XML processor will "include" the predefined entity lt and eventually
substitute the character  <

Although the effect of (1) and (2) will be the same when parsing an XML
document, it will not be the same when decoding a sequence of bytes in a
non-XML context.  I am not sure the document is clear on this.  At any rate,
I don't think it would be a good idea to decode   & l t ;  into the
characters   & # 6 0 ;   because this sequence of characters is meaningless
outside of XML.  So  &lt;  should really not be a defined replacement name
in UTF-8+names.
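
The difference between (1) and (2) can be sketched as follows.  This is an
illustration under assumptions, not the behavior the document prescribes:
the tables, the regular expression, and the function names are hypothetical,
and the only facts taken from this thread are the  &&;  form and the
pass-through of unknown names.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical replacement table; the real list would come from the draft.
BASE = {"&": "&", "eacute": "\u00e9", "rightarrow": "\u2192"}

# Interpretation (1): 'lt' is a replacement name whose value is the
# escaped form, i.e. the characters & # 6 0 ;
WITH_LT = dict(BASE, lt="&#60;")

# Interpretation (2): 'lt' is not a replacement name at all.
WITHOUT_LT = BASE

# '&&;' or '&name;' where the name has no '&', ';' or whitespace.
_NAME = re.compile(r"&(&|[^&;\s]+);")


def decode(data, table):
    """Decode UTF-8, then replace known names; unknown ones pass through."""
    text = data.decode("utf-8")
    return _NAME.sub(lambda m: table.get(m.group(1), m.group(0)), text)


def xml_text(decoded):
    """Parse the decoded characters as XML element content."""
    return ET.fromstring("<a>" + decoded + "</a>").text
```

With these definitions, decode(b"&lt;", WITH_LT) gives the five characters
& # 6 0 ;  while decode(b"&lt;", WITHOUT_LT) gives the four characters
& l t ;  and xml_text() returns  <  for both.  Only a non-XML consumer
sees the difference.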

I have a question about all the other entities defined in XHTML and MathML.
Do all of them resolve to actual characters, or do some of them resolve to
escaped references (like &lt; does)?  If some entities resolve to escaped
character references, they need an XML context to work correctly, and
therefore should not be included among the defined replacements in
UTF-8+names (because a Unicode encoding should not rely on XML to work
correctly).
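
One way to check this mechanically is sketched below.  The five
declarations are the predefined-entity declarations from XML 1.0 section
4.6; whether the UTF-8+names table uses the same literals for XHTML and
MathML names is an assumption based on the description quoted above.  The
function names are hypothetical.

```python
import re

# Entity literals as declared in XML 1.0 section 4.6.
DECLARED = {
    "lt": "&#38;#60;",
    "gt": "&#62;",
    "amp": "&#38;#38;",
    "apos": "&#39;",
    "quot": "&#34;",
}

_CHARREF = re.compile(r"&#(?:x([0-9A-Fa-f]+)|([0-9]+));")


def expand_once(literal):
    """Expand character references a single time, without rescanning."""
    def repl(m):
        return chr(int(m.group(1), 16) if m.group(1) else int(m.group(2)))
    return _CHARREF.sub(repl, literal)


def needs_xml_context(literal):
    """True if the replacement text still contains a character reference."""
    return _CHARREF.search(expand_once(literal)) is not None
```

Applied to the table above, only  lt  and  amp  are flagged; the other
three resolve directly to actual characters.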

Alessandro

> 
> > It is not very clear to me where UTF-8+name would be useful, as I 
> > don't think it is useful in XML.  Is it being proposed for use in 
> > areas where, for some reason, XML cannot be used?
> 
> No its whole point is to allow the use of &rightarrow; or 
> &eacute; _with_ XML but _without_ a DTD to allow for relax or 
> xsd schema use, or just simply well formed fragments with no 
> schema at all.
> 
> 
> some other people have suggested not using & as the delimiter 
> but again that would break the main use case of this, the 
> FFFFAQ question on xsl-list asking why "& n b s p ;" 
> generates an error in xsl.
> 
> David
>

Follow-Ups:
- Re: UTF-8+names
  - From: John Cowan <cowan@m...>
