RE: UTF-8+names
David Carlisle wrote:
>
> > As I understand, in UTF-8+name, an ampersand is represented as &&;
> > which means that, if UTF-8+name is used for XML, "normal" entity
> > references will look like:
> >
> > &&;myentity;
>
> Not necessarily, &myentity; would also work so long as it
> wasn't one of the predefined names. If the entity isn't
> "known" then it expands to itself in the character encoding,
> leaving the entity to be expanded by the XML parser in the usual way.

I agree, but please see what I wrote in my previous email about a
program that is to produce a UTF-8+names encoding from a string of
Unicode characters. What would you think should be the recommended
behavior of such a program wrt. how to encode AMPERSAND characters?

> > and numeric character references will look like:
> >
> > &&;#12345;
>
> similarly only one & is needed here as well.
>
> > &lt;
> >
> > but this can be confusing because it would denote a **literal** <
> > character,
>
> No it's defined to have the definition in xhtml and mathml
> which is the definition given in the xml spec, double
> escaped, so it would expand to a character reference to a <
> character, not a literal <.

Yes, I noticed that I had missed this. Anyway, what you say above may
mean one of two different things:

1) &lt; is defined as a replacement name in UTF-8+names, which implies
   that the bytes will be decoded into the characters & # 6 0 ;
   (following XML 1.0) and the XML processor will substitute the
   character < on parsing those characters

2) &lt; is *not* defined as a replacement name in UTF-8+names, which
   implies that the bytes will be decoded one by one into the
   characters & l t ; and the XML processor will "include" the
   predefined entity lt and eventually substitute the character <

Although the effect of (1) and (2) will be the same when parsing an XML
document, it will not be the same when decoding a sequence of bytes in
a non-XML context. I am not sure the document is clear on this.

At any rate, I don't think it would be a good idea to decode & l t ;
into the characters & # 6 0 ; because this sequence of characters is
meaningless outside of XML. So &lt; should really not be a defined
replacement name in UTF-8+names.

I have a question about all the other entities defined in XHTML and
MathML. Do all of them resolve to actual characters, or do some of them
resolve to escaped references (like &lt; does)? If some entities resolve
to escaped character references, they need an XML context to work
correctly, and therefore should not be included among the defined
replacements in UTF-8+names (because a Unicode encoding should not rely
on XML to work correctly).

Alessandro

> > It is not very clear to me where UTF-8+name would be useful, as I
> > don't think it is useful in XML. Is it being proposed for use in
> > areas where, for some reason, XML cannot be used?
>
> No its whole point is to allow the use of &rarr; or
> &eacute; _with_ XML but _without_ a DTD to allow for relax or
> xsd schema use, or just simply well formed fragments with no
> schema at all.
>
> some other people have suggested not using & as the delimiter
> but again that would break the main use case of this, the
> FFFFAQ question on xsl-list asking why "& n b s p ;"
> generates an error in xsl.
>
> David
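
To make the difference between readings (1) and (2) concrete, here is a minimal Python sketch of a decoder and encoder as I understand them from this thread. It is not taken from the UTF-8+names draft. The name table `NAMES`, the functions `decode` and `encode_ampersands`, and the choice to work on an already UTF-8-decoded string are all assumptions made for illustration; the `&&;` escape for an ampersand is the one described earlier in the thread.

```python
import re

# Hypothetical name table: a tiny stand-in for the XHTML/MathML entity sets.
NAMES = {"eacute": "\u00e9", "rarr": "\u2192", "nbsp": "\u00a0"}

# Reading (1): lt is a defined replacement name carrying the XML 1.0
# double-escaped definition, so it decodes to the five characters "&#60;".
NAMES_WITH_LT = dict(NAMES, lt="&#60;")

# Matches either the ampersand escape "&&;" or a candidate "&name;".
_TOKEN = re.compile(r"&(&|[A-Za-z][A-Za-z0-9]*);")


def decode(text, names):
    """Expand "&&;" to "&" and known "&name;" sequences; leave the rest alone."""
    def repl(match):
        key = match.group(1)
        if key == "&":
            return "&"
        # Unknown names expand to themselves, so an XML parser can
        # still resolve them later in the usual way.
        return names.get(key, match.group(0))
    return _TOKEN.sub(repl, text)


def encode_ampersands(text):
    """One possible encoder policy: always write a literal "&" as "&&;"."""
    return text.replace("&", "&&;")


if __name__ == "__main__":
    print(decode("&lt;", NAMES_WITH_LT))  # reading (1)
    print(decode("&lt;", NAMES))          # reading (2)
    print(decode(encode_ampersands("&lt;"), NAMES_WITH_LT))
```

Under reading (1) a non-XML consumer ends up holding the characters & # 6 0 ; while under reading (2) it holds & l t ; unchanged. An encoder that always writes `&&;` round-trips any input regardless of which names are defined, which is one possible answer to the question above about how a program should encode AMPERSAND characters.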