Re: Using entities for me dash problem
Hi guys,

Okay, actually, the root of my misunderstanding comes from valid XML--is it okay then, within a UTF-8 encoded XML document, to just type the em dash? What I mean is, I thought that I had to use the NCR for valid XML. But is it dependent on the encoding?

That UTF-8 can render the em dash directly--of course, this is the easy answer, and now I understand the encoding part of XSL. Because the em dash can be rendered *as is* in UTF-8, if I specify that encoding, that's what I'll get. And anything that knows how to render UTF-8 will do the same. So I don't have to put the NCR data in. That, I now understand. But is it still valid XML?

/johnny :)

On 9/12/03 7:07 AM, "Richard Tobin" <richard@c...> wrote:

> In article <BBFB12B2.10A0%subscriber@p...>,
> JCS <subscriber@p...> wrote:
>
>> Thanks, I'm aware of this, however, what I *don't* understand because it
>> makes no logical sense, is that if I'm transforming XML UTF-8 to UTF-8 my
>> declaration should be UTF-8, not something else. What I mean is, if I use
>> US-ASCII as an output method to preserve my NCR the declaration in the
>> output file should still be UTF-8, not ASCII, because it's NOT ascii but it
>> was *translated* using ASCII to preserve the NCR. Does this make sense? What
>> is it that *I'm* not getting, because it's very confusing.
>
> If an XML file contains &#8212; this means Unicode character 8212,
> which as you know is em-dash. The encoding of the file is irrelevant
> to this: the numbers in character references are always unicode code
> points regardless of the file encoding. The encoding of the file
> determines how the characters "&", "#", "8", "2", "1", "2", and ";"
> are represented, not what they mean.
>
> Once the XML document is parsed, there will just be the em-dash
> character, stored in the program's own internal encoding. The fact
> that it was once represented by a numeric character reference is
> forgotten.
>
> When the program comes to output the document, it will have to decide
> how to represent it. If the output encoding can represent the
> character (as UTF-8 can), it will probably just output it directly.
> If it can't (as ASCII can't) it will have to use a character
> reference. So you can force character references for non-ASCII
> characters by specifying ASCII as the output encoding.
>
> Remember that this is just a trick. If you really want to output the
> document as UTF-8, but with non-ASCII characters represented by
> character references, then you probably have to write your own code to
> do it. But since ASCII is a subset of UTF-8, any ASCII XML document
> can be converted to a UTF-8 version just by manually editing the
> encoding declaration.
>
> -- Richard
>
> -----------------------------------------------------------------
> The xml-dev list is sponsored by XML.org <http://www.xml.org>, an
> initiative of OASIS <http://www.oasis-open.org>
>
> The list archives are at http://lists.xml.org/archives/xml-dev/
>
> To subscribe or unsubscribe from this list use the subscription
> manager: <http://lists.xml.org/ob/adm.pl>

--
"Religion is for people who are afraid they'll go to hell. Spirituality is for people who have been there."
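
The parse-then-serialize behaviour Richard describes can be tried out with a short sketch. This one assumes Python's standard `xml.etree.ElementTree` rather than the XSLT processor discussed in the thread, and the sample element `<p>` and the variable names are made up for illustration. It suggests that the character reference and the literal em dash parse to the same text, and that the output encoding decides which form comes back out.

```python
import xml.etree.ElementTree as ET

EM_DASH = "\u2014"  # Unicode code point 8212 (decimal)

# The same element written two ways, both stored as UTF-8 bytes:
# once with a numeric character reference, once with the literal character.
with_ncr = "<p>before &#8212; after</p>".encode("utf-8")
with_literal = f"<p>before {EM_DASH} after</p>".encode("utf-8")

text_from_ncr = ET.fromstring(with_ncr).text
text_from_literal = ET.fromstring(with_literal).text

# After parsing, only the character remains; how it was written is forgotten.
print(text_from_ncr == text_from_literal)
print(ord(text_from_ncr[7]))

# On output, the serializer picks a representation the encoding can carry.
root = ET.fromstring(with_ncr)
as_utf8 = ET.tostring(root, encoding="utf-8")      # em dash as UTF-8 bytes
as_ascii = ET.tostring(root, encoding="us-ascii")  # em dash as a character reference
print(as_utf8)
print(as_ascii)

# The "trick": pure-ASCII output is also valid UTF-8, so labelling it as
# UTF-8 (a hand-written declaration here) still parses to the same text.
relabelled = b'<?xml version="1.0" encoding="UTF-8"?>' + as_ascii
print(ET.fromstring(relabelled).text == text_from_ncr)
```

Other serializers may choose differently (for example, hexadecimal instead of decimal references), so treat the exact bytes as specific to this library.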








