XML-DEV Mailing List Archive
Re: Character Entities: An XML Core WG View
From: "Jelks Cabaniss" <jelks@j...>
> It does, but &#xnnn;'s scattered throughout a document are hard to
> proof. That's the only reason people want names (and not as
> elements!:).

I think it is more than that. Very few fonts have all the characters of Unicode. Probably no font has all the characters of Unicode 3.2! Many systems don't have synthetic fonts to allow fallback, and don't check that a glyph is actually present. Having an entity mechanism allows system-local mappings that select a font as well as a character.

We need to be very careful when talking about Unicode that we don't expect that it solves any problems w.r.t. making display systems have all glyphs available. People whose task is to move data from A to B can treat XML's Unicode support as relieving them of lots of difficult problems. But people whose responsibility it is to make sure that all the characters that they send appear in a final rendered form have to get down and dirty with partial fonts.

Furthermore, there is no agreement over the best characters to use for each entity. Indeed, in a few cases there is positive disagreement, and certain entities changed their characteristic glyph between the 8879 sets, the HTML sets and the newer ISO sets. This all springs from SGML's emphasis, which was not on guaranteed interoperability but on rigorous description--adequate details of the conventions used to allow a recipient (person) to know what they would need to map on their own system.

> > Once again, sigh. I haven't seen a better idea, but one would be
> > welcome.

The only approach that I have seen that makes sense is to build a fixed standard set of characters into XML, with known mappings. Then some open-source mapping libraries could be made, so that developers can trivially add the mapping to their weeny parsers. Or so that we can build certain entities into parsers.
Or so that vendors of typesetting systems can ensure that the characters that are standardly mapped to are supported in all fonts (by fallback if needed).

Now to do this requires an agreement on what the best mappings from entities to Unicode strings are. I have been involved in a project to do just this for the last few months, with the intent of taking it to ISO: the task mainly involves cross-checking DOCBOOK's mappings with W3C MathML's mappings, and then going through issues from other sources. XML-DEV-ers may be interested in the status of this.

There were various mappings of the ISO entities from different sources before XML. Notable among those were those of the Maler and El Andaloussi book, from vendors, from HTML, and from TEI. The Unicode Consortium had a checklist of the SGML entities too. When XML came out, I made a mapping to Unicode, and John Cowan wrote up the Unicode mapping too, but these were in terms of Unicode 2.0. The TEI sets were revised, as were the HTML sets, including for ISO HTML, I believe.

But the two main modern mapping efforts have been the DOCBOOK mappings at OASIS, associated largely with Norm Walsh, and the MathML mappings at W3C, associated largely with David Carlisle, who has been particularly helpful. I have been going through these, and the other mappings, to see how much agreement there is, and what ways forward there might be.

Anyway, my point is that ditching the entity declaration method is a separate question from ditching standard entity references. A future version of XML could keep named character references with defined mappings, while ditching user-definable entities. Getting a standard mapping, or pointing out which entities have different usages in different communities, seems to me the first step that would be needed in any direction.
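To make the idea of a fixed name-to-Unicode mapping concrete, here is a minimal sketch in Python of what a small mapping library might look like. The table is only an illustrative subset using code points that are commonly given for these names, and the names `STANDARD_ENTITIES`, `expand_named_refs` and `mapping_disagreements` are hypothetical, not taken from DOCBOOK, MathML or any existing library. The second function sketches the cross-checking step: given two mapping tables, it lists the entity names on which they disagree.

```python
import re

# Illustrative subset only (an assumption for this sketch): a few entity
# names with the code points commonly given for them.
STANDARD_ENTITIES = {
    "eacute": "\u00E9",  # LATIN SMALL LETTER E WITH ACUTE
    "alpha": "\u03B1",   # GREEK SMALL LETTER ALPHA
    "mdash": "\u2014",   # EM DASH
    "nbsp": "\u00A0",    # NO-BREAK SPACE
}

# Matches a named reference such as &eacute; (not numeric ones like &#xE9;).
_NAMED_REF = re.compile(r"&([A-Za-z][A-Za-z0-9.]*);")


def expand_named_refs(text, table=STANDARD_ENTITIES):
    """Replace each &name; found in the table; leave unknown names untouched."""
    def _sub(match):
        return table.get(match.group(1), match.group(0))
    return _NAMED_REF.sub(_sub, text)


def mapping_disagreements(a, b):
    """Return {name: (a_value, b_value)} for names both tables map differently."""
    return {
        name: (a[name], b[name])
        for name in a.keys() & b.keys()
        if a[name] != b[name]
    }
```

A mapped value is a string rather than a single code point, since some entities may be best mapped to a sequence of Unicode characters.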
(As to whether ditching entities is desirable, let's not fool ourselves that the pros and cons of standard entities for characters, for internal entities, for external parsed entities and for external unparsed entities are all the same. They would have to be replaced by four different technologies, making XML much more complicated. Consider that parameter entities in WXS needed to be replaced/reconstituted by about 6 different mechanisms: redefine, import, include, substitution groups, the tag/type distinction, attribute groups, without even attempting INCLUDE parameters or variable schemas. We shouldn't expect to get rid of a generic mechanism without being saddled with a handful of technologies to take its place: mind you, we are almost there with XLink/XBase/XInclude "reconstructing" many entity functions, though unusable outside specific processing models.)

Cheers
Rick Jelliffe