Re: A heavier-weight proposal for character entity definition
Before getting into the details of a schema for an XML syntax for declaring character entities, I think we should step back and ask what the real requirements are.

What XML did to SGML was preserve SGML's extensibility where it was really needed (for elements and attributes) but remove it where people could get by without it (e.g. delimiter syntax). Which category do character entity names fall in?

It is not obvious to me that there is a requirement that character entities be user extensible to the same extent that elements and attributes are. Consider the following points:

- in SGML days most people used the standard entity sets
- at any point in time the set of things that are being referenced by character entities is closed (i.e. the set of Unicode characters) modulo private use characters (which are typically deprecated on the Web), although it may evolve over time; this is quite different from the situation with elements and attributes
- Unicode provides a standard set of names for all Unicode characters
- I don't see the compelling user requirement for different users to be able to use different names for the same character
- having the 5 builtin entities in XML has worked out pretty well; in particular, there is no need to clutter the infoset or DOM with them; they are just generated as needed on output
- if you have user-defined character entity names, then users will start demanding the ability to preserve those names, which means that the DOM/SAX/Infoset will need to record which entity name, if any, was used for a character

So I'm wondering whether a more constrained approach to character entities would work.

Suppose for example there is a standard W3C-defined builtin entity set; this would have a version number and would add new characters from time to time (but never change existing entity names). There would be a standard mapping from a version number to a URI where an XML specification of the entity set would be available.
However, parsers wouldn't have to fetch and parse this; they could just recognize the version number and refer to an appropriate compiled-in table. The XML declaration would declare the version number of the builtin entity set that was being used; if the XML declaration didn't specify a version number, only the 5 XML 1.0 builtin entities could be used.

Just as now, the SAX/DOM/infoset wouldn't record whether a particular character was entered literally or using a builtin entity reference. Instead, programs that serialize XML (like XSLT) would have options saying when to use builtin entity references to represent characters.

For the first version of the standard builtin entity set we could start with

- HTML entities
- MathML entities
- maybe a set of entity names algorithmically generated from the standard Unicode names in Unicode 3.2; for example, the character 0xe01, which has a Unicode name of "THAI CHARACTER KO KAI", might be entered as &thai_character_ko_kai;.

James
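
A minimal sketch of the last suggestion, the algorithmically generated names, together with a serializer-side option of the kind described above. Everything here is an assumption for illustration: the naming rule (lowercase the Unicode name, spaces become underscores), the function names, and the fallback to a numeric character reference are hypothetical. Python's `unicodedata` module reflects whatever Unicode version the interpreter ships with, not Unicode 3.2 specifically.

```python
import unicodedata


def entity_name(ch: str) -> str:
    # Hypothetical naming rule (the post does not spell one out):
    # lowercase the standard Unicode name and turn spaces into underscores.
    # Raises ValueError for code points that have no name.
    return unicodedata.name(ch).lower().replace(" ", "_")


def entity_char(name: str) -> str:
    # Inverse of entity_name: rebuild the Unicode name and look it up.
    # Unicode names contain no underscores, so the mapping is reversible.
    return unicodedata.lookup(name.replace("_", " ").upper())


def serialize_text(text: str, use_entities: bool = True) -> str:
    # Serializer-side option: decide only at output time whether a
    # character is written literally or as an entity reference.
    out = []
    for ch in text:
        if ch == "&":
            out.append("&amp;")
        elif ch == "<":
            out.append("&lt;")
        elif use_entities and ord(ch) > 0x7F:
            try:
                out.append("&" + entity_name(ch) + ";")
            except ValueError:
                # Unnamed code point: fall back to a numeric reference.
                out.append("&#x%X;" % ord(ch))
        else:
            out.append(ch)
    return "".join(out)


print(entity_name("\u0e01"))        # thai_character_ko_kai
print(serialize_text("a<\u0e01"))   # a&lt;&thai_character_ko_kai;
```

Because the name is derived from the character itself, nothing needs to be stored in the parsed data to regenerate the reference on output, which is the property the 5 existing builtin entities already have.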