
Re: A heavier-weight proposal for character entity definition

  • To: "Henry S. Thompson" <ht@c...>, xml-dev@l...
  • Subject: Re: A heavier-weight proposal for character entity definition
  • From: James Clark <jjc@j...>
  • Date: Wed, 06 Feb 2002 12:18:41 +0700

Before getting into the details of a schema for an XML syntax for declaring 
character entities, I think we should step back and ask what the real 
requirements are.

What XML did to SGML was preserve SGML's extensibility where it was really 
needed (for elements and attributes) but remove it where people could get 
by without it (e.g. delimiter syntax). Which category do character entity 
names fall in? It is not obvious to me that there is a requirement that 
character entities be user extensible to the same extent that elements and 
attributes are. Consider the following points:

- in SGML days most people used the standard entity sets

- at any point in time the set of things that are being referenced by 
character entities is closed (i.e. the set of Unicode characters) modulo 
private use characters (which are typically deprecated on the Web), 
although it may evolve over time; this is quite different from the 
situation with elements and attributes

- Unicode provides a standard set of names for all Unicode characters

- I don't see the compelling user requirement for different users to be 
able to use different names for the same character

- having the 5 builtin entities in XML has worked out pretty well; in 
particular, there is no need to clutter the infoset or DOM with them; they 
are just generated as needed on output

- if you have user-defined character entity names, then users will start 
demanding the ability to preserve those names, which means that the 
DOM/SAX/Infoset will need to record which entity name if any was used for a 
character

So I'm wondering whether a more constrained approach to character entities 
would work.  Suppose for example there is a standard W3C-defined builtin 
entity set; this would have a version number and would add new characters 
from time to time (but never change existing entity names).  There would be 
a standard mapping from a version number to a URI where an XML specification 
of the entity set would be available.  However, parsers wouldn't have to 
fetch and parse this; they could just recognize the version number and 
refer to an appropriate compiled-in table.  The XML declaration would 
declare the version number of the builtin entity set that was being used; 
if the XML declaration didn't specify a version number, only the 5 XML 1.0 
builtin entities could be used. Just as now, the SAX/DOM/infoset wouldn't 
record whether a particular character was entered literally or using a 
builtin entity reference. Instead programs that serialize XML (like XSLT) 
would have options saying when to use builtin entity references to 
represent characters.
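
To make the idea concrete, here is a rough sketch in Python of how a parser and a serializer might use compiled-in tables keyed by version number. Everything specific in it is an assumption made for illustration: the version strings "1" and "2", the handful of entries in each table, and the function names are hypothetical, and how the version would actually be written in the XML declaration is left open, so it is simply passed in as an argument.

```python
import re

# The 5 XML 1.0 builtin entities, usable when no version is declared.
XML_1_0_BUILTINS = {"lt": "<", "gt": ">", "amp": "&", "apos": "'", "quot": '"'}

# Hypothetical compiled-in tables. A later version only adds names and
# never changes existing ones.
ENTITY_SETS = {"1": {**XML_1_0_BUILTINS, "nbsp": "\u00a0", "eacute": "\u00e9"}}
ENTITY_SETS["2"] = {**ENTITY_SETS["1"], "alpha": "\u03b1"}

_REF = re.compile(r"&([A-Za-z_][A-Za-z0-9_.-]*);")


def table_for(version):
    """Pick the compiled-in table; nothing is fetched from a URI."""
    if version is None:
        return XML_1_0_BUILTINS
    if version not in ENTITY_SETS:
        raise ValueError("unknown builtin entity set version: %r" % version)
    return ENTITY_SETS[version]


def expand(text, version=None):
    """Replace builtin entity references with the characters they name.

    The result does not record which characters came from references.
    """
    table = table_for(version)

    def repl(match):
        name = match.group(1)
        if name not in table:
            raise ValueError("undeclared entity: " + name)
        return table[name]

    return _REF.sub(repl, text)


def serialize(text, version=None, use_names=False):
    """Escape markup characters; optionally emit named references
    for non-ASCII characters that have a name in the chosen set."""
    names = {ch: name for name, ch in table_for(version).items()}
    out = []
    for ch in text:
        if ch in "<&" or (use_names and ord(ch) > 127 and ch in names):
            out.append("&" + names[ch] + ";")
        else:
            out.append(ch)
    return "".join(out)
```

With these assumed tables, expand("caf&eacute;", "1") yields the same string as typing the character literally, and the choice to write it back out as &eacute; is made only by the use_names option on output.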

For the first version of the standard builtin entity set we could start with

- HTML entities
- MathML entities
- maybe a set of entity names algorithmically generated from the standard 
Unicode names in Unicode 3.2; for example, 0xe01, which has a Unicode name 
of "THAI CHARACTER KO KAI", might be entered as &thai_character_ko_kai;.
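
One possible reading of "algorithmically generated" is sketched below in Python: lowercase the Unicode name and replace spaces with underscores. The exact rule is an assumption (for instance, hyphens in Unicode names are kept, since they are legal in XML names), the function names are hypothetical, and Python's unicodedata module reflects a later Unicode version than 3.2.

```python
import unicodedata


def entity_name_for(ch):
    """Derive an entity name from the character's Unicode name.

    Assumed rule: lowercase the name and turn spaces into underscores.
    Returns None for characters that have no Unicode name.
    """
    name = unicodedata.name(ch, None)
    if name is None:
        return None
    return name.lower().replace(" ", "_")


def entity_reference_for(ch):
    """Return a named reference, or a numeric one if the character is unnamed."""
    name = entity_name_for(ch)
    if name is None:
        return "&#x%x;" % ord(ch)
    return "&" + name + ";"


print(entity_reference_for("\u0e01"))
```

A private use character such as 0xe000 has no Unicode name, so under this sketch it falls back to a numeric character reference.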

James


 
