
RE: Request for Erratum to XML 1.0 and 1.1 Specs


This is not an erratum. It is a change proposal.

Michael Kay

> -----Original Message-----
> From: Rick Jelliffe [mailto:ricko@a...] 
> Sent: 21 October 2003 10:11
> To: xml-dev@l...
> Subject:  Request for Erratum to XML 1.0 and 1.1 Specs 
> 
> 
> I have just sent this off to the XML Editor mail list. I encourage
> anyone who thinks it is good or bad (or who just thinks there should
> be something but doesn't care what) to also send to them.
> 
> It also raises an interesting question: the XML spec is written in
> draconian terms with, nominally, very few options. Yet SAX 2, the
> almost universally deployed parser interface, is highly
> parameterizable with features, handlers and properties. So it cannot
> be too tragic to accept that some systems may need to bend certain
> rules, without altering the basic definitions.
> 
> Rick
> 
> ===============================================================
> 
> Request for Erratum to XML 1.0 and 1.1 Specs
> ----------------------------------------------
> Rick Jelliffe, ricko@t..., 2003-10-21
> 
> 
> I request the XML Working Group please consider the following 
> erratum to XML 1.0 which should also apply to XML 1.1.
> 
> The following two paragraphs, or something to the same effect,
> should be appended to section 5.1 "Validating and Non-Validating
> Processors":
> 
> 
> 
> "A non-validating processor may, at user option, imply definitions
> for all the character entities defined by HTML 4[1]. A document or
> entity for which definitions are implied is not well-formed. The
> processor must report a non-fatal error. NOTE: The document is 'not
> well-formed but processed'. Reliance on this feature by
> specifications is deprecated; this option may be withdrawn at some
> future time should it prove dangerous."
> 
> "A non-validating processor which provides the HTML 4 definitions
> may, at user option, also imply definitions for other Math ML and
> ISO standard sets[2]. A processor must report a non-fatal error. The
> document is 'not well-formed but processed'. NOTE: Reliance on this
> feature by specifications is deprecated; this option may be
> withdrawn at some future time should it prove dangerous."
> 
> [1] http://www.w3.org/TR/html401/sgml/entities.html
> [2] http://www.w3.org/TR/MathML2/chapter6.html#chars_entity-tables
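
As a rough illustration of what "implying definitions" for the HTML 4 set could look like from the outside, here is a minimal Python sketch. It is not part of the proposal, and no existing parser is assumed to offer such an option; the function names `imply_html4_entities` and `parse_leniently` are hypothetical. The sketch rewrites HTML 4 entity references to numeric character references before handing the text to a standard parser, and collects one non-fatal report per rewritten reference. It is a pre-processing approximation (the proposal would place the option inside the processor itself), and the regular expression does not skip CDATA sections, comments, or entities declared in an internal subset.

```python
import re
import xml.etree.ElementTree as ET
from html.entities import name2codepoint  # HTML 4 entity names -> code points

# The five entities predefined by XML itself must be left for the parser.
PREDEFINED = {"lt", "gt", "amp", "quot", "apos"}
ENTITY_REF = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def imply_html4_entities(text):
    """Return (patched_text, reports) with HTML 4 references made numeric."""
    reports = []

    def substitute(match):
        name = match.group(1)
        if name in PREDEFINED or name not in name2codepoint:
            return match.group(0)  # leave untouched; the parser decides
        # Non-fatal report: the input was not well-formed, but is processed.
        reports.append(f"implied definition for &{name};")
        return f"&#{name2codepoint[name]};"

    return ENTITY_REF.sub(substitute, text), reports


def parse_leniently(text):
    """Hypothetical 'user option': parse after implying HTML 4 definitions."""
    patched, reports = imply_html4_entities(text)
    return ET.fromstring(patched), reports


root, reports = parse_leniently("<p>caf&eacute;&nbsp;&mdash; 1 &lt; 2</p>")
print(repr(root.text))
print(reports)
```

References outside the HTML 4 set (for example `&foo;`) are left alone, so the underlying parser still rejects them as undefined.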
> 
> 
> 
> This suggested erratum has the following characteristics:
> 
> 1) It does not require any change to any XML processor
> 2) It does not change the basic XML characteristic that the 
> only way to guarantee information is received at the other 
> end is to use a UTF-* encoding, no entities and no attribute 
> defaulting.
> 3) It maintains the current layering, so no re-architecting
> or change in design is needed
> 4) It keeps the XML specification as the place that defines how to
> go from characters to data+markup.
> 
> 5) It does not make any existing valid XML document invalid
> 6) It does not make any existing invalid XML document valid
> 7) It does not make any existing WF document or entity non-WF
> 8) It does not make any existing non-WF document formally WF
> 
> 9) It does allow the continued non-validating processing of 
> documents which are non-WF only because they contain standard 
> references
> 10) It limits this to user option
> 11) It does not allow other specifications to use this as
> its default
> 12) It can be withdrawn
> 
> 13) I believe it is practical and would be simple to implement.
> 
> 
> 
> I believe the beneficiaries of such an erratum include:
> 
>  * Users typing in editors with no adequate input methods
>  for non-ASCII characters. I note that although Unicode
>  editors can display many characters, not all operating
>  systems have input methods to allow convenient data entry
>  even of Latin1 characters. (I believe this is better provided
>  by using decent XML markup editors, without prejudice.)
> 
>  * XHTML users who are used to named references without
>  declarations in HTML.
> 
>  * Potential XInclude users, who may wish
>  to treat a WF parsed entity from a document that uses
>  standard character references as a microdocument
> 
>  * Potential XML Schemas, Schematron and RELAX NG users who
>  may wish to upgrade from DTDs.
> 
>  * Potential XQuery users who are being hindered by the lack
>  of XML Schemas.
> 
>  * XML pipeline systems which can pass XML without requiring
>   tricky prologs
> 
>  * SOAP, RSS and RDF systems which must cope with embedded data
>  fragments from externally generated documents
> 
>  * Programmers serializing data to XML, especially for internal
>   systems, who may prefer to generate "&mdash;" or "&nbsp;"
>   rather than the numeric or literal equivalents.
> 
>  * Vendors who make products for the above
> 
>  * Low-sight or motion-impaired users whose speech synthesizers
>   or input methods only support ASCII characters. Aged, enraged
>   or diminished-capacity users who may be frustrated at having
>   to look up the number for something they know the name for.
>   (Though I do not want to suggest that "entity rage" is a hidden
>   problem.)
> 
> 
> I suggest its benefits over other suggested approaches include:
> 
>  * It does not require change to subsequent processes, as PSVI
>   processing would, nor any changes or additions to schema
>   specifications
> 
>  * It does not require pre-processing, as a macro processor would
> 
>  * It does not require the introduction and deployment of new
>   transcoders, as would Tim Bray and John Cowan's recent thought
>   experiment "UTF-8+Names"
> 
>  * It does not require interaction with other standards groups,
>   notably XML Schemas EG or IANA or IETF.
> 
>  * By providing it at user option, it can succeed or fail; if it is
>   popular and successful, that is good; if it is unpopular or
>   unsafe, it can be withdrawn.
> 
>  * By limiting itself to the HTML and the MathML/ISO entities, it
>   avoids issues of user-defined entities, and the need to enumerate
>   the entities.
> 
>  * It does not define mappings for those characters, but defers to
>   HTML and MathML/ISO, who may provide standard mappings.
> 
> This gives a very wide constituency.
> 
> I note that Xerces' SAX 2 implementation provides features by which
> a parser can continue processing after an error. This proposal could
> be seen as a very limited nod of recognition of that kind of practice.
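
To make the point about SAX 2 parameterization concrete, here is a small sketch using Python's `xml.sax`, which follows the SAX 2 design of features, handlers and properties. Python and the names `CollectingErrorHandler`, `TextCollector` and `parse_text` are illustrative choices, not something taken from the message, and the Xerces continue-after-error feature mentioned above is not shown. The sketch sets one feature, installs a content handler and an error handler, and keeps recoverable (non-fatal) reports separate from fatal well-formedness errors, which is the distinction the proposed wording relies on.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, ErrorHandler, feature_namespaces


class CollectingErrorHandler(ErrorHandler):
    """Record warnings and recoverable errors; stay draconian on fatal ones."""

    def __init__(self):
        self.recoverable = []

    def warning(self, exception):
        self.recoverable.append(exception)

    def error(self, exception):
        # Recoverable (non-fatal) error: note it and let parsing continue.
        self.recoverable.append(exception)

    def fatalError(self, exception):
        # Well-formedness violations remain fatal in this sketch.
        raise exception


class TextCollector(ContentHandler):
    """Accumulate character data from the document."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def characters(self, content):
        self.chunks.append(content)


def parse_text(data):
    """Parse bytes and return (text, recoverable_reports)."""
    parser = xml.sax.make_parser()
    parser.setFeature(feature_namespaces, True)  # one of the SAX 2 features
    text = TextCollector()
    errors = CollectingErrorHandler()
    parser.setContentHandler(text)
    parser.setErrorHandler(errors)
    parser.parse(io.BytesIO(data))
    return "".join(text.chunks), errors.recoverable
```

With the standard expat-based reader, a document that uses an undeclared reference such as `&eacute;` and has no DTD is reported through `fatalError`, so `parse_text` raises `xml.sax.SAXParseException` for it. Under the proposal, a processor with the option enabled would instead route that case through the non-fatal path.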
> 
> 
> Cheers
> Rick Jelliffe
> 
> 
> -----------------------------------------------------------------
> The xml-dev list is sponsored by XML.org 
> <http://www.xml.org>, an initiative of OASIS 
<http://www.oasis-open.org>

The list archives are at http://lists.xml.org/archives/xml-dev/

To subscribe or unsubscribe from this list use the subscription
manager: <http://lists.xml.org/ob/adm.pl>


