Re: First draft of proposed XML TC for Unicode 3.0 (unofficial)
<escape_clause>
If an overriding design goal of XML 1.0 is to ensure that all existing well-formed documents will always be well-formed, forever and ever, then the rest of this message is moot, and should be promptly sent to the trash-bin. If, OTOH, it might be acceptable to break a minuscule number of documents in return for a more dynamic and extensible handling of characters in XML, please consider this message.
</escape_clause>

It is a given that changes from Unicode 2.0 to 3.0 will require changes to XML 1.0, and thus all existing XML-compliant parsers will cease to be compliant when the changes are made. These Unicode changes aren't "corrections of printers errors" -- they are real changes in the XML spec, and will require changes to XML parsers and apps, as well. I guess my previous message was sufficiently obtuse, since my real intention was to raise the issue of how these sorts of changes are to be managed.

I previously wrote:
>>
>> This change of classification may well break some existing XML parsers
>> and/or apps, no matter whether or not these characters remain legal in XML
>> names.
>
>John Cowan replied:
>
> Only if those parsers are not compliant. Appendix B explicitly lays out in the
> BNF what is and what is not legal in XML names, which is precisely
> why it needs revision now.

I should have said that "..the new BaseChars changes _will_ break _all_ existing XML parsers and/or apps..". Once this change is made to XML, existing parsers won't be compliant, since they've implemented BNF rule 85 from REC-xml-19980210 and thus won't recognize these new BaseChars (e.g. #x01F6) as legal name characters.

XML 1.0 is frozen in the time of Unicode 2.0, since XML used a copy of, rather than a reference to, the Unicode character encodings. What I was suggesting (rather poorly, it seems) is that if XML were to simply refer to Unicode, there would never be the need to squeeze changes to XML into "corrigenda".
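To make the "frozen copy" point concrete, here is a minimal sketch of how a parser might hard-code rule 85. The ranges are only a small excerpt of the BaseChar production as I recall it from REC-xml-19980210 (check them against the Recommendation before relying on them), and the names `FROZEN_BASECHAR_RANGES` and `is_frozen_basechar` are hypothetical, not taken from any real parser.

```python
# Hypothetical excerpt of a hard-coded BaseChar table (production [85]).
# Only a handful of ranges are shown; the real table is far longer.
FROZEN_BASECHAR_RANGES = [
    (0x0041, 0x005A),  # A-Z
    (0x0061, 0x007A),  # a-z
    (0x01CD, 0x01F0),
    (0x01F4, 0x01F5),
    (0x01FA, 0x0217),
]


def is_frozen_basechar(ch: str) -> bool:
    """Return True if ch falls inside one of the copied ranges."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in FROZEN_BASECHAR_RANGES)


if __name__ == "__main__":
    # U+01F5 is inside the copied table; U+01F6 falls in the gap before U+01FA.
    print(is_frozen_basechar("\u01F5"))
    print(is_frozen_basechar("\u01F6"))
```

A parser built this way keeps rejecting #x01F6 until someone edits the table by hand, whatever the Unicode standard itself now says.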

I do understand that using BNF to describe XML required this sort of copying, since rule 85 is the foundation of many other rules. But I seriously doubt that most parsers actually use the BNF directly to build their internal tables -- the BNF is merely the specification of data that are translated into internal bitmaps or whatever.

I previously wrote:
>>
>> IMHO, "backward compatibility" does not justify a special rule for the
>> treatment of these characters! If symbols, in general, are not legal name
>> characters, then these symbols should not receive special treatment, just
>> because they were erroneously classified in an earlier Unicode.
>
>John Cowan replied:
>
> Adding an extra rule isn't that hard, IMHO.

Very true, but isn't this the top of a slippery slope, whereby every change to Unicode might require yet another special rule to maintain backward compatibility?

It would be possible to define XML characters as being based directly upon the current Unicode data tables, i.e., replace the whole BNF rule 85 table with a rule that directly referenced Unicode: "BaseChar ::= [..what Unicode says..]". I realise that this example isn't real BNF, but it is just as valid a method of specifying characters. We could perhaps refer to Unicode's BNF rules for the purpose of the XML grammar, but use Unicode tables for actual XML implementations. This way, XML would simply need to use "..a set of rules whereby you can extract the XML lists from those in the [Unicode] standard automatically."

It's true that some characters might be reclassified and thus cause some documents that were well-formed to lose that status (again, I believe this to be a minuscule subset of all XML documents). But the advantage would be a simple and open identification of documents as "XML 1.0 + Unicode x.x compliant".
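
As a rough illustration of "BaseChar ::= [..what Unicode says..]", the sketch below derives a letter test from whatever Unicode tables the runtime ships, instead of from a copied range list. It is deliberately simplified: it applies only two of the Appendix B rules as I understand them (general category in Ll, Lu, Lo, Lt or Nl, and no compatibility decomposition) and omits the per-character exceptions. The function names are hypothetical, and Python's `unicodedata` reflects the Unicode version bundled with the interpreter, not 2.0 or 3.0.

```python
import unicodedata

# General categories that Appendix B (as I read it) allows for name-start
# characters. This is an assumption; verify against the Recommendation.
NAME_START_CATEGORIES = {"Ll", "Lu", "Lo", "Lt", "Nl"}


def is_letter_like(ch: str) -> bool:
    """Simplified name-start test driven by the runtime's Unicode tables."""
    if unicodedata.category(ch) not in NAME_START_CATEGORIES:
        return False
    # Compatibility decompositions start with a "<tag>" in the database.
    return not unicodedata.decomposition(ch).startswith("<")


def compliance_label() -> str:
    """Name the Unicode version the classification was derived from."""
    return f"XML 1.0 + Unicode {unicodedata.unidata_version}"


if __name__ == "__main__":
    print(compliance_label())
    print(is_letter_like("\u01F6"))  # a letter in any table from 3.0 onward
    print(is_letter_like("\u2122"))  # TRADE MARK SIGN is a symbol, not a letter
```

Upgrading the Unicode tables changes the answers without touching the parser's own rules, which is both the attraction and the compatibility risk discussed above.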

If I'm using XML in a real-world environment, it doesn't matter if XML 1.0 has been changed to allow some new character if I haven't upgraded my Unicode support, and vice versa. This would tighten the bond between XML and Unicode, since the latter organization couldn't make their changes oblivious to their impact upon XML (no insult intended to Unicode, Inc.). Since XML is based upon Unicode, XML developers are also, by definition, Unicode developers -- these two communities are already interdependent.

As mentioned in the annotated version of XML 1.0, there exists a contradiction between the abstract and section 1.1 of XML 1.0 regarding the completeness of the XML spec. A specification's text typically takes precedence over its abstract. Given this, we could presume that XML is intended to be based upon Unicode and ISO 10646, and we could/should defer to those standards for classifications of characters, assignments of values, etc.

I'm just speculating about a future implementation of these interlocking standards that would be extensible by relying upon commonly shared data tables, rather than specified grammars -- a little more OO, a little less BNF/ML.

Regards,
Nik O, Teton Data Systems, Jackson, Wyo.

======= Begin excerpts (from XML 1.0 Rec) =======
Abstract
The Extensible Markup Language (XML) is a subset of SGML that is completely described in this document.
:
1.1 Origin and Goals
:
This specification, together with associated standards (Unicode and ISO/IEC 10646 for characters, Internet RFC 1766 for language identification tags, ISO 639 for language name codes, and ISO 3166 for country name codes), provides all the information necessary to understand XML Version 1.0 and construct computer programs to process it.
======= End excerpt =======

======= Begin excerpts (from Tim Bray's Annotated XML 1.0) =======
XML Rules For Character Classification
Although the Working Group emphatically did argue over the inclusion and exclusion of individual characters, we (well, mostly James Clark) were able to work out a set of rules whereby you can extract the XML lists from those in the standard automatically.
======= End excerpt =======

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@i...
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ and on CD-ROM/ISBN 981-02-3594-1
To (un)subscribe, mailto:majordomo@i... the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo@i... the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa@i...)