RE: Historical I18n Note
Yes Tim, you have explained your position on this before, based on your background as a consultant and programmer.

http://lists.w3.org/Archives/Public/w3c-sgml-wg/1996Nov/0141.html

"I have always explained all the benefits of SGML (ISO, vendor-independent, platform-independent, content not presentation, you know the drill). When I do that, I almost always get the Sounds Good Maybe Later response: "SGML is this great big complicated technology and we're going to have to hire consultants and buy huge expensive pieces of software and it won't work with the Web." I sometimes feel that the SGML community is unaware how prevalent this mind-set is. I've always argued against this, but have felt to some degree like I'm swimming up-hill.

Lately, I have also been explaining that there is an SGML starter-kit called XML, which is small, lightweight (I wave a printout of the draft spec at them), easy to understand, and designed to work on the web. But you still get data safety and constrained-authoring because it's SGML."

So, SGML still is where we start when looking for extensibility solutions that keep us in the standards world and protected from privatization. It isn't always the case that programmer sensibility dominates requirements. When management issues are worked out, that is, what is a business decision and what is done for the good of "principles", one evaluates all solutions, reckons costs, then picks a value.

Extensibility by character set description is the issue. SGML provides for this in the SGML Declaration with character set descriptions. Systems with frozen SGML Declarations (e.g., XML) make this unavailable. At this layer of SGML, XML is non-extensible, as it hides the Declaration or says, in effect, that none exists. The extensible solution is nasty, but so is the cost of the do-overs based on limited perspectives about evolutionary requirements. It does exist insofar as XML is an SGML subset, so let's look at it.

Just for reasons of historical accuracy, let's see what a text available at the time said, and perhaps answer Mike's query about what SGML might provide to XML. From Martin Bryan, SGML: An Author's Guide to the Standard Generalized Markup Language. Because you bring it up, let's start with SDATA:

"It is sometimes necessary to use system-specific information in the replacement text for entities. To allow receiving programs to identify expansions that they may not handle in exactly the same way on each system, the reserved word SDATA can be used to identify entity declarations containing system-specific replacement text. For example the entity declaration used to identify the AE ligature might be:

    <!ENTITY AElig SDATA "[AElig]" --=capital AE diphthong (ligature) -->

In this case, the program will expand &AElig; to give [AElig] which the text formatter will recognize as the coding that generates the character AE. When this declaration is sent to another system, however, the SDATA reserved word can be recognized and the receiving program can ask its operator to provide the coding needed to generate the relevant replacement character(s).

It should be noted, however, that while characters defined as valid in the document's character set but as invalid in the document's concrete syntax can be included in SDATA entities, non-SGML characters that have been declared as unused in the document's character set cannot be specified in an SDATA declaration...

(4) specific character data (SDATA) entities that contain characters whose role is specific to the local system.

Where the retrieved entity contains data that is not coded in SGML, (ie, consists of non-SGML characters, non-parsable character data or system-specific information), the entity must be declared as a data entity.
This is indicated by placing the appropriate reserved word (CDATA, NDATA, or SDATA) immediately after the system identifier (or the word SYSTEM if no identifier is present) followed by a compulsory notation name identifying the type of coding used within the data entity. ...

When the system has finished sending the decoded data to the document, it will transmit a special, system dependent, signal known as an entity end signal to the SGML parser. This signal is output by the system at the end of each entity to tell the parser that it can continue processing the rest of the text.

Note: The entity end signal is not a control code and need not be one of the codes declared within the document's character set. It can be any signal or group of signals recognized by the SGML program as an indication that the end of an entity's replacement text has been received.

Where an external entity contains character data or other system-specific information, its declaration must be qualified by a suitable notation name:

    <!ENTITY special SYSTEM "b:logotype.174" SDATA "logo" >
    <!NOTATION logo SYSTEM "logo generation subsystem" >"

While SDATA is interesting in its own right, the more applicable part of the SGML Declaration is the document character set clause that enables a document to contain characters that are not defined in the document's concrete syntax. This uses the reserved name CHARSET followed by one or more character set descriptions. Again from Martin Bryan:

"Each character set description consists of a base character set statement followed by a described character set portion identifying the roles of individual characters. More than one reference (base) character set can be used to build up a character set description...

When using the document character set clause to create a translation table for an incoming document it is important to remember that character references to reassigned codes will also need to be changed during translation.
For example, if a document prepared ... is to be transferred to an EBCDIC-based system, an ISO 646 character reference such as &#34; in an entity declaration will need to be changed to &#125;, the EBCDIC code for a quotation mark."

Ok, now, which parts of that are hard and expensive? Feel free to fill in details I missed.

Len
http://www.mp3.com/LenBullard

Ekam sat.h, Vipraah bahudhaa vadanti.
Daamyata. Datta. Dayadhvam.h

-----Original Message-----
From: Tim Bray [mailto:tbray@t...]
Sent: Monday, July 16, 2001 12:45 PM
To: xml-dev@l...
Subject: Historical I18n Note

At 08:16 AM 16/07/01 -0500, Bullard, Claude L (Len) wrote:
>This is easy. SGML preserves options using the SGML
>Declaration. The options have costs and require skill
>to handle. XML is simpler but it removes the options.
>In SGML, Blueberry is a non-issue

Sorry, Len has now said this about 8 times and just for reasons of historical accuracy, I have to make the point that i18n in the SGML context was not quite the bundle of sweetness-and-light that's presented here.

Anyone who's ever tried to (a) understand what an SDATA entity is, and/or (b) take a file full of them produced by Vendor A and try to figure out how to get them rendered on screen or paper by software from another vendor will know what I'm talking about.

SGML handles these issues *in principle* fully & completely by abstracting away the notion of a character. SGML handled a lot of issues in principle. XML's decision to say "a character is an atomic unit of text as defined by Unicode, and you have to support at least these 2 bit encodings" has less abstract beauty but it's there for a reason, and it buys a huge amount of real-world interoperability that no previous markup-language system, including SGML, ever came close to.

-Tim
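
The character-reference translation that the Bryan passage in Len's reply describes (a numeric reference written against ISO 646 must be renumbered when the document moves to an EBCDIC-based system) can be sketched roughly as follows. This is only an illustration under stated assumptions, not how any SGML system implemented it: the function name `remap_char_refs` is hypothetical, only decimal references to single-byte codes are handled, and the target is assumed to be EBCDIC code page 037 as provided by Python's `cp037` codec. The book names only "EBCDIC", and the resulting number depends on which code page is assumed.

```python
import re

# Assumption: the receiving system uses EBCDIC code page 037 (Python's "cp037"
# codec). The quoted text says only "EBCDIC", so this choice is illustrative.
TARGET_CODEC = "cp037"

# Decimal numeric character references such as &#34;
_CHAR_REF = re.compile(r"&#(\d+);")


def remap_char_refs(text, source_codec="ascii", target_codec=TARGET_CODEC):
    """Renumber decimal character references from the source character set
    to the target one. Hypothetical helper; single-byte codes only."""

    def _remap(match):
        code = int(match.group(1))
        # Find which character the number denotes in the source set...
        char = bytes([code]).decode(source_codec)
        # ...then look up that character's code in the target set.
        new_code = char.encode(target_codec)[0]
        return f"&#{new_code};"

    return _CHAR_REF.sub(_remap, text)


if __name__ == "__main__":
    # Under cp037 the quotation mark is code 127, so this prints:
    # <!ENTITY quot "&#127;">
    print(remap_char_refs('<!ENTITY quot "&#34;">'))
```

Only the numbers inside the references change here; transcoding the literal characters of the document is a separate step. That every reference has to be rewritten against a table chosen per system is the kind of cost Tim's message points at, and it is what XML avoids by fixing character references to Unicode code points.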