Re: Escape mechanism using release character
From: Richard Tobin <richard@c...>

>> Why is it that the well known escape mechanism of using a
>> release character (like '\') for escaping special characters
>> (eg. '<','&') not used in XML?
>
>Because XML is a subset of SGML which does not use such a mechanism.
>
>If XML had been a new system designed from scratch, it might well have
>been much simpler in many respects. On the other hand, it would
>probably not have succeeded.

Actually, SGML does have such a mechanism: the Markup Suppress Character. This could have been defined as "\" for XML. I think I remember Charles Goldfarb even raised this issue for XML during its development. The reasons against it include these:

1) it creates three kinds of delimiting: by CDATA sections, by entity references, and by markup suppression. XML tried to remove duplication unless there was a good reason;

2) programmers have a lot of difficulty coping with delimiters (witness the appalling support for correct delimiters in first generation XML applications);

3) HTML and almost all SGML documents do not use this mechanism, so you would be building in incompatibility;

4) it creates another character with a special meaning that must be delimited: as well as & and <, parsers must look for \ and people must delimit it in text;

5) the character "\" is problematic for Japanese in that the ASCII code point for that character is used for the Yen character in ShiftJIS: if we used that character, then it would rule out the class of dumb applications that just understand the ASCII code points for delimiter recognition and pass every other byte through;

6) the character "\" is problematic in Taiwanese encodings, in that its code value also occurs as one of the bytes of some Big5 characters: if we used that character, it would rule out the class of dumb applications that just understand the ASCII code values of delimiters and pass everything else through (there is already a potential for this problem with [ and ] as used in CDATA sections, but "\" would be far worse);

7) \ is often used in programming languages as an escape. As you might know from shell languages, double delimiting is really tricky, and if you need to triple delimit (e.g. use "\\\\" to represent "\\" to represent \ in output) it gets complicated. So it is common practise for markup languages to use different escape delimiters from those of the embedded language; similarly it is common for XML processing languages to use different escape delimiters: e.g. OmniMark uses "%", not "\" or entities;

8) Also, I think there is a good reason in that \ might encourage the view that XML documents are delimited merely to fit into a pipeline of processes: Microsoft adopted this approach for handling XML documents with CSS stylesheets in IE5, which is why &lt;? gets treated like a processing instruction. But this is wrong behaviour; in XML, data is not tailored to a process: you declare what you want. So if I say &lt;? I do not want a processing instruction start in my output: XSL gets this very right in its approach.

Rick Jelliffe

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@i...
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ and on CD-ROM/ISBN 981-02-3594-1
To (un)subscribe, mailto:majordomo@i... the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo@i... the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa@i...)
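
The byte-level hazard raised for ShiftJIS and Big5 can be illustrated with a minimal Python sketch. It is not from the original message: the function name `naive_escape_positions` and the two sample characters are hypothetical choices for illustration, and the scanner only imitates the kind of "dumb application" described above, one that knows the ASCII value of a delimiter and treats every other byte as opaque.

```python
from typing import List

BACKSLASH = 0x5C  # ASCII code value of "\"


def naive_escape_positions(data: bytes) -> List[int]:
    """Return the offsets where a byte-level scanner would 'see' a backslash.

    Hypothetical helper: it has no knowledge of multi-byte encodings,
    so it cannot tell a real "\" from a byte inside a two-byte character.
    """
    return [i for i, b in enumerate(data) if b == BACKSLASH]


# Illustrative characters whose encoded form ends in the byte 0x5C.
big5_text = "功".encode("big5")        # U+529F encoded as Big5
sjis_text = "表".encode("shift_jis")   # U+8868 encoded as Shift_JIS

for label, raw in (("Big5", big5_text), ("Shift_JIS", sjis_text)):
    hits = naive_escape_positions(raw)
    print(label, raw.hex(), "apparent escape at byte offsets", hits)
```

Neither sample string contains a backslash character, yet the scanner reports one in each, because 0x5C is a legal second byte in both encodings. The existing delimiters < (0x3C) and & (0x26) fall below 0x40, outside the second-byte ranges of Big5 and Shift_JIS, which is why a scanner of this kind can get away with looking for them.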