Re: US-ASCII characters versus XML characters ... why such a
At 2012-10-01 13:59 +0000, Costello, Roger L. wrote:
>Below is a table that shows the US-ASCII characters (decimal value)
>in the left column and the right column indicates whether the
>character is allowed in XML documents.
>
>Questions:
>
>1. Why does XML not support many of the US-ASCII characters?

Because it is rooted in SGML, which is a text-based data description language, and the characters allowed by XML 1.0 are the characters typically found in raw text files (that is, text files that do not have control characters for directives such as printer control). Who needs more than CR, LF and tab when typing raw text?

>Of the 127 US-ASCII characters, 28 characters are not allowed in XML
>documents; that is, 22% of the US-ASCII characters are not supported by XML.

Ref:
  http://www.w3.org/TR/2008/REC-xml-20081126/#NT-Char
  http://www.w3.org/TR/2006/REC-xml11-20060816/#NT-Char

>2. I am creating an XML Schema for an RFC that allows all 127
>US-ASCII characters.

XML Schema does not constrain character sets. To express constraints on the characters used in an XML document, use:

  ISO/IEC 19757-7:2009. Information technology -- Document Schema Definition Languages (DSDL) -- Part 7: Character Repertoire Description Language

But CREPDL does not *add* anything to XML; it only constrains the characters allowed by XML.

>What should I do for the 28 US-ASCII characters that are supported
>by the RFC but not supported by XML?

Use XML 1.1 and you get all characters except NUL.

One of my clients needed control characters in a text result in order to control a legacy agate printing system for sports scores in newspapers (the tiny print of baseball box scores, for example). Thankfully, NUL is not one of the control characters needed.
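As a rough check of the counts discussed above, the following Python sketch encodes the Char productions from the two Recommendations referenced earlier and applies them to US-ASCII 1 through 127. The function names are illustrative only and are not part of any library. Note that XML 1.1 also defines a RestrictedChar production: the newly admitted control characters may appear only as character references, not as literal characters.

```python
def is_xml10_char(cp: int) -> bool:
    """Char production of XML 1.0 (Fifth Edition)."""
    return (cp in (0x9, 0xA, 0xD)
            or 0x20 <= cp <= 0xD7FF
            or 0xE000 <= cp <= 0xFFFD
            or 0x10000 <= cp <= 0x10FFFF)


def is_xml11_char(cp: int) -> bool:
    """Char production of XML 1.1 (Second Edition): everything but NUL,
    the surrogate block, and U+FFFE/U+FFFF."""
    return (0x1 <= cp <= 0xD7FF
            or 0xE000 <= cp <= 0xFFFD
            or 0x10000 <= cp <= 0x10FFFF)


def is_xml11_restricted(cp: int) -> bool:
    """RestrictedChar production of XML 1.1: legal, but only when
    written as a character reference such as &#x1;."""
    return (0x1 <= cp <= 0x8
            or 0xB <= cp <= 0xC
            or 0xE <= cp <= 0x1F
            or 0x7F <= cp <= 0x84
            or 0x86 <= cp <= 0x9F)


# The question counts US-ASCII as decimal 1..127 (NUL left out).
ascii_codes = range(1, 128)

excluded_in_10 = [cp for cp in ascii_codes if not is_xml10_char(cp)]
excluded_in_11 = [cp for cp in ascii_codes if not is_xml11_char(cp)]
restricted_in_11 = [cp for cp in ascii_codes if is_xml11_restricted(cp)]

print(len(excluded_in_10), "of 127 not allowed in XML 1.0")
print(len(excluded_in_11), "of 127 not allowed in XML 1.1")
print(len(restricted_in_11), "of 127 must be character references in XML 1.1")
```
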
I wrote XSLT that produced all of the needed control characters and the results are made public here:

  http://sportsmlt.svn.sourceforge.net/viewvc/sportsmlt/2.0/support/sportsmlt2-character.xsl?&view=markup

The top-most stylesheet fragment is here:

  http://sportsmlt.svn.sourceforge.net/viewvc/sportsmlt/2.0/sportsmlt2.xsl?view=markup

The HTML documentation for the stylesheet library is here, produced by my XSLStyle documentation methodology:

  http://sportsmlt.svn.sourceforge.net/viewvc/sportsmlt/2.0/sportsmlt2.html

Using the above stylesheets, if I want hex 0x01 to be represented in my XML or my XSLT I use the character reference &#xFFD1;. As a sequence of 8 characters, the reference is not itself an invalid Unicode character; rather, it is only represented as such in memory. The raw invalid Unicode character never shows up in any file, which would otherwise make the Unicode file invalid.

I hope this helps.

. . . . . . . . . . Ken

--
Contact us for world-wide XML consulting and instructor-led training
Free 5-hour lecture: http://www.CraneSoftwrights.com/links/udemy.htm
Crane Softwrights Ltd. http://www.CraneSoftwrights.com/x/
G. Ken Holman mailto:gkholman@CraneSoftwrights.com
Google+ profile: https://plus.google.com/116832879756988317389/about
Legal business disclaimers: http://www.CraneSoftwrights.com/legal
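The placeholder idea in the message (a legal code point such as U+FFD1 standing in for hex 0x01 until output time) can be sketched in Python as follows. This is not the logic of the stylesheets linked above: the fixed offset `PLACEHOLDER_BASE` and both function names are assumptions made for illustration, chosen only so that 0x01 lands on U+FFD1 as in the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping: placeholder = base + control code, so 0x01 -> U+FFD1.
# The real stylesheets may map characters differently.
PLACEHOLDER_BASE = 0xFFD0

# C0 controls that XML 1.0 does not allow (TAB, LF and CR are fine as-is).
DISALLOWED = set(range(0x01, 0x09)) | {0x0B, 0x0C} | set(range(0x0E, 0x20))

TO_PLACEHOLDER = {cp: PLACEHOLDER_BASE + cp for cp in DISALLOWED}
FROM_PLACEHOLDER = {ph: cp for cp, ph in TO_PLACEHOLDER.items()}


def to_placeholders(text: str) -> str:
    """Swap disallowed control characters for XML-legal placeholders."""
    return text.translate(TO_PLACEHOLDER)


def from_placeholders(text: str) -> str:
    """Restore the control characters just before writing the final,
    non-XML output (for example, a printer control stream)."""
    return text.translate(FROM_PLACEHOLDER)


# A direct reference to 0x01 is rejected by an XML 1.0 parser.
try:
    ET.fromstring("<score>A&#x1;B</score>")
    direct_reference_ok = True
except ET.ParseError:
    direct_reference_ok = False

# The placeholder reference parses, and the control character is
# recovered only in memory, never in the XML file itself.
parsed = ET.fromstring("<score>A&#xFFD1;B</score>").text
restored = from_placeholders(parsed)
```

One caveat with this particular offset: part of the resulting range overlaps assigned halfwidth Hangul letters, so a real mapping would need placeholders that cannot occur in the genuine text.
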