Re: About sml and internationalization
Sean McGrath <digitome@i...> writes:

> I am thinking about the issue of allowing/disallowing
> sets of Unicode characters in element type names as per XML
> 1.0.
>
> If SML has very few special tokens,
> e.g. "<", "&" and whitespace, what would happen
> if any character outside this teeny weeny set is
> allowed in an element type name?

I would say this is the way to go. I have seen it done before, both with eight-bit charsets like Latin-1 and with Unicode. It gives people the ability to shoot themselves in the foot by using strange characters (my favourite is using a non-breaking space in variable names in Emacs Lisp). But I still think it is the way to go: the parser and language can define a small set of characters as special, and just pass on whatever lies between those special characters to the application.

If you think about it this way, most of the charset considerations can be removed from the parser. Treat the input as a sequence of non-negative integers (which may be 7, 8 or 36 bits wide, depending on the application; if you think in C++, the parser could be a template parameterized on the character type). If an application needs to handle several charsets, it can use something like a

    content-type: text/sml; charset = iso-8859-2

header to convert the input into Unicode before feeding it into the parser.

One could define the special characters more abstractly, and leave it to the application to tell the parser how a "<" is represented today, but I think that's overabstracting things. Using plain ASCII values (possibly embedded in an ASCII superset like Unicode or Latin-2) should be good enough.

This line of thinking also means that "whitespace", as far as the parser is concerned, should be limited to a few ASCII characters. SPC and NL ought to be enough; to keep with tradition, perhaps TAB and CR as well. Having the parser recognize all Unicode whitespace characters as whitespace adds some complexity.
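A minimal Python sketch of the scheme described above, written for illustration rather than taken from any actual SML implementation. The function names `to_code_points` and `scan`, the token kinds, and the ASCII fallback when no charset is given are all hypothetical. Python's duck typing stands in for the C++ template: the scanner accepts any iterable of non-negative integers. Only "<", "&", SPC, NL, TAB and CR are interpreted; every other value is passed through untouched, and charset conversion happens before scanning, driven by a `charset` parameter like the one in the header above.

```python
from collections.abc import Iterable

# Hypothetical set of SML special characters, given as ASCII code points.
LT = ord("<")
AMP = ord("&")
# Whitespace restricted to SPC and NL, plus TAB and CR "to keep with tradition".
WHITESPACE = frozenset({0x20, 0x0A, 0x09, 0x0D})


def to_code_points(data: bytes, content_type: str) -> list[int]:
    """Decode raw bytes to Unicode code points using a content-type charset.

    Falls back to ASCII when no charset parameter is present (an assumption).
    """
    charset = "ascii"
    for param in content_type.split(";")[1:]:
        key, _, value = param.partition("=")
        if key.strip().lower() == "charset":
            charset = value.strip()
    return [ord(ch) for ch in data.decode(charset)]


def scan(chars: Iterable[int]) -> list[tuple[str, list[int]]]:
    """Split a sequence of non-negative integers into (kind, values) tokens.

    kind is "special" for "<" or "&" (one token each), "space" for a run of
    SPC/NL/TAB/CR, and "text" for a run of anything else. The scanner never
    inspects "text" values, so any integer (e.g. a no-break space) can end
    up in an element type name.
    """
    tokens: list[tuple[str, list[int]]] = []
    for c in chars:
        if c < 0:
            raise ValueError("character values must be non-negative")
        if c == LT or c == AMP:
            kind = "special"
        elif c in WHITESPACE:
            kind = "space"
        else:
            kind = "text"
        if kind != "special" and tokens and tokens[-1][0] == kind:
            tokens[-1][1].append(c)  # extend the current run
        else:
            tokens.append((kind, [c]))  # start a new token
    return tokens
```

For example, `scan(to_code_points("<größe a".encode("iso-8859-2"), "text/sml; charset = iso-8859-2"))` yields a "text" token holding the code points of "größe" right after the "<" token. Because `scan` only iterates, it accepts `bytes` (8-bit values) just as well as a list of code points, which is the closest Python analogue to parameterizing the parser on the character type.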
(There are 5 spacing control characters in traditional ASCII, plus the ordinary space and the non-breaking space (in Latin-x and Unicode), and an additional 18 in the rest of Unicode, i.e. 25 in all.)

/Niels

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@i...
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ and on CD-ROM/ISBN 981-02-3594-1
To unsubscribe, mailto:majordomo@i... the following message;
unsubscribe xml-dev
To subscribe to the digests, mailto:majordomo@i... the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa@i...)