Re: Blueberry/Unicode/XML

From: Rick Jelliffe <ricko@a...>
To: xml-dev@l...
Date: Tue, 10 Jul 2001 21:00:43 +0800

> However, I presume there was a good reason why the current name character
> scheme was implemented.  The reasons I can think of are easily dismissed or
> dealt with.  Are there any other more serious implications?

Yes.

0) To catch markup errors.  If someone is using a symbol, they must think
they are in content, marked section, attribute value, comment or a PI. This
way, the error will be detected closer to its source.

1) Catch transcoding problems. Japanese experts report this is good.

2) Promote readable (and read-aloud-able) markup. Symbols are not
appropriate for markup characters. (If you want to use symbols for markup
characters, use ISO SGML's short references or Simon St.Laurent's Regular
Fragmentations.)

3) Minimize the chances of delimiter clashes in languages (hence no ' in
names, even though it would be useful for French; ' here being the ASCII
apostrophe).

4) Prevent the use of symbols which also have similar glyphs that are
letters. For example, the maths x or the maths alpha are symbols, and they
have distinct codes from the letters.  If you used the maths x for whatever
reason, the document might be visually correct on inspection, but be coded
wrong: your name comparisons would not work.

5) There is every chance that people will only use their native scripts when
the document is used regionally or locally. Prudence dictates it. A Cyrillic
user can assume that other readers of the document have Cyrillic alphabet
fonts, but she cannot assume which symbols may be used.

6) There are several function characters and characters which are not
suitable for markup: for example the new language-tagging characters from
plane 14 and perhaps even the BIDI characters. (See Unicode TR 20.)  If we
keep them out, we prevent people trying to do strange things; people will
always insist that there is no intent behind a technology and that they can
get by just on the mechanism, so the mechanism needs to hard-code the intent.

7) Because it is not really very expensive to implement, though just allowing
any surrogate without nitpicking is fine.  But I agree with James Clark that
perhaps name checking (and normalization) should be some kind of different
layer to WF ultimately.   (I note that James has been thinking about this
issue almost as long as anyone:  he implemented native-language markup
capabilities in SP early on, probably the first person to release a
markup-based application that treated other scripts with equity.)
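
To make points 3, 4 and 6 above concrete, here is a minimal Python sketch of
a name check. It is only an approximation: XML 1.0 enumerates its name
characters explicitly, whereas this sketch assumes that Unicode general
categories are a close enough stand-in. The helper names looks_like_name and
describe are hypothetical and exist only for this illustration.

```python
import unicodedata

# Hypothetical approximation of "letters and digits only" in names.
# XML 1.0 enumerates its name characters explicitly; the general-category
# test below is an assumption made for this sketch.
START_CATEGORIES = {"Lu", "Ll", "Lt", "Lm", "Lo", "Nl"}
REST_CATEGORIES = START_CATEGORIES | {"Mn", "Mc", "Nd"}
START_PUNCT = {"_", ":"}
REST_PUNCT = START_PUNCT | {"-", "."}


def looks_like_name(s):
    """Return True if s would pass this sketch's name check."""
    if not s:
        return False
    if s[0] not in START_PUNCT and unicodedata.category(s[0]) not in START_CATEGORIES:
        return False
    return all(
        c in REST_PUNCT or unicodedata.category(c) in REST_CATEGORIES
        for c in s[1:]
    )


def describe(s):
    """List (code point, category, character name) for each character."""
    return [
        ("U+%04X" % ord(c), unicodedata.category(c), unicodedata.name(c, "?"))
        for c in s
    ]


if __name__ == "__main__":
    samples = [
        "para",          # plain ASCII letters
        "абзац",         # Cyrillic letters (point 5)
        "l'article",     # ASCII apostrophe, a delimiter (point 3)
        "a\u00d7b",      # MULTIPLICATION SIGN, looks like x (point 4)
        "a\u200eb",      # LEFT-TO-RIGHT MARK, a BIDI control (point 6)
        "a\U000e0001b",  # LANGUAGE TAG character (point 6)
    ]
    for s in samples:
        print(repr(s), looks_like_name(s), describe(s))

    # Look-alike characters compare unequal because their codes differ.
    print("x" == "\U0001d465")  # MATHEMATICAL ITALIC SMALL X
```

Note that a category test alone is not the whole story for point 4: Unicode
classes U+1D465 as a lowercase letter, so this sketch would accept it in a
name even though it never compares equal to the ASCII x. Catching that kind
of look-alike would need an explicit character table or a normalization step
on top.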

Cheers
Rick Jelliffe

Follow-Ups:
- Re: Blueberry/Unicode/XML
  - From: John Cowan <cowan@m...>
- Layering (was RE: Blueberry/Unicode/XML)
  - From: Leigh Dodds <ldodds@i...>

References:
- Presumption of XML's Stability (was RE: XML Blueberry (non-ASCII namecharacters in Japan))
  - From: Mike.Champion@S...
- Blueberry/Unicode/XML
  - From: Tim Bray <tbray@t...>
- Re: Blueberry/Unicode/XML
  - From: James Clark <jjc@j...>
- Re: Blueberry/Unicode/XML
  - From: Rob Lugt <roblugt@e...>
