
Re: Well-formed Blueberry

  • From: jcowan@r...
  • To: Joel Rees <rees@s...>
  • Date: Mon, 16 Jul 2001 08:03:10 -0400 (EDT)

On Mon, 16 Jul 2001, Joel Rees wrote:

> However, I am having a hard time figuring out why the standard should treat
> authors of nonstandard XML documents better than people who simply want to
> use their own language in markup.

I'm confused.  What nonstandard documents?

> In a corollary point of confusion for me, you seem to assume in your
> posts that, even without your wall, a blueberry capable parser must have
> both the pre-blueberry character classification tables and the blueberry
> character classification tables. In my naive point of view, a document
> that is valid XML 1.0 ought to be valid blueberry, thus, the complete
> table should be the only necessary table, unless you want to build a
> wall. 

No, I think Elliotte is right here.  There are XML 1.0 documents, which
lack the Magic Blueberry Mark (whatever it's going to be), and then there
are Blueberry documents.  XML-1.0-only parsers MUST reject Blueberry
documents: they are not well-formed.  Blueberry parsers SHOULD accept
both Blueberry and 1.0 documents, but MUST apply the 1.0 well-formedness
rules to 1.0 documents.  If a document lacks the Magic Blueberry Mark but
contains Blueberry names, it is not well-formed and must be rejected.
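The acceptance rules above can be sketched as a small decision function. This is only an illustration of the reasoning in this message, not text from any draft: the function name and its three boolean parameters are hypothetical, and it considers nothing but the name rules.

```python
def is_well_formed(parser_is_blueberry: bool,
                   has_blueberry_mark: bool,
                   uses_blueberry_names: bool) -> bool:
    """Decide whether a parser should treat a document as well-formed.

    Hypothetical parameters:
      parser_is_blueberry   -- the parser implements the Blueberry rules
      has_blueberry_mark    -- the document carries the Magic Blueberry Mark
      uses_blueberry_names  -- the document contains names that are legal
                               only under the Blueberry tables
    """
    if has_blueberry_mark:
        # A Blueberry document: a 1.0-only parser must reject it,
        # a Blueberry parser accepts it.
        return parser_is_blueberry
    # No mark, so this is a 1.0 document and the 1.0 rules apply,
    # even when the parser itself understands Blueberry.
    return not uses_blueberry_names


# A Blueberry parser reading an unmarked document with Blueberry names
# must still reject it.
print(is_well_formed(True, False, True))
```

The last case is the one that forces a Blueberry parser to keep the 1.0 tables around.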

Therefore, Blueberry parsers have to keep both sets of tables.  Luckily,
the Blueberry table is a strict superset of the 1.0 table, so it suffices
to have four tables (or one table that maps Unicode values to one of
four enumerated values):  xml10_name_start, xml10_name_part,
blueberry_only_name_start, blueberry_only_name_part.
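A rough sketch of the single-table variant follows. The enum member names mirror the four table names above; everything else is an assumption made for illustration. In particular, the ranges are tiny placeholders (a few ASCII entries for 1.0, and two arbitrary blocks standing in for Blueberry-only characters), not the real classification tables, and "name_part" is read here as "allowed in a name but not as its first character".

```python
from enum import Enum


class NameClass(Enum):
    XML10_NAME_START = 1
    XML10_NAME_PART = 2
    BLUEBERRY_ONLY_NAME_START = 3
    BLUEBERRY_ONLY_NAME_PART = 4


# (low, high, class) ranges. Placeholder data only: a real parser would
# carry the full tables from the relevant specifications.
RANGES = [
    (0x41, 0x5A, NameClass.XML10_NAME_START),    # A-Z
    (0x61, 0x7A, NameClass.XML10_NAME_START),    # a-z
    (0x5F, 0x5F, NameClass.XML10_NAME_START),    # _
    (0x3A, 0x3A, NameClass.XML10_NAME_START),    # :
    (0x30, 0x39, NameClass.XML10_NAME_PART),     # 0-9
    (0x2D, 0x2E, NameClass.XML10_NAME_PART),     # - and .
    (0x1200, 0x1206, NameClass.BLUEBERRY_ONLY_NAME_START),  # hypothetical
    (0x17E0, 0x17E9, NameClass.BLUEBERRY_ONLY_NAME_PART),   # hypothetical
]


def classify(ch):
    """Return the NameClass of a character, or None if it is not a name character."""
    cp = ord(ch)
    for low, high, cls in RANGES:
        if low <= cp <= high:
            return cls
    return None


def is_name(text, blueberry):
    """Check a candidate name under the 1.0 rules or the Blueberry rules."""
    if not text:
        return False
    start_ok = {NameClass.XML10_NAME_START}
    part_ok = {NameClass.XML10_NAME_START, NameClass.XML10_NAME_PART}
    if blueberry:
        # Blueberry is a superset: widen both sets rather than swap tables.
        start_ok.add(NameClass.BLUEBERRY_ONLY_NAME_START)
        part_ok.add(NameClass.BLUEBERRY_ONLY_NAME_START)
        part_ok.add(NameClass.BLUEBERRY_ONLY_NAME_PART)
    return classify(text[0]) in start_ok and all(
        classify(c) in part_ok for c in text[1:]
    )
```

With this layout, `is_name(name, blueberry=False)` applies the 1.0 rules and `is_name(name, blueberry=True)` applies the Blueberry rules from the same table, so a parser picks the mode per document according to whether the mark is present.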

Elliotte Rusty Harold writes:

> There are not that many encodings that can handle the Blueberry
> characters, basically just several variants of Unicode, one Japanese
> character set, and possibly a couple of Chinese character sets.

IMHO the snag here would be getting an absolutely authoritative and
permanent list of such character sets, since they would have to be
hard-coded (contrary to previous practice) into the Blueberry
Recommendation.

Limiting Blueberry to just Unicode would probably work for most of the
new (to Unicode) scripts, as you say, but would not be so good for
Han characters.

