
RE: CDATA by any other name... (was The raw and the cooked)

  • From: "Rick Jelliffe" <ricko@a...>
  • To: "XML Dev" <xml-dev@i...>
  • Date: Wed, 4 Nov 1998 06:10:32 +1100


> From: John Cowan

> Rick Jelliffe wrote:
>
> > (An optimistic view of ISO10646: there are dozens of new Han ideographs
> > created every day, apart from other scripts.)
>
> True but irrelevant, since no specifiable character set can hold these.

Not so. The additions are usually composed of standard radicals and
combinations. There are various projects around (such as C.C.Hsieh in
Taiwan) to figure out encodings to "spell" Han ideographs by component
radicals. This would allow any number of characters and even variant forms.
But this is not in ISO 10646 yet.

I guess the point is that John thinks that if an XML system can produce
characters which a recipient system cannot process, because it does not use
ISO 10646, that is not something that CDATA sections should be used to
address. I think his reason is that he cannot find it in the spec. Dave M
thinks that xml:lang is appropriate. My point about CDATA elements was that
there is no standard mechanism to lock CDATA marked sections. I think a lot
of people now think that any non-ISO10646 system is for losers anyway
(except for whatever character set they use, probably).

> .. the repertoire of a language is
> a sticky wicket.  In the domain of "xml:lang='en-US'", am I to be
> forbidden to write "naïve" or "coöperate"?  How about "résumé" or
> "Québéc"?

The primary purpose of xml:lang, as far as I am concerned, should be to
convey the information lost by ISO 10646 unification: where the Japanese and
Chinese glyphs (or Polish and Russian) for a unified character differ, then
I think transcoding and unifying the characters into ISO 10646 can lose
information unless the xml:lang attribute is set. After that, xml:lang can
be used to label text for the purposes of variant character selection, and
after that for marking up the natural language.
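
To make the first use concrete, here is a rough sketch (not anything from the spec) of how a receiving application might recover the language in force for each piece of text, so that it could pick a Japanese or Chinese glyph for the same unified code point. The element names, the sample document and the function name effective_langs are made up for illustration; U+76F4 is used only as a commonly cited example of a unified ideograph whose preferred glyph differs by locale.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the reserved XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def effective_langs(xml_text):
    """Return (tag, text, lang) for every element that has text.

    xml:lang is inherited: an element without its own xml:lang takes
    the value from its nearest ancestor that declares one.
    """
    root = ET.fromstring(xml_text)
    results = []

    def walk(elem, inherited):
        lang = elem.get(XML_LANG, inherited)
        if elem.text and elem.text.strip():
            results.append((elem.tag, elem.text.strip(), lang))
        for child in elem:
            walk(child, lang)

    walk(root, None)
    return results


# Hypothetical document: the same code point labelled for two locales.
doc = '<doc xml:lang="ja"><p>\u76f4</p><p xml:lang="zh-TW">\u76f4</p></doc>'
for tag, text, lang in effective_langs(doc):
    print(tag, text, lang)
```

The character data is identical in both paragraphs; only the inherited or overriding xml:lang value tells a renderer which glyph tradition was intended.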

But I am not trying to fix the repertoire of a language (TEI WSD can declare
it, though). I am just thinking about how to constrain XML documents so that
they will not contain characters which will break non-ISO10646 target
systems.
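
As a sketch of the kind of constraint I mean, a sender could check text against the target system's encoding before shipping it and report whatever will not survive. This assumes the target's character set is one that a Python codec exists for; the function name unencodable_chars and the sample strings are only illustrative.

```python
def unencodable_chars(text, target_encoding):
    """Return the distinct characters of text, in first-seen order,
    that cannot be represented in target_encoding."""
    bad = []
    for ch in text:
        try:
            ch.encode(target_encoding)
        except UnicodeEncodeError:
            if ch not in bad:
                bad.append(ch)
    return bad


# Characters that would break a plain ASCII recipient.
print(unencodable_chars("na\u00efve r\u00e9sum\u00e9", "ascii"))

# The same text is safe for an ISO 8859-1 recipient.
print(unencodable_chars("na\u00efve r\u00e9sum\u00e9", "latin-1"))
```

A check like this only reports the problem. What to do with the offending characters (reject the document, substitute numeric character references, or transliterate) is a separate policy decision.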

Rick Jelliffe




