Re: SDATA or UNICODE
> From: Paul Prescod <papresco@t...>
>
> On Wed, 28 Jan 1998, Gavin McKenzie wrote:
> >
> > XML provides a way for specifying the encoding of an entity with the
> > ?XML pi encoding declaration. Why wouldn't this be sufficient? If the
> > euro or florin symbol is available in some non-Unicode character
> > encoding scheme, isn't it sufficient to encode the text which requires
> > the symbol in the appropriate scheme and use the encoding declaration?
>
> No, for the reason Tim points out. On the other hand, you might be on the
> right track. A processing instruction would serve as a hack to tell the
> application where to insert the euro. <?EURO>

XML has, underlying its decisions, the SGML model, which separates the
encoding of data (i.e. "storage management") from its logical
representation as streams of characters in a single character set (i.e.
"entity management").

This is a very flexible model, since it allows any system of encoding that
anyone can dream up to be used without having to alter XML/SGML: an entity
can be sourced from files, multipart MIME, a database, a random number
generator, standard input, anything.

Allowing multiple encodings within an XML file, delimited using PIs or
elements or internal entities, would violate this model, and I would
strongly recommend against it. If your customers require multiple
encodings, then they have to source each one from a separate external
entity. These entities can be bundled up or interleaved in any fashion you
like, but this is a *PRE* XML storage management issue, not an XML issue.

I think there is a great desire that XML will be a Trojan horse to force
the development of wide-character applications, and Universal Character
Set-using ones (UCS = ISO 10646 ~= Unicode) in particular. I, for one,
hope that by disconnecting encoding and character "repertoire", XML will
marginalise the character encoding issue to the extent that, in the long
run, it will become easier to use Unicode than a regional encoding.

> I think you should implement a language that allows this and is preprocessed
> into XML. If I were you I would use marked sections and not attributes to
> describe the boundaries. Marked sections are really easy to scan for.

But once you have changed encodings, do you scan for the end of the marked
section using the old or the new encoding? This kind of ISO 2022 mode
switching is what we are trying to get rid of in XML (and in SGML). So you
can have multiple encodings before the parser, but they cannot be
presented to the parser.

The other choice is multiple encodings after the parser: e.g. embedding
the SJIS encoded in a Latin-1-safe way. This is the same as Dave's comment
about transliteration using notation. You can have a document like

<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE x SYSTEM "x.dtd" [
<!NOTATION sjis-Qencoded SYSTEM "SjisQ.pl">
<!ELEMENT SJIS-SECTION ( #PCDATA ) >
<!ATTLIST SJIS-SECTION I-need-decoding NOTATION ( sjis-Qencoded ) "sjis-Qencoded" >
]>
<x>
...
<SJIS-SECTION><![CDATA[ smdkfjhhjwfnnweofijslkdm ]]></SJIS-SECTION>
...
</x>

(You cannot do the same thing using internal entities in XML, since you
cannot put a notation on an internal entity declaration.)

Rick Jelliffe
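To make the post-parse step concrete, here is a minimal sketch of what the
application layer (not the XML parser) might do with each SJIS-SECTION once
parsing is finished. It is only an illustration, not the SjisQ.pl handler
named in the notation declaration: it is written in Python rather than Perl,
it assumes the "Q" encoding is the ordinary quoted-printable scheme, and the
file name x.xml and the function decode_sjis_section are invented for the
example.

import quopri
import xml.etree.ElementTree as ET

def decode_sjis_section(qencoded_text):
    """Turn the Latin-1-safe payload back into a Unicode string:
    undo the quoted-printable-style "Q" encoding to recover the raw
    Shift-JIS bytes, then decode those bytes as Shift-JIS."""
    raw = quopri.decodestring(qencoded_text.encode("ascii"))
    return raw.decode("shift_jis")

# The parser sees only one encoding (ISO-8859-1); the notation-driven
# decoding of SJIS-SECTION content happens afterwards, in the application.
tree = ET.parse("x.xml")
for section in tree.iter("SJIS-SECTION"):
    print(decode_sjis_section(section.text))

The point of the sketch is the division of labour: the document presented to
the parser stays in a single encoding, and anything else is recovered by the
application using the declared notation as its cue.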