Re: Where does a parser get the replacement text for a character reference?

From: David Brownell <david-b@p...>
To: Lars Marius Garshol <larsga@g...>,xml-dev <xml-dev@l...>
Date: Wed, 04 Jul 2001 18:33:39 -0700

> | I assume that it would depend on what encoding the xml that you are
> | parsing has.
> 
> Actually, no.

More like:  "sort of yes".  Java developers tend to assume Unicode is
the universal way to represent character data, but folk working in other
languages may not be so fortunate.  Parser APIs aren't required to
transcode into a UTF (UTF-8, UTF-16, UTF-32); they may deliver
characters in other encodings, including the input encoding.
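To make the "sort of yes" concrete: the code point a reference names never depends on the document's encoding, but the bytes an API hands back can. The following is a minimal Python sketch, not any particular parser's API. `char_ref_to_bytes` is a hypothetical helper that resolves a reference and then encodes it in whatever output encoding the API has chosen to deliver.

```python
import re

# Decimal (&#233;) and hexadecimal (&#xE9;) character references.
# XML requires a lowercase "x" for the hexadecimal form.
_CHAR_REF = re.compile(r"&#(?:x([0-9A-Fa-f]+)|([0-9]+));")

def char_ref_to_bytes(ref: str, output_encoding: str) -> bytes:
    """Resolve a character reference, then encode it for delivery.

    Hypothetical helper: the code point comes only from the reference
    itself; output_encoding stands in for whatever the parser API uses.
    """
    m = _CHAR_REF.fullmatch(ref)
    if m is None:
        raise ValueError(f"not a character reference: {ref!r}")
    code_point = int(m.group(1), 16) if m.group(1) else int(m.group(2))
    return chr(code_point).encode(output_encoding)

# Same reference, same code point (U+00E9), different delivered bytes.
for enc in ("utf-8", "utf-16-be", "latin-1"):
    print(enc, char_ref_to_bytes("&#xE9;", enc).hex())
```

The resolution step is identical in every case; only the final `encode` call varies, which is the part a parser API is free to choose.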

Using the original U+E311 private-use character as an example,
it could be natural to have some component transcode it to the
local character set.  That may be preferred for Klingon, or for
other characters that don't have code points in Unicode.  (A while
back, I think Taiwan needed to use that approach; dunno if that's
less of an issue in Unicode 3.1.)
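One way such a transcoding component might look, sketched in Python under stated assumptions: the "local character set" is modeled as ASCII plus a hypothetical site-specific table (`LOCAL_PUA_TABLE`) that assigns bytes to private-use code points. The byte values chosen for U+E311 are invented for illustration; a real mapping would be whatever the local agreement specifies.

```python
# Hypothetical site-specific table: private-use code points -> local bytes.
# The byte values below are invented purely for illustration.
LOCAL_PUA_TABLE: dict[int, bytes] = {0xE311: b"\xfe\xa1"}

def to_local_charset(text: str, base_encoding: str = "ascii",
                     pua_table: dict[int, bytes] = LOCAL_PUA_TABLE) -> bytes:
    """Transcode parser output (Unicode) into a local character set.

    Characters the base codec can encode pass through it unchanged;
    anything else must appear in the (hypothetical) local table.
    """
    out = bytearray()
    for ch in text:
        try:
            out += ch.encode(base_encoding)
        except UnicodeEncodeError:
            mapped = pua_table.get(ord(ch))
            if mapped is None:
                raise ValueError(f"no local mapping for U+{ord(ch):04X}")
            out += mapped
    return bytes(out)

# "&#xE311;" in a document reaches the application as U+E311,
# and this component maps it to the locally agreed bytes.
print(to_local_charset("A\ue311B").hex())
```

The parser itself still resolves the reference to U+E311; the local meaning of that private-use code point lives entirely in the table, outside XML.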

>     Character references always refer to Unicode characters.

Or surrogate pairs -- they refer to ISO-10646 characters, which
UTF-16 represents as one or two 16-bit code units.  References to the
surrogate code points themselves (&#xD800; through &#xDFFF;) are
explicitly illegal, but a reference to a character in the "Astral
Planes" expands to two UTF-16 code units (or one UTF-32 unit).

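The arithmetic behind "two UTF-16 code units (or one UTF-32)" is short enough to show. This Python sketch uses hypothetical helper names. It checks a referenced code point against the XML 1.0 `Char` production, which is what rules out references to surrogate code points such as `&#xD800;`, and then splits a supplementary-plane code point into its UTF-16 surrogate pair.

```python
def is_xml_char(cp: int) -> bool:
    """XML 1.0 'Char' production; surrogates D800-DFFF are excluded."""
    return (cp in (0x9, 0xA, 0xD)
            or 0x20 <= cp <= 0xD7FF
            or 0xE000 <= cp <= 0xFFFD
            or 0x10000 <= cp <= 0x10FFFF)

def utf16_units(cp: int) -> list[int]:
    """Return the UTF-16 code units for one referenced code point."""
    if not is_xml_char(cp):
        raise ValueError(f"&#x{cp:X}; is not a legal character reference")
    if cp < 0x10000:
        return [cp]                  # BMP: one 16-bit unit
    v = cp - 0x10000                 # 20 bits remain
    high = 0xD800 + (v >> 10)        # top 10 bits -> high surrogate
    low = 0xDC00 + (v & 0x3FF)       # bottom 10 bits -> low surrogate
    return [high, low]

# U+1D11E MUSICAL SYMBOL G CLEF, written as &#x1D11E; in a document.
print([f"{u:04X}" for u in utf16_units(0x1D11E)])   # ['D834', 'DD1E']
# Cross-check against Python's own UTF-16 and UTF-32 codecs.
print(len("\U0001D11E".encode("utf-16-be")) // 2)   # 2 code units
print(len("\U0001D11E".encode("utf-32-be")) // 4)   # 1 code unit
```

A parser that delivers UTF-16 (as Java APIs do) hands the application both surrogates for one reference; one that delivers UTF-32 hands back a single unit.
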
- Dave

Follow-Ups:
- Re: Where does a parser get the replacement text for a character reference?
  - From: Lars Marius Garshol <larsga@g...>

References:
- Where does a parser get the replacement text for a character reference?
  - From: Ben Ryan <b_ryan@c...>
- Re: Where does a parser get the replacement text for a character reference?
  - From: Lars Marius Garshol <larsga@g...>
