Character encoding questions

  • From: Eric Baatz - Sun Microsystems Labs BOS <ebaatz@b...>
  • To: xml-dev@i...
  • Date: Wed, 25 Jun 1997 15:24:58 -0400 (EDT)

I was struck by the following sentence in the Microsoft XML White Paper:

  XML supports a range of encodings...subject only to the restriction
  that an entire document must share the same encoding.
  
My immediate reaction was that this wasn't correct, although the
definition of "document" above isn't obvious to me (for example, are
external entities part of a document?).  However, when I checked the
April XML specification, I got in over my head.  I am hoping that someone
here will help me out of my hole.

If my XML document is a simple Unicode text file, do I begin it like
the following

  a Byte Order Mark
  <?XML version="1.0" encoding="ISO-10646-UCS-2"?>
  ...

with the Byte Order Mark being required even though an EncodingDecl is
used?  (I would have said "yes" until I got to Appendix E "Autodetection
of Character Sets," which worries about detecting UCS-2 when there
is no Byte Order Mark.)  Is the EncodingDecl necessary if the file
starts with a Byte Order Mark?
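
To make the byte layout in question concrete, here is a minimal Python
sketch that builds such a file in memory and checks for a leading Byte
Order Mark.  The function names are hypothetical, and Python's
"utf-16-be" codec stands in for ISO-10646-UCS-2 (an assumption; the two
agree for characters in the Basic Multilingual Plane).  It does not
settle whether the specification requires both the mark and the
declaration.

```python
import codecs

# Declaration as written in the example above (April 1997 draft syntax).
DECL = '<?XML version="1.0" encoding="ISO-10646-UCS-2"?>'

def build_document(body, with_bom=True):
    """Return the document as big-endian 16-bit bytes, optionally led by a BOM."""
    data = (DECL + "\n" + body).encode("utf-16-be")
    return (codecs.BOM_UTF16_BE + data) if with_bom else data

def has_bom(data):
    """True if the byte string starts with a 16-bit Byte Order Mark."""
    return data.startswith((codecs.BOM_UTF16_BE, codecs.BOM_UTF16_LE))

doc = build_document("<doc/>")
print(doc[:6].hex())  # BOM, then '<' and '?' as 16-bit units
```

With the mark present, the file begins with the bytes FE FF 00 3C 00 3F;
without it, the file begins directly with 00 3C 00 3F.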

Where can I have an EncodingPI?  Section 4.3.3 talks about their being
"at the beginning of a system entity, before any other character data or
markup" but doesn't define "system entity" (perhaps one that has an
ExternalID that contains "SYSTEM"?).  If my document references an
external entity, then I believe that the external entity must start
with an EncodingPI (see Appendix E "Autodetection of Character Sets")
if it isn't in UTF-8 and doesn't start with a Byte Order Mark.
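
The kind of autodetection I have in mind can be sketched roughly as
follows.  This is only an illustration of the idea, not the procedure
from Appendix E: the helper name and the returned labels are
hypothetical, only 16-bit and ASCII-compatible cases are covered, and
the byte patterns the draft actually lists may differ.

```python
def guess_encoding(head):
    """Guess an entity's encoding family from its first few bytes.

    Hypothetical helper: the labels returned are illustrative only.
    """
    if head.startswith(b"\xfe\xff"):
        return "ucs2-be-bom"      # Byte Order Mark, big-endian
    if head.startswith(b"\xff\xfe"):
        return "ucs2-le-bom"      # Byte Order Mark, little-endian
    if head.startswith(b"\x00<\x00?"):
        return "ucs2-be-no-bom"   # '<?' as 16-bit units, no mark
    if head.startswith(b"<\x00?\x00"):
        return "ucs2-le-no-bom"
    if head.startswith(b"<?"):
        # ASCII-compatible: the declaration itself must be read
        # to learn the exact encoding.
        return "ascii-compatible"
    return "assume-utf8"          # no mark and no declaration

print(guess_encoding(b"\x00<\x00?\x00X\x00M\x00L"))
```

The point of the sketch is that an entity with neither a Byte Order Mark
nor a leading declaration gives the processor nothing to go on, so UTF-8
has to be assumed.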

If I wanted to take the external entity and, for portability reasons,
bundle it into my XML document as an internal entity, what do I do with
the external entity's EncodingPI?  It doesn't seem to be allowed in the
internal entity declaration, somewhat like:

  <!ENTITY Pub-Status <?XML encoding="ISO-10646-UCS-2"?>"text here">
  
I presume that the answer is that I cannot convert an external entity
into an internal unless the external entity and my XML document have the
same encoding.

What is the motivation for not allowing a change of encoding within
an entity?  The mechanism for handling that seems no different from
the one needed to handle different encodings in external entities, which
I think of as being logically a part of the referencing document.


xml-dev: A list for W3C XML Developers
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To unsubscribe, send to majordomo@i... the following message;
unsubscribe xml-dev
List coordinator, Henry Rzepa (rzepa@i...)

