
Re: arbitrary characters in XML document?

  • From: David Brownell <david-b@p...>
  • To: Cliff Draper <cliffwd@f...>
  • Date: Thu, 02 Sep 1999 13:42:10 -0700

Cliff Draper wrote:
> 
> Hi,  I have a question about dealing with multiple character sets.
> 
> I have an application where I want to store data in XML and retrieve
> it later.  Now a good chunk of the data I want to store is coming
> straight from the user and I have little control over exactly which
> character set the user is using.  One of my users apparently tried
> using 0x98 + 0x03 as an accented 'e'; I have no idea which character
> set he used (and I don't care),

You should.  Arbitrary binary garbage isn't necessarily going to
be legal -- as happened in this case -- and even if it chances to
be legal, it's likely to come out as something that wasn't intended.

Coming out as an error diagnostic is a useful outcome ... hidden
mangling of data is as likely, and causes severe problems later on.
A diagnostic lets you fix the problems early, before they get bad.


> but I still want to be able to store
> it and parse it later.  When I parse it with expat with an
> encoding="UTF-8", it complains that it's not well-formed.

Probably because it isn't.
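
As a rough illustration of why the parser complains: 0x98 is a UTF-8
continuation byte, so it cannot begin a character, and 0x03 is a control
character that XML 1.0 does not permit in any encoding.  The sketch below
uses Python's expat binding; the <note> element and the helper name
is_well_formed are made up for the example and are not from the original
application.

```python
import xml.parsers.expat


def is_well_formed(data: bytes) -> bool:
    """Return True if expat accepts `data` as a complete XML document."""
    parser = xml.parsers.expat.ParserCreate()
    try:
        parser.Parse(data, True)  # True marks this as the final chunk
        return True
    except xml.parsers.expat.ExpatError:
        return False


PROLOG = b'<?xml version="1.0" encoding="UTF-8"?>'

# The two bytes from the question, dropped into a hypothetical element.
bad = PROLOG + b"<note>\x98\x03</note>"

# The same element holding a properly UTF-8 encoded accented 'e' (U+00E9).
good = PROLOG + b"<note>\xc3\xa9</note>"

print(is_well_formed(bad))
print(is_well_formed(good))
```

The first document should be rejected and the second accepted; the only
difference between them is whether the bytes match the declared encoding.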


> Any ideas?

Don't permit arbitrary binary data into your text.  Ensure you know
what character encoding was used, and make sure that you either 
transform that encoding to the one you're using, or switch to using
that encoding.
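
A minimal sketch of the "transform that encoding" route, assuming the
application can find out which encoding the user's input arrived in.
The function names and the choice of Latin-1 in the usage lines are
illustrative assumptions only; the original question never establishes
what the user's encoding was.

```python
from xml.sax.saxutils import escape


def is_xml_char(ch: str) -> bool:
    """Check one character against the XML 1.0 `Char` production."""
    cp = ord(ch)
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def to_utf8_xml_text(raw: bytes, source_encoding: str) -> bytes:
    """Convert user bytes in a known encoding to UTF-8 XML character data.

    Raises UnicodeDecodeError if the bytes are not valid in
    `source_encoding`, and ValueError if they decode to a character
    that XML 1.0 forbids.  Either way the problem shows up at input
    time rather than when the stored document is parsed later.
    """
    text = raw.decode(source_encoding)  # strict decoding is the default
    for ch in text:
        if not is_xml_char(ch):
            raise ValueError(f"U+{ord(ch):04X} is not allowed in XML 1.0")
    # Escape &, < and > so the text is safe as element content.
    return escape(text).encode("utf-8")


# Usage, assuming (hypothetically) that the input is Latin-1:
print(to_utf8_xml_text(b"caf\xe9", "latin-1"))
```

Rejecting bad input with an exception, rather than silently replacing
it, gives the early diagnostic instead of hidden mangling.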

- Dave



> thanks,
> -Cliff Draper
>  cliffwd@f...
> 
> xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@i...
> Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ and on CD-ROM/ISBN 981-02-3594-1
> To (un)subscribe, mailto:majordomo@i... the following message;
> (un)subscribe xml-dev
> To subscribe to the digests, mailto:majordomo@i... the following message;
> subscribe xml-dev-digest
> List coordinator, Henry Rzepa (mailto:rzepa@i...)
