Re: How to read the encoding of an XML document

Subject: Re: How to read the encoding of an XML document
From: David Carlisle <davidc@xxxxxxxxx>
Date: Thu, 25 Oct 2001 16:53:54 +0100
> When you say Unicode, does that equate to UTF-8, UTF-16, UTF-32 or 
> something else?  
No. Unicode is essentially an abstract collection of characters, numbered
0 to x10FFFF (most of which slots are empty). An XML notation of &#333;
refers to that abstract character number 333.
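
As a small illustration of a character reference resolving to an abstract
character number, here is a minimal sketch. Python and its standard
xml.etree.ElementTree parser are assumptions made only for this example;
the thread itself is about XSLT and MSXML.

```python
import xml.etree.ElementTree as ET

# &#333; is a decimal character reference to character number 333 (hex 14D).
element = ET.fromstring("<a>&#333;</a>")
char = element.text

# After parsing, only the abstract character remains; ord() gives its number.
number = ord(char)
print(number, hex(number), char == chr(333))
```

The parsed text is a single character whose number is 333, whatever bytes
were used to write the reference.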

However, to store Unicode strings in files (and other places) you need
some encoding that maps bytes in the file to these characters. UTF-x are
some of those encodings (all UTF encodings have the property that they can
encode the whole Unicode range). Other encodings such as ASCII or Latin-1
are similar, but can't encode the whole range of characters.
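
To make the difference concrete, the sketch below encodes character number
333 in several encodings. Python is again only an assumption for the
example, and can_encode is a hypothetical helper, not part of any library.

```python
char = chr(333)  # U+014D, LATIN SMALL LETTER O WITH MACRON

# The same abstract character maps to different bytes in each UTF encoding.
utf8_bytes = char.encode("utf-8")       # bytes C5 8D
utf16_bytes = char.encode("utf-16-be")  # bytes 01 4D
utf32_bytes = char.encode("utf-32-be")  # bytes 00 00 01 4D


def can_encode(text: str, encoding: str) -> bool:
    """Hypothetical helper: True if every character fits in the encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# ASCII and Latin-1 cover only part of the range, so they reject this character.
print(can_encode(char, "ascii"), can_encode(char, "latin-1"))
```

Decoding any of the three UTF byte strings with the matching encoding gives
back the same character.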

> Or does the answer depend upon the XML parser you are 
> using, which in my case is MSXML3.0?

No. Internally the parser obviously has to use some encoding to store
things (often this is UTF-16, and it is in the case of MSXML). In some
programming APIs you need to know this, as you get handed the string,
but in XSLT you never need to know what happens internally.
Your XSLT stylesheet is an XML document, so it goes through the same
process.
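
A rough sketch of where the internal encoding can show through in a
programming API: a character above xFFFF is one abstract character but
takes two 16-bit units in UTF-16. Python is an assumption for this example;
it counts abstract characters, whereas an API that hands out UTF-16 strings
would typically report the unit count.

```python
text = "\U0001F600"  # one character outside the first 65536 numbers

# Python string length counts abstract characters (code points).
code_points = len(text)

# A UTF-16 based string API would see two 16-bit units for the same character.
utf16_units = len(text.encode("utf-16-le")) // 2

print(code_points, utf16_units)
```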

Character data in the stylesheet is mapped to abstract Unicode
characters (using the encoding specified in the stylesheet),
and the same happens for the source document. It is these abstract
characters that are compared. So by then you don't need to know (and
can't find out) what encoding the original files used.

So your source might be in Latin-2 and your stylesheet might be in
Latin-1, but by the time they have both been parsed everything is in
abstract Unicode characters, and it is these that are compared
in any XSLT query. (In fact MSXML3 uses UTF-16, but this is an internal
detail that has no effect on the stylesheet.)
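
The same effect can be sketched outside XSLT. The example below assumes
Python's standard xml.etree.ElementTree parser rather than MSXML, and the
two tiny documents are made up for illustration: one is stored in Latin-2
with the character written as a raw byte, the other in Latin-1 with the
character written as a numeric reference (Latin-1 cannot hold it directly).

```python
import xml.etree.ElementTree as ET

# "Source" document in Latin-2: character number 337 is stored as byte F5.
source_bytes = (
    '<?xml version="1.0" encoding="iso-8859-2"?><name>Erd\u0151s</name>'
).encode("iso-8859-2")

# "Stylesheet-like" document in Latin-1: same character via reference &#337;.
stylesheet_bytes = (
    '<?xml version="1.0" encoding="iso-8859-1"?><name>Erd&#337;s</name>'
).encode("iso-8859-1")

# The parser maps both to abstract characters using the declared encoding.
source_text = ET.fromstring(source_bytes).text
stylesheet_text = ET.fromstring(stylesheet_bytes).text

# The bytes on disk differ, but the parsed character data is identical.
print(source_text == stylesheet_text)
```

After parsing, nothing in either string records which encoding its file
used.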

David


 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list

