Re: Character 150 with Windows-1252 output

Subject: Re: Character 150 with Windows-1252 output
From: "andrew welch" <andrew.j.welch@xxxxxxxxx>
Date: Fri, 21 Apr 2006 14:21:48 +0100
> > Gives this result:
> >
> > <foo>&#150;&#8211;</foo>
> >
> > I've checked the input file with a hex editor to make sure the
> > un-escaped dash really is 0x96.  Somehow the two characters are
> > treated differently, which is something I didn't expect.
> >
> > I think that 0x96 in the input XML read using Windows-1252 should
> > become #8211 when output using any encoding other than Windows-1252,
> > which is what is happening for the actual character 0x96, but the
> > character reference #150 gets serialised back as #150...
> Isn't this because &#150; is a Unicode entity? It's not a Windows-1252
> entity. In other words, a character entity never changes according to
> the input encoding.

Ahh of course, that makes sense.  The character for #150 is worked out
after the bytes in the document have been parsed using the encoding
specified in the prolog.

So 0x96 becomes #8211 through the mapping defined in Windows-1252, and
#150 remains as #150 because it's a character reference, and character
references always refer to Unicode code points.
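
For anyone wanting to see the two paths side by side, here is a rough
sketch in Python (not XSLT, and not what any particular processor does
internally). It only mimics the two steps: decoding the raw byte with
the declared encoding, and resolving the numeric reference straight to a
code point. The variable names are made up for illustration, and the
US-ASCII output encoding is an assumption standing in for "any encoding
other than Windows-1252".

```python
# Raw byte 0x96 in the input is decoded via the Windows-1252 table
# (Python calls that codec "cp1252"), giving U+2013 EN DASH.
from_byte = b"\x96".decode("cp1252")

# The reference &#150; never goes through that table: the number is a
# Unicode code point, so it resolves to U+0096 (a C1 control character).
from_reference = chr(150)

print(ord(from_byte))       # 8211
print(ord(from_reference))  # 150

# Serialising both to an encoding that cannot represent either one
# (US-ASCII here) forces numeric character references on output.
serialised = (from_reference + from_byte).encode("ascii", "xmlcharrefreplace")
print(serialised)           # b'&#150;&#8211;'
```

That matches the <foo>&#150;&#8211;</foo> result quoted above: the
reference comes back as #150 and the raw byte comes back as #8211.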

Thanks Nic!
