Re: Character 150 with Windows-1252 output
> > Gives this result:
> >
> > <foo>––</foo>
> >
> > I've checked the input file with a hex editor to make sure the
> > un-escaped dash really is 0x96. Somehow the two characters are
> > treated differently, which is something I didn't expect.
> >
> > I think that 0x96 in the input XML read using Windows-1252 should
> > become #8211 when output using any encoding other than Windows-1252,
> > which is what is happening for the actual character 0x96, but the
> > character reference #150 gets serialised back as #150...
>
> Isn't this because &#150; is a Unicode entity? It's not a Windows-1252
> entity. In other words, a character entity never changes according to
> the input encoding.

Ahh, of course, that makes sense. The character for #150 is worked out
after the bytes in the document have been parsed using the encoding
specified in the prolog. So 0x96 becomes #8211 through the mapping
defined in Windows-1252, and #150 remains #150 because it's a character
reference, and character references are always Unicode code points.

Thanks Nic!
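For anyone who wants to see the distinction outside an XSLT processor, here is a minimal sketch in Python, using only the standard library. It is not the setup from the original question: the `<foo>` document below is a hypothetical reconstruction containing one `&#150;` character reference followed by one raw 0x96 byte, and US-ASCII is chosen as the output encoding purely to force both characters to be written as numeric references.

```python
import xml.etree.ElementTree as ET

# Hypothetical input: a Windows-1252 document holding the character
# reference &#150; followed by the raw byte 0x96.
source = (
    b'<?xml version="1.0" encoding="windows-1252"?>'
    b'<foo>&#150;\x96</foo>'
)

root = ET.fromstring(source)

# Code points seen by the parser after decoding.
# The reference stays U+0096 (150); the raw byte maps to U+2013 (8211).
code_points = [ord(ch) for ch in root.text]

# The raw byte on its own, decoded with the same encoding.
raw_byte_code_point = ord(b"\x96".decode("cp1252"))

# Serialise to an encoding that can represent neither character, so both
# are emitted as numeric character references.
serialized = ET.tostring(root, encoding="us-ascii")

print(code_points)
print(raw_byte_code_point)
print(serialized)
```

The parser decodes the bytes with the declared encoding first, and only then resolves character references as Unicode code points, which is why the two dashes end up as different characters.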