Re: Re: Character 150 with Windows-1252 output
On 4/21/06, Michael Kay <mike@xxxxxxxxxxxx> wrote:
> > Reading around a bit 150 is a control character... so does
> > that mean it shouldn't appear in source XML document
> > (unresolved) where the encoding is specified as ISO-8859-1 ??
>
> I believe that in the ISO standard ISO 8859/1, the control blocks C0 and C1
> (which includes 150) are unused - they are not part of the character set.
> However, according to Wikipedia, "the character map ISO_8859-1:1987,
> more commonly known by its preferred MIME name of ISO-8859-1 ... assigns the
> C0 and C1 control characters to the code values 00-1F, 7F, and 80-9F."
>
> The XML recommendation defines encodings in terms of their IANA definitions
> not their ISO definitions, so on that basis ISO-8859-1 does include the
> control character 150.
>
> In XML 1.1, there is a requirement that C0 and C1 characters (with obvious
> exceptions such as TAB) must be represented as character references. This is
> primarily to catch the common error where a Windows 1252 file is mislabelled
> as ISO-8859-1.
>
> http://en.wikipedia.org/wiki/ISO_8859-1

Thanks for the info.

Based on that, given this stylesheet:

    <xsl:stylesheet version="1.0"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

      <xsl:output encoding="ISO-8859-1" method="xml"/>

      <xsl:template match="/">
        <foo>&#150;</foo>
      </xsl:template>

    </xsl:stylesheet>

The output differs between MSXML 3/4, Saxon 6.5.4 and Saxon 8.7.1. The
latter escapes the character back to #150, while the three XSLT 1.0
processors all output the character itself.

I'm guessing this is due to XML 1.1 support in Saxon 8.7?
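
For anyone following along, here is a rough Python sketch of why byte 150 is ambiguous between the two encodings, and what escaping it as a character reference looks like. The helper name `escape_c1` is made up for illustration, and it is only a simplified approximation of the XML 1.1 rule quoted above, not what Saxon's serializer actually does.

```python
# Byte 150 (0x96) interpreted under the two encodings discussed in this thread.
raw = bytes([150])

# Under the IANA ISO-8859-1 charset, 0x96 is a C1 control character (U+0096).
as_latin1 = raw.decode("iso-8859-1")

# Under Windows-1252, the same byte is an en dash (U+2013).
as_cp1252 = raw.decode("windows-1252")


def escape_c1(text):
    """Hypothetical helper: write C1 controls (U+0080..U+009F) as numeric
    character references. Simplified; XML 1.1 has exceptions this ignores."""
    return "".join(
        "&#%d;" % ord(ch) if 0x80 <= ord(ch) <= 0x9F else ch
        for ch in text
    )


print(hex(ord(as_latin1)))   # code point when read as ISO-8859-1
print(hex(ord(as_cp1252)))   # code point when read as Windows-1252
print(escape_c1("<foo>" + as_latin1 + "</foo>"))
```

If a file really is Windows-1252 but is labelled ISO-8859-1, the intended dash ends up as the control character, which is presumably the mislabelling case the XML 1.1 rule is meant to catch.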