
Re: Special entity characters in Shift-JIS XSL.

Subject: Re: Special entity characters in Shift-JIS XSL.
From: Tony Graham <tgraham@xxxxxxxxxxxxxxxx>
Date: Wed, 15 Dec 1999 13:05:11 -0400 (EST)
At 15 Dec 1999 08:55 -0500, Douglas Weed wrote:
 > An application has been developed which uses the Microsoft MSXML parser
 > enclosed in a DLL to apply XSL files against an XML stream.  The encoding is
 > in Shift-JIS as the application is double byte. The net result of the
 > application is HTML.  The target browser has been developed to understand
 > certain 'special characters' or entities, which in themselves are double
 > byte.  Much in the same way &#39; maps to an asterisk.  For example
 > &#249;&#134; would yield a special 2 byte character which is a Q surrounded
 > by a circle.  If this character sequence is placed directly into a .htm
 > page, it works.  However, as I suspected, when placed within an xsl file and
 > transformed with the xml, it yields nothing since the parser tries to format
 > it.  I attempted to use an in-line DTD to define the entity and use the
 > definition within the XML file, however, MSXML has some real difficulties
 > handling an in-line DTD when the XML is a character string and not a file.
 > The work-arounds specified by MS are not feasible.  The question : does
 > another technique exist to have the XSL file ignore &#249;&#134; and pass it
 > straight through to the HTML stream?  Sorry for the length of the message
 > and thanks for any responses. 

In XML, numeric character references always refer to Unicode code
values, never to bytes in the document's encoding.  A conforming
application should recognise &#249;&#134; as LATIN SMALL LETTER U WITH
GRAVE (U+00F9) followed by one of the C1 control characters (U+0086).
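
To see how a conforming parser reads those two references, here is a
minimal sketch in Python using the standard library's expat-based
ElementTree.  It is only an illustration, not MSXML; the element name
"p" is made up for the example.

```python
import unicodedata
import xml.etree.ElementTree as ET

# Parse a tiny document containing the two references from the question.
root = ET.fromstring("<p>&#249;&#134;</p>")
text = root.text

# Each reference becomes one Unicode character, not one byte.
code_points = [ord(ch) for ch in text]
first_name = unicodedata.name(text[0])
second_category = unicodedata.category(text[1])  # "Cc" means control character

print(code_points, first_name, second_category)
```

The parser hands back two separate characters, so nothing downstream
knows they were meant to be the two halves of one Shift-JIS character.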

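What comes out of your MSXML DLL almost certainly uses two bytes to
represent each of those two characters -- UTF-16 uses two bytes for
every character in the Basic Multilingual Plane, and UTF-8 also uses
two bytes per character for character numbers in that range (U+0080
to U+07FF).  Either way you get four bytes, not the two-byte sequence
the browser is looking for.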
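
As a rough check of the byte counts, the following Python sketch
encodes the two characters both ways.  The variable names are mine,
and "utf-16-be" is chosen only to avoid a byte order mark in the
output.

```python
# The two characters that &#249;&#134; actually denote.
text = "\u00f9\u0086"

utf16 = text.encode("utf-16-be")  # big-endian, no byte order mark
utf8 = text.encode("utf-8")

# The two-byte sequence the original poster wanted the browser to see.
wanted = bytes([249, 134])

print(utf16.hex(), utf8.hex(), wanted.hex())
```

Neither encoding produces the bytes F9 86 on their own.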

Relying on two numeric character references to represent a double-byte
sequence is fragile, as you have found.

The numeric character reference for the Unicode character CIRCLED
LATIN CAPITAL LETTER Q is &#x24C6;.
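
A quick way to confirm what a numeric character reference denotes is
to resolve it and look up the character name.  This sketch uses
Python's html.unescape, which handles numeric references the same way
an XML parser would for this case.

```python
import html
import unicodedata

# Resolve the hexadecimal reference to a single Unicode character.
ch = html.unescape("&#x24C6;")
code_point = ord(ch)
name = unicodedata.name(ch)

print(hex(code_point), name)
```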

I don't know whether MSXML allows you to specify the output encoding.
However, if I'm correct in thinking that a circled Q is gaiji in
Shift-JIS, the character might be dropped in a conversion to Shift-JIS
anyway.
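
To illustrate the conversion problem, here is a sketch using Python's
"shift_jis" codec, which covers the standard JIS repertoire.  It says
nothing about what MSXML or your browser's private mapping would do;
it only shows that the standard character set has no slot for a
circled Q.

```python
circled_q = "\u24c6"  # CIRCLED LATIN CAPITAL LETTER Q

# A strict conversion fails because the character is not in the set.
try:
    circled_q.encode("shift_jis")
    encodable = True
except UnicodeEncodeError:
    encodable = False

# A serializer can fall back to a numeric character reference instead.
fallback = circled_q.encode("shift_jis", errors="xmlcharrefreplace")

print(encodable, fallback)
```

With that fallback the browser would receive a decimal reference to
the Unicode character, which helps only if the browser understands
Unicode references.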

Regards,


Tony Graham
======================================================================
Tony Graham                            mailto:tgraham@xxxxxxxxxxxxxxxx
Mulberry Technologies, Inc.                http://www.mulberrytech.com
17 West Jefferson Street                    Direct Phone: 301/315-9632
Suite 207                                          Phone: 301/315-9631
Rockville, MD  20850                                 Fax: 301/315-8285
----------------------------------------------------------------------
  Mulberry Technologies: A Consultancy Specializing in SGML and XML
======================================================================




 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list

