Re: Handling of special characters like © etc

Subject: Re: Handling of special characters like © etc
From: Mike Brown <mike@xxxxxxxx>
Date: Thu, 3 May 2001 02:24:45 -0600 (MDT)
Yogesh Dare wrote:
> <?xml version="1.0"?>

Encoding is, roughly, the mapping of a repertoire of abstract characters 
(units in a script for written language) to 1 or more code units (bytes, 
usually). Your XML file exists with some kind of encoding, because it is, 
after all, just a bunch of bits & bytes.

The encoding declaration in an XML document (the encoding="foo" part of
the <?xml ...?> line at the top) is the document's way of stating what
encoding it uses. When you omit the encoding declaration, the parser
assumes UTF-8, or UTF-16 if the file begins with a byte order mark.

>       © 2000 site.com

The copyright symbol is allowed in XML, but since you have implied that
your document is probably UTF-8 encoded, that symbol must be encoded as
the pair of bytes 0xC2 0xA9.
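
To make the byte values concrete, here is a small Python sketch (Python is
my choice for illustration, not something from the original question) that
encodes the copyright sign three ways. The variable names are arbitrary.

```python
# U+00A9 COPYRIGHT SIGN, encoded three different ways.
copyright_sign = "\u00a9"

utf8_bytes = copyright_sign.encode("utf-8")
latin1_bytes = copyright_sign.encode("iso-8859-1")
cp1252_bytes = copyright_sign.encode("windows-1252")

print(utf8_bytes.hex())    # c2a9  (two bytes in UTF-8)
print(latin1_bytes.hex())  # a9    (one byte in iso-8859-1)
print(cp1252_bytes.hex())  # a9    (one byte in windows-1252)
```

The same abstract character ends up as different bytes depending on the
encoding, which is why the declaration and the actual bytes have to agree.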

If this is giving you problems, then your file is not really UTF-8 
encoded, and this is an error. Chances are, the symbol is stored as just 
the byte 0xA9, because your file was saved with iso-8859-1 or 
windows-1252 encoding. You should get a text editor that can save in 
different encodings, rather than just your platform/OS default, and that 
has a hex mode so you can see the actual bytes in the file. I use 
TextPad, from http://www.textpad.com/
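
If you would rather check this from a script than from a hex editor, the
following Python sketch shows the likely failure. It assumes the standard
library's xml.etree.ElementTree parser; the <footer> element and the
parses() helper are made up for the example.

```python
import xml.etree.ElementTree as ET

# No encoding declaration, so the parser assumes UTF-8.
prolog = b'<?xml version="1.0"?>\n'

# Copyright sign stored as the UTF-8 pair 0xC2 0xA9.
good = prolog + b"<footer>\xc2\xa9 2000 site.com</footer>"
# Copyright sign stored as the single iso-8859-1 byte 0xA9.
bad = prolog + b"<footer>\xa9 2000 site.com</footer>"


def parses(data: bytes) -> bool:
    """Return True if the bytes are a well-formed XML document."""
    try:
        ET.fromstring(data)
        return True
    except ET.ParseError:
        return False


print(parses(good))  # True
print(parses(bad))   # False: a lone 0xA9 is not valid UTF-8
```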

If you don't want to put the correct bytes in your file, you can either
correctly declare the encoding as iso-8859-1 or windows-1252, or you can
use &#169; or &#xA9; in your XML and XSLT documents, rather than the raw
characters.
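
As a rough check that these alternatives are equivalent, this Python sketch
(again assuming xml.etree.ElementTree, with a made-up <footer> element)
parses one document that declares iso-8859-1 and two that use character
references, and compares the resulting text.

```python
import xml.etree.ElementTree as ET

# Option 1: keep the single byte 0xA9 but declare the real encoding.
declared = (
    b'<?xml version="1.0" encoding="iso-8859-1"?>\n'
    b"<footer>\xa9 2000 site.com</footer>"
)

# Option 2: pure ASCII bytes, with the character written as a reference.
decimal_ref = b'<?xml version="1.0"?>\n<footer>&#169; 2000 site.com</footer>'
hex_ref = b'<?xml version="1.0"?>\n<footer>&#xA9; 2000 site.com</footer>'

texts = [ET.fromstring(doc).text for doc in (declared, decimal_ref, hex_ref)]

# All three documents yield the same character data after parsing.
print(all(text == "\u00a9 2000 site.com" for text in texts))  # True
```

The character references have the advantage of surviving any editor or
transport that only handles ASCII safely.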

> Now after parsing, the parser output is given to XSLTProcessor to apply xsl
> on it.But there again I face problem for characters like &,<,> etc.
> Well I can actually replace these known characters by there equivalents like
> for & i can put &amp; and so on.
> But I want some generic way to handle this.

& and < (and >, for balance) are XML markup characters. If you are using
them as character data, you must either escape them, or put them in a 
CDATA section, if one is allowed there. This is a requirement of all XML 
documents, including your source XML and the stylesheet.
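
As for a generic way to do the escaping: if the XML is being generated by a
program, one option is to let a library do it rather than replacing
characters by hand. The sketch below uses Python's standard library purely
as an illustration; your own toolchain will have its own equivalents, and
the sample string and <note> element are invented.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

raw = "AT&T <rocks> & rolls"

# escape() replaces &, < and > with their predefined entity references.
escaped = escape(raw)
print(escaped)  # AT&amp;T &lt;rocks&gt; &amp; rolls

# Building the document through an API escapes character data for you.
note = ET.Element("note")
note.text = raw
serialized = ET.tostring(note, encoding="unicode")
print(serialized)  # <note>AT&amp;T &lt;rocks&gt; &amp; rolls</note>

# Parsing the result gives back the original, unescaped string.
round_trip = ET.fromstring(serialized).text
print(round_trip == raw)  # True
```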

   - Mike
_____________________________________________________________________________
mike j. brown, software engineer at  |  xml/xslt: http://skew.org/xml/
webb.net in denver, colorado, USA    |  personal: http://hyperreal.org/~mike/

 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list
