Re: converting character entities to us-ascii /equivalents/

To: Michael Kay <michael.h.kay@n...>
Subject: Re: converting character entities to us-ascii /equivalents/
From: Robert Koberg <rob@k...>
Date: Wed, 06 Oct 2004 15:55:26 -0700
Cc: "'XML Developers List'" <xml-dev@l...>
In-reply-to: <20041006224309.TQZP20770.mta10-svc.ntlworld.com@Turtle>
References: <20041006224309.TQZP20770.mta10-svc.ntlworld.com@Turtle>
User-agent: Mozilla Thunderbird 0.7 (Macintosh/20040616)


Michael Kay wrote:

> If there's a limited number of non-ASCII characters you need to handle, you
> can use character maps in the XSLT 2.0 serializer.

Sorry, I should have specified that I am using XSLT 1.0. I intend to move
to 2.0 soon, but have not yet had the time to learn it and convert all of
my stylesheets.

But that is good to know.
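For XSLT 1.0, a rough substitute for a character map is to do the replacement after serialization rather than in the stylesheet: set the transformer's output encoding to UTF-8 so that non-ASCII characters reach the output as characters, and pass a StreamResult built on a Writer that rewrites them on the way out. A minimal sketch, assuming the email version uses the text output method; the class name AsciiSubstitutingWriter and its replacement table are hypothetical:

```java
import java.io.FilterWriter;
import java.io.IOException;
import java.io.Writer;
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical helper: a Writer that replaces selected non-ASCII characters
 * with ASCII text as serialized output passes through it.
 */
class AsciiSubstitutingWriter extends FilterWriter {

    // Illustrative replacement table; extend as needed.
    private static final Map<Character, String> REPLACEMENTS = new HashMap<>();
    static {
        REPLACEMENTS.put('\u2014', "--");  // EM DASH
        REPLACEMENTS.put('\u2013', "-");   // EN DASH
        REPLACEMENTS.put('\u00AE', "(R)"); // REGISTERED SIGN
        REPLACEMENTS.put('\u00A9', "(C)"); // COPYRIGHT SIGN
    }

    AsciiSubstitutingWriter(Writer out) {
        super(out);
    }

    @Override
    public void write(int c) throws IOException {
        String replacement = REPLACEMENTS.get((char) c);
        if (replacement != null) {
            out.write(replacement);  // mapped character: write its ASCII text
        } else if (c < 128) {
            out.write(c);            // plain ASCII: pass through unchanged
        } else {
            out.write('?');          // unmapped non-ASCII: visible fallback
        }
    }

    @Override
    public void write(char[] cbuf, int off, int len) throws IOException {
        // Route every character through write(int) so the table is applied.
        for (int i = off; i < off + len; i++) {
            write(cbuf[i]);
        }
    }

    @Override
    public void write(String str, int off, int len) throws IOException {
        for (int i = off; i < off + len; i++) {
            write(str.charAt(i));
        }
    }
}
```

To use it, wrap an OutputStreamWriter created with the US-ASCII charset in AsciiSubstitutingWriter and hand that to the transformer as new StreamResult(writer). Because the substitution happens only at output, the source documents keep their original characters and the other versions of the page are unaffected. Unmapped characters fall back to '?', one per UTF-16 unit, so a supplementary character becomes '??'.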

thanks,
-Rob

> 
> Michael Kay
> http://www.saxonica.com/
> 
> 
>>-----Original Message-----
>>From: Robert Koberg [mailto:rob@k...] 
>>Sent: 06 October 2004 22:56
>>To: XML Developers List
>>Subject: converting character entities to us-ascii /equivalents/
>>
>>Hi,
>>
>>I need to output several versions of a page (through XSL
>>transformations), one of which is us-ascii (for email). But the
>>content might contain some characters that us-ascii does not
>>support (such as the em dash, &#151;).
>>
>>I want the character entities to remain in the content. When
>>transforming to us-ascii, I want to replace them with some ASCII
>>text equivalent; for example, '&#151;' would be converted to '--'.
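Whatever mechanism is used, by the time a parser or XSLT processor reports the content, a reference such as &#151; has already been replaced by the single character it denotes, so the substitution has to be keyed on characters rather than on entity syntax. Note also that in XML &#151; denotes U+0097, a C1 control character; the em dash is U+2014 (&#8212;). 151 is the em dash's position in windows-1252, which is why &#151; is common in HTML pages. A minimal per-character lookup, with a hypothetical class name and an illustrative table that maps both code points to '--':

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper mapping selected non-ASCII characters to ASCII text. */
final class AsciiEquivalents {

    // Illustrative entries only.
    private static final Map<Character, String> TABLE = new HashMap<>();
    static {
        TABLE.put('\u2014', "--");  // EM DASH (&#8212;)
        TABLE.put('\u0097', "--");  // what &#151; actually denotes in XML (C1 control)
        TABLE.put('\u2013', "-");   // EN DASH
        TABLE.put('\u00AE', "(R)"); // REGISTERED SIGN
        TABLE.put('\u2019', "'");   // RIGHT SINGLE QUOTATION MARK
    }

    private AsciiEquivalents() {}

    /** Returns s with mapped characters replaced; unmapped non-ASCII becomes '?'. */
    static String toAscii(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            String r = TABLE.get(c);
            if (r != null) {
                sb.append(r);
            } else if (c < 128) {
                sb.append(c);
            } else {
                sb.append('?');
            }
        }
        return sb.toString();
    }
}
```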
>>
>>The XML is pulled into the transformation through the document()
>>function using a custom URIResolver.
>>
>>Is there an existing solution to this?
>>
>>Do Apache FOP and its text renderer handle this type of thing?
>>
>>I have tried setting a ContentHandler (actually a DefaultHandler) on
>>the XMLReader to replace a character entity, but I am doing
>>something wrong and am confused about how to proceed. Using the code
>>below, I get a recoverable error with saxon/aelfred and a failure
>>with saxon/xerces.
>>
>>Here is a snippet from the URIResolver:
>>
>>
>>InputSource in = new InputSource(file.getAbsolutePath());
>>SAXSource source = new SAXSource(in);
>>XMLReader reader = null;
>>try {
>>   reader = XMLReaderFactory.createXMLReader("com.icl.saxon.aelfred.SAXDriver");
>>   //reader = XMLReaderFactory.createXMLReader("org.apache.xerces.parsers.SAXParser");
>>} catch (SAXException e) {
>>   System.err.println(e.getMessage());
>>}
>>
>>reader.setContentHandler(new AsciiHandler());
>>
>>source.setXMLReader(reader);
>>
>>return source;
>>
>>
>>
>>And the DefaultHandler has one method:
>>
>>
>>public void characters(char[] text, int start, int length) {
>>
>>   String str = new String(text, start, length);
>>   if (str.indexOf(174) > -1) {
>>    str.replaceAll("\u00AE", "(Registered Trademark)");
>>   }
>>   text = str.toCharArray();
>>}
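Two things probably explain the errors above. First, when a JAXP Transformer is given a SAXSource, it typically registers its own ContentHandler on the XMLReader before parsing, so the AsciiHandler set in the URIResolver is replaced (and if createXMLReader throws, reader is still null when setContentHandler is called). Second, characters() as written has no effect downstream: replaceAll returns a new String that is discarded, reassigning the text parameter only changes a local variable, and nothing forwards the data. A common alternative is to wrap the parser in an org.xml.sax.helpers.XMLFilterImpl, override characters() to pass the rewritten text to super.characters(), and give the filter, not the raw parser, to the SAXSource. A minimal sketch, assuming the JAXP default SAX parser and that href names a local file; AsciiFilter, AsciiResolver and the table entries are hypothetical:

```java
import java.io.File;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.Source;
import javax.xml.transform.TransformerException;
import javax.xml.transform.URIResolver;
import javax.xml.transform.sax.SAXSource;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLFilterImpl;

/** Hypothetical filter: rewrites character data before the transformer sees it. */
class AsciiFilter extends XMLFilterImpl {

    // Illustrative entries only.
    private static final Map<Character, String> TABLE = new HashMap<>();
    static {
        TABLE.put('\u2014', "--");                     // EM DASH
        TABLE.put('\u00AE', "(Registered Trademark)"); // REGISTERED SIGN
    }

    AsciiFilter(XMLReader parent) {
        super(parent);
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        StringBuilder sb = new StringBuilder(length);
        for (int i = start; i < start + length; i++) {
            String r = TABLE.get(ch[i]);
            sb.append(r != null ? r : String.valueOf(ch[i]));
        }
        // Forward the rewritten text to whatever ContentHandler the transformer installed.
        char[] out = sb.toString().toCharArray();
        super.characters(out, 0, out.length);
    }
}

/** Hypothetical resolver returning a filtered SAXSource for local files. */
class AsciiResolver implements URIResolver {
    @Override
    public Source resolve(String href, String base) throws TransformerException {
        File file = new File(href); // assumption: href is a local file path
        try {
            SAXParserFactory spf = SAXParserFactory.newInstance();
            spf.setNamespaceAware(true);
            XMLReader parser = spf.newSAXParser().getXMLReader();
            InputSource in = new InputSource(file.toURI().toString());
            return new SAXSource(new AsciiFilter(parser), in);
        } catch (ParserConfigurationException | SAXException e) {
            throw new TransformerException(e); // fail loudly instead of using a null reader
        }
    }
}
```

The resolver would be registered with transformer.setURIResolver(new AsciiResolver()), which the processor consults when resolving document() calls. Because the filter rewrites the source as it is read, it should only be installed on the transformer that produces the us-ascii version; the other outputs then still see the original characters. The replacement is per character, so it is unaffected by how the parser splits text across characters() calls.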
>>
>>How can I do this? Is there a better way to handle this type of thing?
>>
>>thanks,
>>-Rob
>>
>>
>>-----------------------------------------------------------------
>>The xml-dev list is sponsored by XML.org <http://www.xml.org>, an
>>initiative of OASIS <http://www.oasis-open.org>
>>
>>The list archives are at http://lists.xml.org/archives/xml-dev/
>>
>>To subscribe or unsubscribe from this list use the subscription
>>manager: <http://www.oasis-open.org/mlmanage/index.php>
>>
>>
>

