
converting character entities to us-ascii /equivalents/

  • To: XML Developers List <xml-dev@l...>
  • Subject: converting character entities to us-ascii /equivalents/
  • From: Robert Koberg <rob@k...>
  • Date: Wed, 06 Oct 2004 14:55:58 -0700
  • User-agent: Mozilla Thunderbird 0.7 (Macintosh/20040616)

Hi,

I need to output several versions of a page (through XSL
transformations), one of which is US-ASCII (for email). However, the content
might contain characters that US-ASCII does not support (such as the
em dash, &#8212;).

I want these characters to remain in the source content. When
transforming to US-ASCII, I want to replace them with an ASCII
text equivalent; for example, '&#8212;' would become '--'.
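A minimal sketch of the mapping step itself, assuming the text is already available as a Java String. Note that once the XML has been parsed, a character reference such as &#8212; has already become the character U+2014, so the code matches characters, not reference text. The class name AsciiText, the replacement table and the "?" fallback are hypothetical choices for illustration.

```java
import java.util.HashMap;
import java.util.Map;

public class AsciiText {

    // Hypothetical replacement table; extend it for the characters
    // your content actually contains.
    private static final Map<Character, String> REPLACEMENTS = new HashMap<>();
    static {
        REPLACEMENTS.put('\u2014', "--");                     // em dash
        REPLACEMENTS.put('\u2013', "-");                      // en dash
        REPLACEMENTS.put('\u00AE', "(Registered Trademark)"); // registered sign
        REPLACEMENTS.put('\u2018', "'");                      // left single quote
        REPLACEMENTS.put('\u2019', "'");                      // right single quote
        REPLACEMENTS.put('\u201C', "\"");                     // left double quote
        REPLACEMENTS.put('\u201D', "\"");                     // right double quote
    }

    /** Replaces every non-ASCII char in s; unmapped ones become "?". */
    public static String toAscii(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c < 128) {
                out.append(c); // already US-ASCII, copy unchanged
            } else {
                out.append(REPLACEMENTS.getOrDefault(c, "?"));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(toAscii("Hello \u2014 World\u00AE"));
        // prints: Hello -- World(Registered Trademark)
    }
}
```

The loop works on UTF-16 chars, so a character outside the Basic Multilingual Plane (stored as two chars) would come out as "??".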

The XML is pulled into the transformation through the document function 
using a custom URIResolver.

Is there an existing solution to this?

Do Apache FOP and its text renderer handle this kind of thing?

I have tried to set a ContentHandler (actually a DefaultHandler) on the
XMLReader and replace a character there, but I am doing
something wrong and am confused about how to proceed. With the code below I
get a recoverable error using saxon/aelfred and a failure using
saxon/xerces.

Here is a snippet from the URIResolver:


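import javax.xml.transform.TransformerException;
import javax.xml.transform.sax.SAXSource;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLReaderFactory;

// Inside URIResolver.resolve(href, base); 'file' is the java.io.File
// that href points to.
// InputSource expects a system ID (a URI), not a plain file path.
InputSource in = new InputSource(file.toURI().toString());
SAXSource source = new SAXSource(in);
XMLReader reader;
try {
   reader =
      XMLReaderFactory.createXMLReader("com.icl.saxon.aelfred.SAXDriver");
   //reader =
   //   XMLReaderFactory.createXMLReader("org.apache.xerces.parsers.SAXParser");
} catch (SAXException e) {
   // Report the failure instead of continuing with a null reader.
   throw new TransformerException(e);
}

// Note: the Transformer installs its own ContentHandler on this reader
// when it parses the source, so the handler set here gets replaced.
reader.setContentHandler(new AsciiHandler());

source.setXMLReader(reader);

return source;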



And the DefaultHandler has one method:


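public void characters(char[] text, int start, int length) {

   String str = new String(text, start, length);
   // String is immutable: replaceAll() returns a new String, so the result
   // must be assigned back. (No indexOf() check is needed first.)
   str = str.replaceAll("\u00AE", "(Registered Trademark)");
   // Assigning str.toCharArray() to 'text' would only rebind the local
   // parameter; the parser's buffer and downstream handlers never see it,
   // and a plain DefaultHandler has no downstream handler to pass it to.
}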
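One commonly used alternative at the SAX level is an XMLFilter placed between the parser and the Transformer, sketched minimally below. Because the Transformer calls setContentHandler() on whatever XMLReader the SAXSource holds, the replacement logic has to live in a filter that forwards the rewritten characters with super.characters(). The names AsciiFilter, asciiSource and transform are hypothetical, the replacements are hard-coded for brevity, and the JDK's default SAX parser stands in for aelfred or Xerces.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLFilterImpl;

public class AsciiFilterDemo {

    /** Hypothetical SAX filter that rewrites character data in flight. */
    static class AsciiFilter extends XMLFilterImpl {
        AsciiFilter(XMLReader parent) {
            super(parent);
        }

        @Override
        public void characters(char[] ch, int start, int length)
                throws SAXException {
            String s = new String(ch, start, length)
                    .replace("\u2014", "--")
                    .replace("\u00AE", "(Registered Trademark)");
            char[] out = s.toCharArray();
            // Pass the rewritten text on to whichever ContentHandler the
            // Transformer registered on this filter.
            super.characters(out, 0, out.length);
        }
    }

    /** What the URIResolver would return: a SAXSource wrapping the filter. */
    static SAXSource asciiSource(InputSource in) throws Exception {
        SAXParserFactory spf = SAXParserFactory.newInstance();
        spf.setNamespaceAware(true); // XSLT processors expect namespace-aware events
        XMLReader parser = spf.newSAXParser().getXMLReader();
        return new SAXSource(new AsciiFilter(parser), in);
    }

    /** Runs an identity transform over the filtered source. */
    static String transform(String xml) throws Exception {
        SAXSource src = asciiSource(new InputSource(new StringReader(xml)));
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter sw = new StringWriter();
        t.transform(src, new StreamResult(sw));
        return sw.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(transform("<p>A \u2014 B\u00AE</p>"));
    }
}
```

In the URIResolver, returning asciiSource(new InputSource(file.toURI().toString())) would take the place of the code that builds the SAXSource. Attribute values do not pass through characters(), so they would need a startElement() override as well.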

How can I do this? Is there a better way to handle this type of thing?
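A different, simpler technique (sketched here as an assumption, not as a known answer from the list) is to run the email transformation normally into a String and convert the serialized result afterwards. This also catches non-ASCII characters that come from attribute values or from the stylesheet itself. The stylesheet string below is a hypothetical stand-in for the real email transform, and the class and method names are illustrative only.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.HashMap;
import java.util.Map;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class PostProcessAscii {

    // Hypothetical stylesheet standing in for the real "email" transform.
    static final String XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='text'/>"
      + "<xsl:template match='/'><xsl:value-of select='.'/></xsl:template>"
      + "</xsl:stylesheet>";

    // Hypothetical replacement table.
    static final Map<Character, String> MAP = new HashMap<>();
    static {
        MAP.put('\u2014', "--");
        MAP.put('\u00AE', "(Registered Trademark)");
    }

    /** Maps each non-ASCII char to its replacement, or "?" if unknown. */
    static String toAscii(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (char c : s.toCharArray()) {
            sb.append(c < 128 ? String.valueOf(c) : MAP.getOrDefault(c, "?"));
        }
        return sb.toString();
    }

    static String transform(String xml) throws Exception {
        Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSL)));
        StringWriter sw = new StringWriter();
        // Serialize into a String first (characters, not yet bytes) ...
        t.transform(new StreamSource(new StringReader(xml)), new StreamResult(sw));
        // ... and only then replace every non-ASCII character.
        return toAscii(sw.toString());
    }

    public static void main(String[] args) throws Exception {
        System.out.println(transform("<p>A \u2014 B\u00AE</p>"));
        // prints: A -- B(Registered Trademark)
    }
}
```

The trade-off is that the source stays untouched and only the email output is converted, at the cost of holding the whole result in memory before sending it.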

thanks,
-Rob

