RE: converting character entities to us-ascii /equivalents/

To: "'Robert Koberg'" <rob@k...>,"'XML Developers List'" <xml-dev@l...>
Subject: RE: converting character entities to us-ascii /equivalents/
From: "Michael Kay" <michael.h.kay@n...>
Date: Wed, 6 Oct 2004 23:44:31 +0100
In-reply-to: <416469EE.5060405@k...>
Thread-index: AcSr7xyvBMvXru//RWqZl8ojx226wQABsCqw

If there's a limited number of non-ASCII characters you need to handle, you
can use character maps in the XSLT 2.0 serializer.

Michael Kay
http://www.saxonica.com/

> -----Original Message-----
> From: Robert Koberg [mailto:rob@k...] 
> Sent: 06 October 2004 22:56
> To: XML Developers List
> Subject: converting character entities to us-ascii /equivalents/
> 
> Hi,
> 
> I need to output several versions of a page (through XSL
> transformations), one of which is us-ascii (for email). But the
> content might contain some characters that are not supported by
> us-ascii (like em dash - &#151;).
> 
> I want the character entities to remain in the content. When
> transforming to us-ascii, I want to replace the entities with some
> ascii text equivalent. For example, '&#151;' would get converted to '--'.
> 
> The XML is pulled into the transformation through the document
> function using a custom URIResolver.
> 
> Is there an existing solution to this?
> 
> Does Apache's FOP and the text renderer handle this type of thing?
> 
> I have tried to set a ContentHandler (actually a DefaultHandler) on
> the XMLReader and tried to replace a character entity, but I am doing
> something wrong and am confused about how to proceed. Using the code
> below I get a recoverable error using saxon/aelfred and a failure
> when using saxon/xerces.
> 
> Here is a snippet from the URIResolver:
> 
> 
> InputSource in = new InputSource(file.getAbsolutePath());
> SAXSource source = new SAXSource(in);
> XMLReader reader = null;
> try {
>    reader = XMLReaderFactory.createXMLReader("com.icl.saxon.aelfred.SAXDriver");
>    //reader = XMLReaderFactory.createXMLReader("org.apache.xerces.parsers.SAXParser");
> } catch (SAXException e) {
>    System.err.println(e.getMessage());
> }
> 
> reader.setContentHandler(new AsciiHandler());
> 
> source.setXMLReader(reader);
> 
> return source;
> 
> 
> 
> And the DefaultHandler has one method:
> 
> 
> public void characters(char[] text, int start, int length) {
> 
>    String str = new String(text, start, length);
>    if (str.indexOf(174) > -1) {
>     str.replaceAll("\u00AE", "(Registered Trademark)");
>    }
>    text = str.toCharArray();
> }
> 
> How can I do this? Is there a better way to handle this type of thing?
> 
> thanks,
> -Rob
> 
> 
> -----------------------------------------------------------------
> The xml-dev list is sponsored by XML.org <http://www.xml.org>, an
> initiative of OASIS <http://www.oasis-open.org>
> 
> The list archives are at http://lists.xml.org/archives/xml-dev/
> 
> To subscribe or unsubscribe from this list use the subscription
> manager: <http://www.oasis-open.org/mlmanage/index.php>
> 
>
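
For illustration only, here is a minimal Java sketch of the same table-driven idea (a fixed map from non-ASCII character to ASCII string) applied at the SAX level, for cases where XSLT 2.0 character maps are not available. It is not taken from the thread. The class name AsciiFilter, the helper toAscii, and the contents of the table are all assumptions. It differs from the quoted handler in two ways that probably matter: the result of the replacement is actually used (Java strings are immutable, and reassigning the text parameter has no effect on the caller), and the handler is an XMLFilter that forwards the rewritten characters downstream, since a ContentHandler set directly on the reader is likely to be replaced when the transformer takes over the SAXSource. Note that &#151; is strictly a C1 control character (it is the em dash only in Windows-1252); the Unicode em dash is &#8212;, so the sketch maps both.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLFilterImpl;

/** Hypothetical SAX filter that swaps selected non-ASCII characters for ASCII text. */
public class AsciiFilter extends XMLFilterImpl {

    // Illustrative table only; extend it with whatever characters the content uses.
    private static final Map<Character, String> MAP = new HashMap<>();
    static {
        MAP.put('\u2014', "--");                     // Unicode em dash (&#8212;)
        MAP.put((char) 0x97, "--");                  // &#151;, em dash in Windows-1252 only
        MAP.put('\u00AE', "(Registered Trademark)"); // registered sign (&#174;)
    }

    public AsciiFilter(XMLReader parent) {
        super(parent);
    }

    /** Returns a copy of s with every mapped character replaced by its ASCII text. */
    public static String toAscii(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            String replacement = MAP.get(c);
            if (replacement != null) {
                sb.append(replacement);
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        // Build a new array and pass it on; the incoming array is left untouched.
        char[] out = toAscii(new String(ch, start, length)).toCharArray();
        super.characters(out, 0, out.length);
    }

    public static void main(String[] args) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        XMLReader parser = factory.newSAXParser().getXMLReader();

        // The filter, not the raw parser, is what gets handed to the transformer.
        XMLReader filter = new AsciiFilter(parser);
        InputSource in = new InputSource(new StringReader("<p>Acme&#174; &#8212; ok</p>"));

        // Identity transform, standing in for the real stylesheet.
        Transformer identity = TransformerFactory.newInstance().newTransformer();
        identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter result = new StringWriter();
        identity.transform(new SAXSource(filter, in), new StreamResult(result));
        System.out.println(result);
    }
}
```

In a URIResolver like the one quoted above, the equivalent step would be to return new SAXSource(new AsciiFilter(reader), in) rather than calling reader.setContentHandler(...). The filter only rewrites text nodes; attribute values would need the same treatment in startElement.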
