RE: converting character entities to us-ascii /equivalents/

If there's a limited number of non-ASCII characters you need to handle, you can use character maps in the XSLT 2.0 serializer.

Michael Kay
http://www.saxonica.com/

> -----Original Message-----
> From: Robert Koberg [mailto:rob@k...]
> Sent: 06 October 2004 22:56
> To: XML Developers List
> Subject: converting character entities to us-ascii /equivalents/
>
> Hi,
>
> I need to output several versions of a page (through XSL
> transformations), one of which is us-ascii (for email). But the
> content might contain some characters that are not supported by
> us-ascii (like em dash - —).
>
> I want the character entities to remain in the content. When
> transforming to us-ascii, I want to replace the entities with some
> ascii text equivalent. For example, '—' would get converted to '--'.
>
> The XML is pulled into the transformation through the document
> function using a custom URIResolver.
>
> Is there an existing solution to this?
>
> Does Apache's FOP and the text renderer handle this type of thing?
>
> I have tried to set a ContentHandler (actually a DefaultHandler) on
> the XMLReader and tried to replace a character entity, but I am doing
> something wrong and am confused about how to proceed. Using the code
> below I get a recoverable error using saxon/aelfred and a failure
> when using saxon/xerces.
>
> Here is a snippet from the URIResolver:
>
>     InputSource in = new InputSource(file.getAbsolutePath());
>     SAXSource source = new SAXSource(in);
>     XMLReader reader = null;
>     try {
>         reader = XMLReaderFactory.createXMLReader("com.icl.saxon.aelfred.SAXDriver");
>         //reader = XMLReaderFactory.createXMLReader("org.apache.xerces.parsers.SAXParser");
>     } catch (SAXException e) {
>         System.err.println(e.getMessage());
>     }
>
>     reader.setContentHandler(new AsciiHandler());
>
>     source.setXMLReader(reader);
>
>     return source;
>
> And the DefaultHandler has one method:
>
>     public void characters(char[] text, int start, int length) {
>
>         String str = new String(text, start, length);
>         if (str.indexOf(174) > -1) {
>             str.replaceAll("\u00AE", "(Registered Trademark)");
>         }
>         text = str.toCharArray();
>     }
>
> How can I do this? Is there a better way to handle this type of thing?
>
> thanks,
> -Rob
>
> -----------------------------------------------------------------
> The xml-dev list is sponsored by XML.org <http://www.xml.org>, an
> initiative of OASIS <http://www.oasis-open.org>
>
> The list archives are at http://lists.xml.org/archives/xml-dev/
>
> To subscribe or unsubscribe from this list use the subscription
> manager: <http://www.oasis-open.org/mlmanage/index.php>
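
On the Java side of the question, the quoted characters() method cannot change what the transformer sees: String.replaceAll returns a new string (the result is discarded here), and assigning to the text parameter only rebinds a local variable. In addition, a transformer generally installs its own ContentHandler on the XMLReader it is handed, which would replace one set by hand. A minimal sketch of one alternative, assuming a SAX filter placed between the parser and the transformer is acceptable, is below. The names AsciiFilter, toAscii and filterText are hypothetical and not from the thread, the replacement list covers only the two characters mentioned in it, and attribute values are not touched.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

// Hypothetical filter: rewrites selected non-ASCII characters in text nodes
// and forwards the rewritten text to whatever handler sits downstream.
class AsciiFilter extends XMLFilterImpl {

    AsciiFilter(XMLReader parent) {
        super(parent);
    }

    // Assumed replacement table; extend it for the characters your content uses.
    static String toAscii(String s) {
        return s.replace("\u2014", "--")
                .replace("\u00AE", "(Registered Trademark)");
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        // Build a new array and pass it on; the incoming buffer is left alone.
        char[] out = toAscii(new String(ch, start, length)).toCharArray();
        super.characters(out, 0, out.length);
    }

    // Demonstration helper: parse a string through the filter and return
    // the character data that reaches the downstream handler.
    static String filterText(String xml) throws Exception {
        XMLReader parser = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        XMLReader filter = new AsciiFilter(parser);
        final StringBuilder seen = new StringBuilder();
        filter.setContentHandler(new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                seen.append(ch, start, length);
            }
        });
        filter.parse(new InputSource(new StringReader(xml)));
        return seen.toString();
    }
}
```

In the URIResolver from the quoted message, the corresponding change would be to drop the reader.setContentHandler(...) call and pass the wrapped reader instead, for example source.setXMLReader(new AsciiFilter(reader)). Because each replaced character is a single char, the substitution still works when the parser delivers text in several characters() calls.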