Re: converting character entities to us-ascii /equivalents/
Michael Kay wrote:
> If there's a limited number of non-ASCII characters you need to handle, you
> can use character maps in the XSLT 2.0 serializer.

Sorry, I should have specified that I am using v1.0. I intend to move to v2.0 sometime soon, but have not had the time to learn it and convert all of my stylesheets. But that is good to know.

thanks,
-Rob

>
> Michael Kay
> http://www.saxonica.com/
>
>
>>-----Original Message-----
>>From: Robert Koberg [mailto:rob@k...]
>>Sent: 06 October 2004 22:56
>>To: XML Developers List
>>Subject: converting character entities to us-ascii /equivalents/
>>
>>Hi,
>>
>>I need to output several versions of a page (through XSL
>>transformations), one of which is us-ascii (for email). But the content
>>might contain some characters that are not supported by us-ascii (like
>>em dash - —).
>>
>>I want the character entities to remain in the content. When
>>transforming to us-ascii, I want to replace the entities with some ascii
>>text equivalent: For example, '—' would get converted to '--'.
>>
>>The XML is pulled into the transformation through the document function
>>using a custom URIResolver.
>>
>>Is there an existing solution to this?
>>
>>Does Apache's FOP and the text renderer handle this type of thing?
>>
>>I have tried to set a ContentHandler (actually a DefaultHandler) on the
>>XMLReader and tried to replace a character entity, but I am doing
>>something wrong and am confused on how to proceed. Using the code below I
>>get a recoverable error using saxon/aelfred and a failure when using
>>saxon/xerces.
>>
>>Here is a snippet from the URIResolver:
>>
>>
>>InputSource in = new InputSource(file.getAbsolutePath());
>>SAXSource source = new SAXSource(in);
>>XMLReader reader = null;
>>try {
>>  reader = XMLReaderFactory.createXMLReader("com.icl.saxon.aelfred.SAXDriver");
>>  //reader = XMLReaderFactory.createXMLReader("org.apache.xerces.parsers.SAXParser");
>>} catch (SAXException e) {
>>  System.err.println(e.getMessage());
>>}
>>
>>reader.setContentHandler(new AsciiHandler());
>>
>>source.setXMLReader(reader);
>>
>>return source;
>>
>>
>>
>>And the DefaultHandler has one method:
>>
>>
>>public void characters(char[] text, int start, int length) {
>>
>>  String str = new String(text, start, length);
>>  if (str.indexOf(174) > -1) {
>>    str.replaceAll("\u00AE", "(Registered Trademark)");
>>  }
>>  text = str.toCharArray();
>>}
>>
>>How can I do this? Is there a better way to handle this type of thing?
>>
>>thanks,
>>-Rob
>>
>>
>>-----------------------------------------------------------------
>>The xml-dev list is sponsored by XML.org <http://www.xml.org>, an
>>initiative of OASIS <http://www.oasis-open.org>
>>
>>The list archives are at http://lists.xml.org/archives/xml-dev/
>>
>>To subscribe or unsubscribe from this list use the subscription
>>manager: <http://www.oasis-open.org/mlmanage/index.php>
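
One possible way to do the replacement with XSLT 1.0, offered as a sketch and not as a confirmed fix for the errors reported above: put a SAX filter between the parser and the transformer. Two things in the posted handler look like they would prevent it from having any effect. A processor that consumes a SAXSource typically installs its own ContentHandler on the XMLReader, which displaces the one set in the resolver, and inside characters() the result of replaceAll is discarded while the assignment to `text` only changes a local variable. The names `AsciiFilter`, `toAscii` and `filterToString` below are hypothetical, and the replacement table is only an illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLFilterImpl;

/** Hypothetical filter: rewrites selected non-ASCII characters as ASCII text. */
class AsciiFilter extends XMLFilterImpl {

    // Illustrative mapping only; extend it with whatever the content needs.
    private static final Map<String, String> REPLACEMENTS = new LinkedHashMap<>();
    static {
        REPLACEMENTS.put("\u2014", "--");                     // em dash
        REPLACEMENTS.put("\u00AE", "(Registered Trademark)"); // registered sign
    }

    AsciiFilter(XMLReader parent) {
        super(parent);
    }

    /** Applies every replacement; single-character keys are safe across SAX chunk boundaries. */
    static String toAscii(String s) {
        for (Map.Entry<String, String> e : REPLACEMENTS.entrySet()) {
            // Strings are immutable, so the returned value must be kept.
            s = s.replace(e.getKey(), e.getValue());
        }
        return s;
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        char[] out = toAscii(new String(ch, start, length)).toCharArray();
        // Forward the rewritten text downstream instead of reassigning the parameter.
        super.characters(out, 0, out.length);
    }

    /** Demo helper: parse through the filter and serialize with an identity transform. */
    static String filterToString(String xml) {
        try {
            SAXParserFactory spf = SAXParserFactory.newInstance();
            spf.setNamespaceAware(true);
            XMLReader parser = spf.newSAXParser().getXMLReader();
            // The filter is itself an XMLReader, so it can be handed to SAXSource.
            SAXSource source = new SAXSource(new AsciiFilter(parser),
                    new InputSource(new StringReader(xml)));
            Transformer identity = TransformerFactory.newInstance().newTransformer();
            identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            identity.setOutputProperty(OutputKeys.ENCODING, "US-ASCII");
            StringWriter sw = new StringWriter();
            identity.transform(source, new StreamResult(sw));
            return sw.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

In a URIResolver, the same idea would mean returning `new SAXSource(new AsciiFilter(reader), in)` only for the us-ascii output, so the other versions of the page still receive the original characters. Because the filter is the XMLReader the processor sees, the processor's own ContentHandler is attached to the filter and receives the rewritten text.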