A SAX TransformerHandler encoding question
Hi,

I've got some interesting problems with the JDK's (1.4 and 1.5) TransformerHandler and surrogate pairs. Consider:

    import java.io.ByteArrayOutputStream;
    import javax.xml.transform.OutputKeys;
    import javax.xml.transform.sax.SAXTransformerFactory;
    import javax.xml.transform.sax.TransformerHandler;
    import javax.xml.transform.stream.StreamResult;
    import org.xml.sax.helpers.AttributesImpl;

    public void testOut() throws Exception {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        SAXTransformerFactory stf = (SAXTransformerFactory) SAXTransformerFactory.newInstance();
        // Identity transformer driven directly by SAX events
        TransformerHandler th = stf.newTransformerHandler();
        th.getTransformer().setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        th.setResult(new StreamResult(out));
        th.startDocument();
        th.startElement("", "foo", "foo", new AttributesImpl());
        char c[] = "\udc00\ud800".toCharArray();
        th.characters(c, 0, c.length);
        th.endElement("", "foo", "foo");
        th.endDocument();
        // Dump the serialized bytes (signed) next to their char casts
        byte bytes[] = out.toByteArray();
        for (int i = 0; i < bytes.length; i++) {
            System.out.println(i + ": " + bytes[i] + " " + ((char) bytes[i]));
        }
    }

This yields:

    0: 60 <
    1: 102 f
    2: 111 o
    3: 111 o
    4: 62 >
    5: -19 ?
    6: -80 ?
    7: -128 ?
    8: -19 ?
    9: -96 ?
    10: -128 ?
    11: 60 <
    12: 47 /
    13: 102 f
    14: 111 o
    15: 111 o
    16: 62 >

That is, the surrogate pair has been serialized as two separate Unicode characters, each encoded as its own three-byte sequence.

It seems that this problem is old (see <http://issues.apache.org/jira/browse/XALANJ-2132>), so why does it still occur in recent JDKs?

Best regards,
Julian
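For comparison, a well-formed surrogate pair puts the high surrogate (U+D800 to U+DBFF) first, followed by the low surrogate (U+DC00 to U+DFFF). The test string above, "\udc00\ud800", has them in the reverse order, which matches the byte dump (ED B0 80 is U+DC00, ED A0 80 is U+D800). A conforming UTF-8 serializer is expected to combine a valid pair into one code point and emit a single four-byte sequence (F0 90 80 80 for U+10000), and to reject an unpaired surrogate, which is not a legal XML character. The sketch below illustrates that rule with a hypothetical helper, `SurrogateCheck.toUtf8`, which is not part of any JDK or Xalan API; it assumes Java 7 or later, since `StandardCharsets` is used only to cross-check the result against the JDK's own encoder.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class SurrogateCheck {

    // Hypothetical helper: encodes UTF-16 text as UTF-8, combining each surrogate
    // pair into a single code point first. Unpaired surrogates are rejected,
    // because they are not legal XML characters and have no valid UTF-8 form.
    static byte[] toUtf8(String s) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int i = 0;
        while (i < s.length()) {
            // codePointAt joins a valid high+low pair; otherwise it returns the lone char
            int cp = s.codePointAt(i);
            if (cp >= Character.MIN_SURROGATE && cp <= Character.MAX_SURROGATE) {
                throw new IllegalArgumentException("unpaired surrogate U+"
                        + Integer.toHexString(cp).toUpperCase() + " at index " + i);
            }
            if (cp < 0x80) {                 // 1 byte
                out.write(cp);
            } else if (cp < 0x800) {         // 2 bytes
                out.write(0xC0 | (cp >> 6));
                out.write(0x80 | (cp & 0x3F));
            } else if (cp < 0x10000) {       // 3 bytes (BMP, non-surrogate)
                out.write(0xE0 | (cp >> 12));
                out.write(0x80 | ((cp >> 6) & 0x3F));
                out.write(0x80 | (cp & 0x3F));
            } else {                         // 4 bytes (supplementary plane)
                out.write(0xF0 | (cp >> 18));
                out.write(0x80 | ((cp >> 12) & 0x3F));
                out.write(0x80 | ((cp >> 6) & 0x3F));
                out.write(0x80 | (cp & 0x3F));
            }
            i += Character.charCount(cp);    // advance 2 chars for a pair, else 1
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        String valid = "\ud800\udc00";       // high surrogate first: U+10000
        byte[] utf8 = toUtf8(valid);
        System.out.println(Arrays.toString(utf8)); // [-16, -112, -128, -128]
        // Cross-check against the JDK's built-in UTF-8 encoder
        System.out.println(Arrays.equals(utf8, valid.getBytes(StandardCharsets.UTF_8)));

        String reversed = "\udc00\ud800";    // the order used in the test case
        System.out.println(Character.isSurrogatePair(reversed.charAt(0), reversed.charAt(1)));
        try {
            toUtf8(reversed);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // unpaired surrogate U+DC00 at index 0
        }
    }
}
```

With the pair in the correct order, the expected serialization of the element content would be the four bytes -16, -112, -128, -128 rather than two three-byte sequences; with the reversed order there is no valid pair to combine, so rejecting the input is arguably the only well-formed outcome.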