Subject: Saxon and ZWNJ
From: Mohsen Saboorian <mohsens@xxxxxxxxx>
Date: Mon, 10 Jun 2013 02:12:20 +0430
Hi,
I'm trying to evaluate an XPath expression with saxon-9.1.0.8 using
the following code snippet:

    Configuration conf = new Configuration();
    conf.setValidation(false);
    Processor p = new Processor(false);
    DocumentBuilder documentBuilder = p.newDocumentBuilder();
    XPathCompiler xpathCompiler = p.newXPathCompiler();
    XPathExecutable xpe = xpathCompiler.compile(expression);
    XPathSelector xpath = xpe.load();
    xpath.setContextItem(
        documentBuilder.build(new DOMSource(cleanHtml.document)));
    XdmItem result = xpath.evaluateSingle();

The HTML is in Persian script; its cleaned DOM is passed as
cleanHtml.document in the code above. The text contains unescaped ZWNJ
(U+200C) characters.
The matched XdmItem contains the ZWNJ (U+200C) unescaped, but
result.getStringValue() returns it escaped as a character reference
(such as &#8204;). That doesn't seem correct, because I'm asking for
the node's string value.
Is this a bug, or is there a flag to disable the escaping of special
Unicode characters in Saxon?
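
To make the symptom easier to pin down, here is a minimal sketch of a check that can be run on the string returned by result.getStringValue(). It uses only the Java standard library, not any Saxon API. The class name ZwnjCheck, its method names, and the list of escaped spellings it looks for are assumptions made for illustration.

```java
import java.util.Locale;

// Hypothetical helper (not part of Saxon): tells a raw U+200C apart from
// an escaped spelling of it inside a string.
public class ZwnjCheck {
    static final int ZWNJ = 0x200C;

    /** Lists each code point of s as U+XXXX, separated by single spaces. */
    public static String codePoints(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("U+%04X", cp));
        });
        return sb.toString();
    }

    /** True if s contains the actual ZWNJ character. */
    public static boolean hasRawZwnj(String s) {
        return s.indexOf(ZWNJ) >= 0;
    }

    /** True if s contains ZWNJ spelled out as a character or entity reference. */
    public static boolean hasEscapedZwnj(String s) {
        // Assumed spellings: decimal, hexadecimal, and the HTML named entity.
        String lower = s.toLowerCase(Locale.ROOT);
        return lower.contains("&#8204;")
            || lower.contains("&#x200c;")
            || lower.contains("&zwnj;");
    }

    public static void main(String[] args) {
        // Stand-in for result.getStringValue(); replace with the real value.
        String value = args.length > 0 ? args[0] : "a\u200Cb";
        System.out.println("code points: " + codePoints(value));
        System.out.println("raw ZWNJ:     " + hasRawZwnj(value));
        System.out.println("escaped ZWNJ: " + hasEscapedZwnj(value));
    }
}
```

If hasRawZwnj is true and hasEscapedZwnj is false for the returned value, the escaping is presumably being introduced later (for example when the string is printed or serialized) rather than by getStringValue() itself.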
Regards,
Mohsen