encoding and NCRs; source doc as SAX events in JAXP (w
This discussion was mostly resolved off-list, but for the sake of the archives, and because some of us can't get enough of clearing up character encoding misconceptions...

Elizabeth Barham wrote:
> [re: "?" substitutions for unencodable or undecodable characters]
> Is it possible to bypass this mechanism?

It's a feature of the codec that is doing the encoding or decoding. If you're invoking it yourself, then sure, you may have other options, such as raising an exception or ignoring the unknown character or byte sequence. It depends on the API of the codec. (I'm trying to speak in relatively language-neutral terms here.)

> I would like to pass a byte
> into Java and not have it modified in anyway.

XML manifests in an encoded form (bytes) for the purposes of network transmission and disk storage, but *parsed* XML is no longer treated as bytes -- instead, it is treated as Unicode string objects arranged in a logical hierarchy (elements, attributes, etc.), and this info is communicated to the application (your "Java") as either SAX event calls or a DOM Document object.

A numeric character reference like "&#169;" manifests in the encoded, 'physical' document as a series of bytes for each character (e.g. if it is UTF-16LE encoded, "&" is 0x26 0x00, "#" is 0x23 0x00, and so on). When the bytes are decoded by the parser, they become a Unicode string consisting of the 6 characters: ampersand, number sign, digit 1, digit 6, digit 9, semicolon. The parser recognizes this markup as longhand for the single Unicode character: copyright symbol (Unicode character number 169), so that's what it reports to the application.

Your problem is most likely fixable with a very simple change to one line of your application's code, and knowing what to fix will be possible when you fully grasp the XML processing model and the underlying character encoding model, as well as the nuances of your application platform's codec APIs.
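For archive readers, here is a minimal Java sketch of the two points above: how a codec's substitution behaviour depends on how it is invoked, and how a parser reports a numeric character reference as a single character. The class and method names (CodecDemo, decodeLenient, and so on) are made up for illustration, and Java is assumed only because the original question mentions it. It is not a diagnosis of the original poster's code, which was never shown on the list.

```java
import java.io.ByteArrayInputStream;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical demo class; none of these names come from the original thread.
final class CodecDemo {

    // Lenient decoding: String's constructor silently substitutes U+FFFD
    // for byte sequences that are malformed in the given charset.
    static String decodeLenient(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // Strict decoding: the same charset, but the decoder is told to
    // report problems (throw) instead of substituting a replacement.
    static String decodeStrict(byte[] bytes) throws CharacterCodingException {
        return StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(bytes))
                .toString();
    }

    // Lenient encoding: characters US-ASCII cannot represent become "?".
    static byte[] encodeLenient(String s) {
        return s.getBytes(StandardCharsets.US_ASCII);
    }

    // The 'physical' form of the reference: 6 characters, 2 bytes each in UTF-16LE.
    static byte[] ncrAsUtf16le() {
        return "&#169;".getBytes(StandardCharsets.UTF_16LE);
    }

    // Parse encoded XML bytes and return the text the parser reports
    // for the document element.
    static String parseText(byte[] xmlBytes) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xmlBytes));
        return doc.getDocumentElement().getTextContent();
    }

    public static void main(String[] args) throws Exception {
        byte[] notUtf8 = {(byte) 0xFF}; // 0xFF never occurs in well-formed UTF-8
        System.out.println((int) decodeLenient(notUtf8).charAt(0));
        try {
            decodeStrict(notUtf8);
        } catch (CharacterCodingException e) {
            System.out.println("strict decoder refused the input: " + e);
        }
        byte[] xml = "<doc>&#169;</doc>".getBytes(StandardCharsets.UTF_8);
        String text = parseText(xml);
        System.out.println(text.length() + " character, code point " + (int) text.charAt(0));
    }
}
```

The design point is that the substitution is chosen by whoever configures the codec: the convenience methods on String pick "replace", while a CharsetDecoder or CharsetEncoder obtained directly lets the caller pick "report" or "ignore".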
I'd like to help further, but you'll need to boil it down to a simple bit of code that reproduces the error so I can see exactly what's going on. Off-list, please.

> But, I *do* have an XSLT question to ask as and addendum. What is the
> best way to drive the xml input of an XSLT formatter from inside a
> java class?
>
> For example, let us say that I have an XSLT stylesheet that is set up
> to expect a certain format, and I have a java class whose data I would
> like to have processed by said stylesheet. It seems a waste to make a
> StringBuffer of things like "<?xml version='1.0'?><doc><t>x</t></doc>"
> and then pass it into the transformer since it would be possible to
> generate the SAX events from within the Java class.
>
> Looking at javax.xml.parsers.SAXParser, I notice the parse() function,
> but those seem to be dealing with incoming streams and not events.

Yes, parsers generally rely on their input being bytes, which is implicitly mandated by the XML 1.0 spec. Convenience APIs have emerged over the years, operating at various levels, to accept different kinds of input (URIs, pre-decoded Unicode streams, DOM objects) but they typically all end up converting these to bytes, behind the scenes, for the underlying parser's benefit. [expat's, at least...]

For transformations, I *think* you can generate your source document as SAX events that a JAXP application can utilize, but I'm not sure if you can just create a parser and start calling handler methods, or if you have to implement an XMLReader, or what. Maybe someone else can provide an example of how to do it? I've never tried it myself. Note that your XSLT processor might come with some helpful examples, e.g. examples/java/TraxExamples.java in Saxon's distribution .zip.

As for whether it's better than marshalling your object's data into XML markup, I'd give some consideration to the maintainability and scalability of your code.
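
For archive readers, one way the SAX-events approach can be sketched with JAXP is shown here. It assumes the TransformerFactory in use supports SAXTransformerFactory (the code checks the feature flag rather than taking it for granted). The TransformerHandler it obtains is itself a ContentHandler, so the application can call the event methods directly without involving a parser. The class name, the tiny stylesheet, and the emitDoc helper are all hypothetical. Only the <doc><t>x</t></doc> shape comes from the question.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;

// Hypothetical demo class; not taken from Saxon's examples or from the thread.
final class SaxSourceDemo {

    // Illustrative stylesheet: emits the text of /doc/t as plain text.
    static final String STYLESHEET =
            "<xsl:stylesheet version='1.0'"
            + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
            + "<xsl:output method='text'/>"
            + "<xsl:template match='/doc'><xsl:value-of select='t'/></xsl:template>"
            + "</xsl:stylesheet>";

    // Stand-in for "a java class whose data I would like to have processed":
    // it reports <doc><t>value</t></doc> as SAX events, with no markup string.
    static void emitDoc(ContentHandler h, String value) throws SAXException {
        AttributesImpl noAttrs = new AttributesImpl();
        h.startDocument();
        h.startElement("", "doc", "doc", noAttrs);
        h.startElement("", "t", "t", noAttrs);
        h.characters(value.toCharArray(), 0, value.length());
        h.endElement("", "t", "t");
        h.endElement("", "doc", "doc");
        h.endDocument();
    }

    static String transform(String value) throws Exception {
        TransformerFactory tf = TransformerFactory.newInstance();
        if (!tf.getFeature(SAXTransformerFactory.FEATURE)) {
            throw new UnsupportedOperationException(
                    "this TransformerFactory does not support SAX input handlers");
        }
        TransformerHandler handler = ((SAXTransformerFactory) tf)
                .newTransformerHandler(new StreamSource(new StringReader(STYLESHEET)));
        StringWriter out = new StringWriter();
        handler.setResult(new StreamResult(out)); // must be set before events arrive
        emitDoc(handler, value);                  // endDocument() completes the transform
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(transform("x"));
    }
}
```

An alternative with the same effect is to wrap the event-generating code in an XMLReader implementation and hand it to the transformer as a SAXSource; the handler route above simply needs less scaffolding.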
Generating markup is going to be easier to understand, problems with it are going to be easier to diagnose, its output will be more widely useful, and it will probably not be all *that* much slower, in typical use cases, than marshalling into a series of SAX events.

Just my 2c.

-Mike

XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list