Re: Pushing all the buttons
Mike Champion wrote:

>As best I know, the big win for truly binary XML
>serializations is in avoiding the overhead of the
>Unicode-encoded text to UCS-character translation.
>Does anyone take issue with the assertion that the
>external encoding-> Unicode text translation is
>generally a significant portion of XML parsing time?

Yes? Transcoding ASCII, ISO8859-1 or UTF-16 is just a cast; translating UTF-8 is a tiny automaton, easily small enough to fit into a data cache; translating most 8-bit sets needs only a 94 byte table. There is nothing intrinsic to any of them that should make them slow. The code to do them could fit into the instruction caches on CPUs (which is surely what people who want speed should be concentrating on: what is the most functionality that a standard can prescribe that still fits into caches?). I reckon it should be more an API/implementation issue.*

Java 1.4 NIO has completely revised its character transcoding: you can have transcoders that autodetect, so I don't know why someone doesn't put out an XML-autodetecting transcoder, which would operate directly on, for example, external byte buffers. That could give much nicer streaming performance. (Anyone have any benchmarks for NIO b.t.w.?)

The CJK sets, EBCDIC, perhaps encodings with ordering requirements such as Thai, and older sets which need normalization are a different matter: they are not casts, simple automata nor little tables. But removing these from XML will not result in any extra capability for users: if you need speed, send easy data.

Cheers
Rick Jelliffe

* For example, I found that IBM's ICU4J normalization class was way too slow when presented with ASCII data, but it was a trivial matter to bypass.
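To make the "cast" and "tiny automaton" points concrete, here is a minimal sketch in Python. It is an illustration only, not code from any parser mentioned in this thread: the function names `decode_latin1` and `decode_utf8` are hypothetical, and a production decoder would work on buffers rather than building Python lists. The ISO8859-1 case is a plain widening of each byte to a code point; the UTF-8 case needs only three small pieces of state (bytes still expected, the code point so far, and the minimum legal value for the sequence length).

```python
def decode_latin1(data: bytes) -> str:
    # ISO8859-1 maps byte value N to code point N, so decoding is a widening cast.
    return "".join(map(chr, data))


def decode_utf8(data: bytes) -> str:
    # Small state machine; the whole state is three integers.
    out = []
    need = 0  # continuation bytes still expected
    cp = 0    # code point accumulated so far
    low = 0   # smallest code point allowed for this sequence length
    for b in data:
        if need == 0:
            if b < 0x80:                  # single-byte (ASCII) sequence
                out.append(chr(b))
            elif 0xC2 <= b <= 0xDF:       # lead byte of a 2-byte sequence
                need, cp, low = 1, b & 0x1F, 0x80
            elif 0xE0 <= b <= 0xEF:       # lead byte of a 3-byte sequence
                need, cp, low = 2, b & 0x0F, 0x800
            elif 0xF0 <= b <= 0xF4:       # lead byte of a 4-byte sequence
                need, cp, low = 3, b & 0x07, 0x10000
            else:
                raise ValueError("invalid UTF-8 lead byte")
        else:
            if b & 0xC0 != 0x80:
                raise ValueError("invalid UTF-8 continuation byte")
            cp = (cp << 6) | (b & 0x3F)
            need -= 1
            if need == 0:
                # Reject overlong forms, surrogates and out-of-range values.
                if cp < low or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
                    raise ValueError("ill-formed UTF-8 sequence")
                out.append(chr(cp))
    if need:
        raise ValueError("truncated UTF-8 sequence")
    return "".join(out)
```

For example, `decode_latin1(b"caf\xe9")` and `decode_utf8("café".encode("utf-8"))` should both yield the same four-character string. The point of the sketch is the size of the state and the branch structure, not Python's speed.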