
Re: Pushing all the buttons


byte pushing
Mike Champion wrote:

>As best I know, the big win for truly binary XML
>serializations is in avoiding the overhead of the
>Unicode-encoded text to UCS-character translation. 
>Does anyone take issue with the assertion that the
>external encoding-> Unicode text translation is
>generally a significant portion of XML parsing time?  
>  
>
Yes?  Transcoding ASCII, ISO 8859-1 or UTF-16 is just a cast;
translating UTF-8 is a tiny automaton, easily small enough to fit into
a data cache; translating most 8-bit sets needs only a 94-byte table.
There is nothing intrinsic to any of them that should make them
slow, and the code for them could fit into the instruction caches on CPUs
(which is surely what people who want speed should be concentrating on:
what is the most functionality that a standard can prescribe that still
fits into caches?). I reckon it is more an API/implementation
issue.*
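
To make the "cast" and "tiny automaton" points concrete, here is a minimal Java sketch. The class and method names are made up for illustration, and the UTF-8 decoder is deliberately simplified: it rejects bad lead and continuation bytes but does not check for overlong forms or encoded surrogates, which a conforming decoder would have to do.

```java
import java.util.Arrays;

final class CastTranscoders {
    /** ISO 8859-1 (and so ASCII): every byte is its own code point. */
    static char[] decodeLatin1(byte[] in) {
        char[] out = new char[in.length];
        for (int i = 0; i < in.length; i++) {
            out[i] = (char) (in[i] & 0xFF); // undo sign extension, then widen
        }
        return out;
    }

    /** Simplified UTF-8 decoder producing code points (assumption: no overlong/surrogate checks). */
    static int[] decodeUtf8(byte[] in) {
        int[] out = new int[in.length]; // never more code points than bytes
        int n = 0;
        int i = 0;
        while (i < in.length) {
            int b = in[i++] & 0xFF;
            int need; // continuation bytes still expected
            int cp;   // code point being assembled
            if (b < 0x80) {
                need = 0; cp = b;
            } else if ((b & 0xE0) == 0xC0) {
                need = 1; cp = b & 0x1F;
            } else if ((b & 0xF0) == 0xE0) {
                need = 2; cp = b & 0x0F;
            } else if ((b & 0xF8) == 0xF0) {
                need = 3; cp = b & 0x07;
            } else {
                throw new IllegalArgumentException("bad lead byte at " + (i - 1));
            }
            for (; need > 0; need--) {
                if (i >= in.length || (in[i] & 0xC0) != 0x80) {
                    throw new IllegalArgumentException("bad continuation at " + i);
                }
                cp = (cp << 6) | (in[i++] & 0x3F);
            }
            out[n++] = cp;
        }
        return Arrays.copyOf(out, n);
    }

    public static void main(String[] args) {
        byte[] utf8 = {0x41, (byte) 0xC3, (byte) 0xA9}; // "A" followed by e-acute
        for (int cp : decodeUtf8(utf8)) {
            System.out.printf("U+%04X%n", cp);
        }
    }
}
```

Both loops are a handful of instructions with no lookup tables, which is the sense in which they should fit comfortably in cache.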

Java 1.4's NIO has completely revised character transcoding:
you can have transcoders that autodetect, so I don't know why
someone doesn't put out an XML-autodetecting transcoder, which
would operate directly on, for example, external byte buffers. That
could give much nicer streaming performance.  (Does anyone have any
benchmarks for NIO, by the way?)
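
A rough sketch of what such an autodetecting step might look like over a ByteBuffer, loosely following the byte patterns in Appendix F of the XML 1.0 Recommendation. The class name XmlEncodingSniffer and its methods are hypothetical, not an existing API; the sketch ignores the UCS-4 and EBCDIC cases and uses StandardCharsets, which arrived in Java 7 rather than 1.4.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class XmlEncodingSniffer {
    private static final Pattern ENCODING =
        Pattern.compile("encoding\\s*=\\s*[\"']([A-Za-z][A-Za-z0-9._-]*)[\"']");

    /** Guess the charset from the first bytes without moving the buffer position. */
    static Charset sniff(ByteBuffer buf) {
        int p = buf.position();
        int n = buf.remaining();
        int b0 = n > 0 ? buf.get(p) & 0xFF : -1;
        int b1 = n > 1 ? buf.get(p + 1) & 0xFF : -1;
        int b2 = n > 2 ? buf.get(p + 2) & 0xFF : -1;
        int b3 = n > 3 ? buf.get(p + 3) & 0xFF : -1;

        // Byte order marks.
        if (b0 == 0xFE && b1 == 0xFF) return StandardCharsets.UTF_16BE;
        if (b0 == 0xFF && b1 == 0xFE) return StandardCharsets.UTF_16LE; // UCS-4 LE ignored
        if (b0 == 0xEF && b1 == 0xBB && b2 == 0xBF) return StandardCharsets.UTF_8;

        // "<?" in UTF-16 without a byte order mark.
        if (b0 == 0x00 && b1 == 0x3C && b2 == 0x00 && b3 == 0x3F) return StandardCharsets.UTF_16BE;
        if (b0 == 0x3C && b1 == 0x00 && b2 == 0x3F && b3 == 0x00) return StandardCharsets.UTF_16LE;

        // ASCII-compatible "<?": read the declaration as Latin-1 and look for encoding="...".
        if (b0 == 0x3C && b1 == 0x3F) {
            int limit = Math.min(n, 256);
            StringBuilder head = new StringBuilder(limit);
            for (int i = 0; i < limit; i++) {
                char c = (char) (buf.get(p + i) & 0xFF);
                head.append(c);
                if (c == '>') break;
            }
            Matcher m = ENCODING.matcher(head);
            if (m.find()) return Charset.forName(m.group(1)); // may throw if unsupported
        }
        return StandardCharsets.UTF_8; // XML default when nothing else is declared
    }

    /** Decode the whole buffer with the sniffed charset, dropping a leading BOM. */
    static CharBuffer decode(ByteBuffer buf) {
        CharBuffer out = sniff(buf).decode(buf);
        if (out.hasRemaining() && out.get(out.position()) == '\uFEFF') {
            out.position(out.position() + 1);
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] doc = "<?xml version='1.0' encoding='ISO-8859-1'?><a/>"
            .getBytes(StandardCharsets.US_ASCII);
        System.out.println(sniff(ByteBuffer.wrap(doc)));
    }
}
```

For real streaming one would feed successive buffers to a CharsetDecoder obtained from the sniffed charset rather than decoding in one call; the one-shot decode here is only to keep the sketch short.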

The CJK sets, EBCDIC, perhaps encodings with ordering requirements such
as Thai, and older sets which need normalization are a different matter:
they are neither casts, simple automata nor little tables. But removing
these from XML will not result in any extra capability for users: if you
need speed, send easy data.

Cheers
Rick Jelliffe

* For example, I found that IBM's ICU4J normalization class was way too
slow when presented with ASCII data, but it was a trivial matter to bypass.
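
A bypass of that kind can be sketched as below. This is only an illustration of the idea, not the code I used: it substitutes the JDK's java.text.Normalizer (Java 6 and later) for the ICU4J class, and the AsciiFastPath name is made up. Pure ASCII text is already in every Unicode normalization form, so it can be returned untouched.

```java
import java.text.Normalizer;

final class AsciiFastPath {
    /** True if every char is in the 7-bit ASCII range. */
    static boolean isAscii(CharSequence s) {
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) >= 0x80) {
                return false;
            }
        }
        return true;
    }

    /** NFC-normalize, skipping the normalizer entirely for ASCII input. */
    static String normalizeNfc(String s) {
        // ASCII is invariant under NFC, so the same String object is returned.
        return isAscii(s) ? s : Normalizer.normalize(s, Normalizer.Form.NFC);
    }

    public static void main(String[] args) {
        String decomposed = "e\u0301"; // 'e' followed by a combining acute accent
        System.out.println(normalizeNfc(decomposed).length());
    }
}
```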

