Re: Pushing all the buttons
At 8:33 AM -0400 9/21/03, David Megginson wrote:

>I think that James was talking about going from bytes representing a
>Unicode character encoding, not a binary encoding. There should be no
>platform dependencies in that case.

I understood that, and my point still holds. There are platform dependencies in this case. If the native char and string types are built on UTF-8 (Perl, maybe?), then this is straightforward. However, when the native char and string types are based on UTF-16, a conversion is necessary. Ditto for UTF-16BE to UTF-16LE and vice versa, or UTF-8/UTF-16 to UTF-32. Languages and platforms do not share the same internal representation of Unicode, so no one binary format will work for everyone.

This conversion is non-trivial, too. In the current version of XOM I made a deliberate decision, after profiling, to store internal text node data in UTF-8 rather than UTF-16. That saves me a *lot* of memory. However, the constant conversion between the internal UTF-8 representation and Java's UTF-16 representation imposes about a 10% speed penalty. I chose to optimize for size instead of speed in this case, but I wouldn't suggest imposing that cost on everyone by making all XML data UTF-8.

-- 
Elliotte Rusty Harold
elharo@m...
Processing XML with Java (Addison-Wesley, 2002)
http://www.cafeconleche.org/books/xmljava
http://www.amazon.com/exec/obidos/ISBN%3D0201771861/cafeaulaitA
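A minimal Java sketch of the size-versus-speed trade-off described in the post. It assumes a hypothetical `Utf8TextNode` class; this is an illustration, not XOM's actual implementation. The node keeps its character data as UTF-8 bytes (compact for mostly-ASCII text) and pays a decode to Java's UTF-16 `String` on every read and an encode on every write. It also shows that the same character serializes to different bytes in UTF-16BE and UTF-16LE, which is why byte-order conversion is needed between platforms.

```java
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical sketch, not XOM's real code: a text node that stores its
 * character data as UTF-8 bytes and converts to a Java String on demand.
 */
public class Utf8TextNode {
    // Compact storage: one byte per ASCII character
    private final byte[] utf8;

    public Utf8TextNode(String value) {
        // Conversion paid on every write: Java String -> UTF-8 bytes
        this.utf8 = value.getBytes(StandardCharsets.UTF_8);
    }

    public String getValue() {
        // Conversion paid on every read: UTF-8 bytes -> Java String (UTF-16)
        return new String(utf8, StandardCharsets.UTF_8);
    }

    public int storedBytes() {
        return utf8.length;
    }

    public static void main(String[] args) {
        String text = "Hello, XML";
        Utf8TextNode node = new Utf8TextNode(text);

        // Encoded size of the same ASCII text in UTF-8 vs. UTF-16
        System.out.println("UTF-8 bytes:    " + node.storedBytes());
        System.out.println("UTF-16BE bytes: " + text.getBytes(StandardCharsets.UTF_16BE).length);

        // The same UTF-16 code unit in the two byte orders
        byte[] be = "A".getBytes(StandardCharsets.UTF_16BE);
        byte[] le = "A".getBytes(StandardCharsets.UTF_16LE);
        System.out.println(be[0] + " " + be[1] + " / " + le[0] + " " + le[1]);

        // Round trip preserves the text
        System.out.println(node.getValue().equals(text));
    }
}
```

Running `main` should print 10 and 20 for the two byte counts, then `0 65 / 65 0`, then `true`. The counts reflect encoded sizes, not necessarily the JVM's internal layout: since Java 9, compact strings may store Latin-1-only text one byte per character internally.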