Re: Binary XML == "spawn of the devil" ?
On 25 Jul 2003 07:11:51 +1000, Rick Marshall <rjm@z...> wrote:

> if it's any help there was a similar debate about storing data in
> database systems years ago - do you store data as binary - integers,
> floats, etc - or text.

XML per se has no notion of integers, floats, dates, etc. ... you need to apply a schema to infer that. The current "binary XML" schemes that Elliotte ranted about mostly require a schema to work (causing all the "tight coupling" problems that we love to discuss here), and AFAIK they don't get huge advantages, for the very reasons you mention.

"Binary XML" is definitely a bad term, if not an oxymoron IMHO, because it implies "compiling" schema-valid XML documents into architecture-specific formats. The problems with that would read like a catalog of XML-DEV permathreads! A better way to think about what *I*, at least, am interested in is "performance-optimized Infoset serialization" (POIS?). That could cover a lot of possibilities, and could potentially remain essentially textual while being faster to parse.

> i agree with the slow down from parsing xml. it's a much bigger problem
> than binary or text formats. the need to find the other end of a tag
> before you can really process a tag - and searching for multi byte
> sequences is not well supported in the current generation of processors
> - is i think the main problem.

Absolutely! I'm not sure exactly where the bottlenecks are across the board, and for all I know using something like "}" to denote the end of an element could speed things up enough that the multi-byte comparison isn't necessary. I also hear repeatedly that the Unicode encoding/decoding step is a real bottleneck, and that something as simple as sending around UCS characters rather than UTF code points can make a lot of difference. LOTS of profiling would need to be done before an alternate serialization is standardized, of course.
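As a minimal sketch of the end-of-tag scanning point (not from the thread; the element name, document shape, and single-byte "}" delimiter are invented here purely for illustration), one can time scanning for a multi-byte close tag against scanning for a one-byte marker:

```python
import timeit

# Synthetic "documents": the same payload closed either by a
# multi-byte end tag or by a hypothetical single-byte delimiter.
multi = "<item>payload</item>" * 100_000
single = "<item>payload}" * 100_000

def count_ends(text, marker):
    """Count end markers by repeatedly scanning forward for them."""
    n, pos = 0, 0
    while True:
        pos = text.find(marker, pos)
        if pos < 0:
            return n
        n += 1
        pos += len(marker)

t_multi = timeit.timeit(lambda: count_ends(multi, "</item>"), number=10)
t_single = timeit.timeit(lambda: count_ends(single, "}"), number=10)
print(f"multi-byte end tag: {t_multi:.3f}s  single-byte marker: {t_single:.3f}s")
```

Whether the single-byte scan actually wins depends on the string-search machinery underneath, which is exactly why the profiling called for above would be needed before standardizing anything.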
> perhaps we can get intel to design multi byte search instructions into
> their next processor and then we can get performance back.

Well, there are people out there building XML support into hardware, at the box level, board level, and chip level. There might be some synergies between the hardware stuff and the "efficient serialization" stuff, and further synergies if the downstream processing (e.g. XSLT) can be sped up by working off something other than raw XML 1.0 text. See, for example, http://www.sarvega.com/sarvega.php?id=1.4 , especially their "specialized data stream called XML EventStream to provide a highly optimized pipeline-processing model for XML Processing."

Standardizing some more efficient serialization of the Infoset could (again, if the numbers actually work out, which remains to be seen) allow interoperability between specialized hardware devices that parse/serialize between "POIS" and XML, and software (e.g. front ends to XSLT engines, or "POIS" -> SAX event filters). Without standardization of some faster Infoset serialization, all this stuff works only for those who stay within a single vendor's castle.

Anyway, the point here is simply that "Binary XML" covers all sorts of territory, from a standard serialization of SAX events to a full-blown strongly-typed object serialization format, and probably intersects with ASN.1 along the way.
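To make the "standard serialization for SAX events" end of that spectrum concrete, here is a toy sketch of such a format; the event codes and the 1-byte-code / 4-byte-length framing are invented for this example and are not any actual POIS or EventStream wire format:

```python
import struct
from io import BytesIO

# Hypothetical event codes for a toy Infoset-event serialization.
START, END, TEXT = 1, 2, 3

def write_event(buf, code, payload=b""):
    # Each event: 1-byte code, 4-byte little-endian length, raw payload.
    buf.write(struct.pack("<BI", code, len(payload)))
    buf.write(payload)

def read_events(buf):
    # Yield (code, payload) pairs until the stream is exhausted.
    while True:
        header = buf.read(5)
        if len(header) < 5:
            return
        code, length = struct.unpack("<BI", header)
        yield code, buf.read(length)

# Round-trip <greeting>hello</greeting> as a flat event stream.
out = BytesIO()
write_event(out, START, b"greeting")
write_event(out, TEXT, b"hello")
write_event(out, END)
events = list(read_events(BytesIO(out.getvalue())))
print(events)  # [(1, b'greeting'), (3, b'hello'), (2, b'')]
```

A reader for a stream like this never searches for a closing delimiter at all; every event announces its own length up front, which is the property the hardware and pipeline-processing vendors are after.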