XML parsing @ 100MB-1000MB/sec/GHz with Parallel Bit Streams
I am pleased to announce the availability of parabix-0.40, a prototype high-performance XML parsing engine that parses text-oriented XML documents on commodity processors at over 200 MB/sec per processor GHz (MB/sec/GHz), and data-oriented XML documents at speeds approaching that. At this stage, these figures cover correct parsing of correct documents and dispatch to markup action routines through an in-line API for XML (ilax). As the parabix stack is built out to incorporate validation and object creation, I expect overall performance above 100 MB/sec/GHz. With linear speed-up on multicore processors and other improvements, 1000 MB/sec/GHz is foreseeable.
By way of comparison, XML Screamer (Kostoulas et al., WWW 2006) performs parsing, validation and business object creation on commodity processors at 23-46 MB/sec/GHz, a substantial improvement over the 2.5-6 MB/sec/GHz it cites for traditional validating parsers. This is very good performance for traditional character-at-a-time parsing, achieved through a collection of techniques such as optimization across layers and schema-based customization. As a benchmark, 100 MB/sec/GHz (roughly 10 clock cycles per input byte) is cited as the throughput limit for a simple character-at-a-time scanning loop.
My research investigates a fundamentally new approach to very high-speed text processing: representing character data as parallel bit streams and processing those bit streams with the SIMD capabilities of commodity CPUs. I first applied these techniques to UTF-8 to UTF-16 transcoding, achieving end-to-end speed-ups of 3X to 25X over standard iconv and similar implementations. The open source implementation of u8u16 is available at http://u8u16.costar.sfu.ca/, and the results have just been presented at ACM PPoPP 2008 in Salt Lake City.
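The core idea of parallel bit streams can be illustrated in a few lines. The following is a minimal sketch, not parabix code: arbitrary-precision Python integers stand in for SIMD registers, and the function names (`basis_bit_streams`, `char_class_stream`, `positions`) are hypothetical. It transposes a block of bytes into eight basis bit streams, one per bit position of the byte, and then marks every `<` in the block using a handful of whole-stream bitwise operations rather than one comparison per character. The transposition is done naively here for clarity; an efficient implementation would perform that step with SIMD instructions.

```python
# Illustrative sketch (hypothetical names), not the parabix implementation.
# Python ints act as arbitrarily wide bit vectors: bit i <-> input position i.


def basis_bit_streams(data: bytes) -> list[int]:
    """Transpose bytes into 8 parallel bit streams.

    streams[j] has bit i set exactly when bit j of data[i] is 1,
    numbering bits from j = 0 (most significant) to j = 7 (least significant).
    """
    streams = [0] * 8
    for i, byte in enumerate(data):
        for j in range(8):
            if (byte >> (7 - j)) & 1:
                streams[j] |= 1 << i
    return streams


def char_class_stream(streams: list[int], length: int, ch: int) -> int:
    """Return a stream marking every position whose byte equals ch."""
    mask = (1 << length) - 1  # keeps complemented streams within the input length
    result = mask
    for j in range(8):
        if (ch >> (7 - j)) & 1:
            result &= streams[j]          # this bit must be 1
        else:
            result &= ~streams[j] & mask  # this bit must be 0
    return result


def positions(stream: int) -> list[int]:
    """List the indices of the 1 bits in a stream, lowest first."""
    out = []
    while stream:
        low = stream & -stream  # isolate the lowest set bit
        out.append(low.bit_length() - 1)
        stream ^= low           # clear it and continue
    return out


doc = b'<a href="x">text</a>'
streams = basis_bit_streams(doc)
lt = char_class_stream(streams, len(doc), ord("<"))
gt = char_class_stream(streams, len(doc), ord(">"))
print(positions(lt))       # [0, 16]
print(positions(lt | gt))  # [0, 11, 16, 19]
```

Because each stream is an ordinary bit vector, character classes combine with bitwise OR, as in the markup-delimiter stream `lt | gt` above. With 128-bit SIMD registers, each such operation covers 128 input positions at once.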
Parabix (parallel bit streams for XML) is a research prototype that is nevertheless being designed to become the basis for a full XML processing stack. The working code repository is now available as an open source code base under OSL 3.0 at http://parabix.costar.sfu.ca/. I hope to accelerate development of parabix technology through the open source model, while continuing the academic research project with a team of graduate students who are coming up to speed. I have also created a spin-off company to oversee commercial development of the technology.
However, in the context of the current discussion of XML performance issues and the next ten years of XML technology development, I think the work is sufficiently advanced to support the following advice: Do not assume that XML processing performance is inherently limited by the nature of present-day character-at-a-time parsing technology. Intraregister and intrachip parallelism hold out a realistic promise of dramatic performance improvement on commodity processors.
--
Robert D. Cameron, Ph.D.
Professor of Computing Science, Simon Fraser University
President and CTO, International Characters, Inc.