Re: XML parsing @ 100MB-1000MB/sec/GHz with Parallel Bit Streams
Rob Cameron wrote:

> I am pleased to announce the availability of parabix-0.40, a
> high-performance XML parsing engine prototype that can parse
> text-oriented XML documents on commodity processors at over 200MB/sec
> per processor GHz and data-oriented XML documents at speeds
> approaching that.
> [...]
> By way of comparison, XML Screamer (Koustalas et al, WWW 2006) performs
> parsing, validation and business object creation on commodity
> processors at the rate of 23-46 MB/sec per processor GHz (MB/sec/GHz),
> a substantial increase over the cited rate of 2.5-6 MB/sec/GHz for
> traditional validating parsers.

As one of the authors of the XML Screamer paper, let me say that this sounds like terrific work. I have for a while been curious about the use of SIMD features of modern processors to go beyond what we did with XML Screamer. (It's ancient history, but years ago I spent some time performance optimizing code using the SIMD features that were then available on IBM mainframes -- the so-called Vector Facility -- and I've been interested ever since in the use of such SIMD systems to speed text processing and other systems-style code.)

The performance numbers you quote are impressive, but also very consistent IMO with what one might expect, in that it was always clear that systems like Screamer were ultimately limited by the CPU speed of the processors we used rather than by the memory system, and the SIMD side of these chips should be able to use the memory system more effectively.

I very much look forward to reading your PPoPP paper (I happen to be on an airplane just now, and therefore can't get to the Internet to look for a copy.) Congratulations on what sounds like a truly impressive step forward in XML performance!

Noah

--------------------------------------
Noah Mendelsohn
IBM Corporation
One Rogers Street
Cambridge, MA 02142
1-617-693-4036
--------------------------------------

Rob Cameron <cameron@c...>
02/25/2008 08:13 AM
To: xml-dev@l...
cc: (bcc: Noah Mendelsohn/Cambridge/IBM)
Subject: XML parsing @ 100MB-1000MB/sec/GHz with Parallel Bit Streams

I am pleased to announce the availability of parabix-0.40, a high-performance XML parsing engine prototype that can parse text-oriented XML documents on commodity processors at over 200MB/sec per processor GHz and data-oriented XML documents at speeds approaching that. At this point, this includes correct parsing of correct documents and dispatch to markup action routines using an in-line API for XML (ilax).

As the parabix stack is built out to incorporate validation and object creation, I am expecting overall performance above 100MB/sec/GHz. With linear speed-up on multicore processors and other improvements, 1000MB/sec/GHz is foreseeable.

By way of comparison, XML Screamer (Koustalas et al, WWW 2006) performs parsing, validation and business object creation on commodity processors at the rate of 23-46 MB/sec per processor GHz (MB/sec/GHz), a substantial increase over the cited rate of 2.5-6 MB/sec/GHz for traditional validating parsers. This is very good performance for traditional character-at-a-time parsing, taking advantage of a collection of techniques such as optimization across layers and schema-based customization. As a benchmark, 100 MB/sec/GHz is cited as the limit on throughput achievable for a simple character-at-a-time scanning loop.

My research is investigating the development of very high-speed text processing based on a fundamentally new approach: using parallel bit streams to represent character data and the SIMD processor capabilities of commodity CPUs to process these bit streams. I have first applied these techniques to the problem of UTF-8 to UTF-16 transcoding, to achieve end-to-end speed-up of 3X to 25X compared with standard iconv and similar implementations. The open source implementation of u8u16 is available at http://u8u16.costar.sfu.ca/ and the results have just been presented to ACM PPoPP 2008 in Salt Lake City.
Parabix (parallel bit streams for XML) is a research prototype that is nevertheless being designed to become the basis for a full XML processing stack. The working code repository is now available as an open source code base under OSL 3.0.

http://parabix.costar.sfu.ca/

I am hoping to accelerate development of parabix technology through the open source model as well as continuing the academic research project with a team of graduate students who are coming up to speed. I have also created a spin-off company to oversee commercial development of the technology.

However, in the context of discussion of XML performance issues and the next ten years of development of XML technology, I think that the work is sufficiently well advanced to support the following advice:

Do not assume that XML processing performance is inherently limited by the nature of present-day character-at-a-time parsing technology. Intraregister and intrachip parallelism hold out a realistic promise of dramatic performance improvement on commodity processors.

--
Robert D. Cameron, Ph.D.
Professor of Computing Science, Simon Fraser University
President and CTO, International Characters, Inc.

_______________________________________________________________________

XML-DEV is a publicly archived, unmoderated list hosted by OASIS to support XML implementation and development. To minimize spam in the archives, you must subscribe before posting.

[Un]Subscribe/change address: http://www.oasis-open.org/mlmanage/
Or unsubscribe: xml-dev-unsubscribe@l...
subscribe: xml-dev-subscribe@l...
List archive: http://lists.xml.org/archives/xml-dev/
List Guidelines: http://www.oasis-open.org/maillists/guidelines.php
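
For readers new to the idea, the "parallel bit streams" representation mentioned in the announcement can be illustrated with a small sketch. This is not code from parabix or u8u16. It is a minimal illustration that assumes Python integers as a stand-in for SIMD registers, and the function names (`transpose`, `match_byte`, `positions`) are hypothetical. Each byte of the input is split across eight bit streams, one per bit position, so that a character class such as "is this byte `<`?" can be computed for all positions at once with a few bitwise operations instead of a character-at-a-time loop.

```python
def transpose(data):
    """Return 8 bit streams; stream k holds bit k (0 = least significant)
    of every input byte. Bit i of each stream corresponds to data[i]."""
    streams = [0] * 8
    for i, byte in enumerate(data):
        for k in range(8):
            if (byte >> k) & 1:
                streams[k] |= 1 << i
    return streams


def match_byte(streams, length, target):
    """Return a bit stream whose bit i is set exactly where data[i] == target.

    The loop runs over the 8 bit positions of the target, not over the
    input: each step is one bitwise operation applied to every position.
    """
    mask = (1 << length) - 1          # keeps results within the input length
    result = mask
    for k in range(8):
        if (target >> k) & 1:
            result &= streams[k]              # bit k must be 1
        else:
            result &= ~streams[k] & mask      # bit k must be 0
    return result


def positions(stream):
    """List the indices of the set bits in a bit stream."""
    out = []
    i = 0
    while stream:
        if stream & 1:
            out.append(i)
        stream >>= 1
        i += 1
    return out


text = b"<a><b>hi</b></a>"
streams = transpose(text)
lt_stream = match_byte(streams, len(text), ord("<"))
lt_positions = positions(lt_stream)
print(lt_positions)  # [0, 3, 8, 12]
```

In this sketch the transposition step is itself a plain loop; the speed-ups described in the announcement would come from doing both the transposition and the bitwise matching in wide SIMD registers, which Python integers only imitate.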






