
XML parsing @ 100MB-1000MB/sec/GHz with Parallel Bit Streams

  • From: Rob Cameron <cameron@c...>
  • To: xml-dev@l...
  • Date: Mon, 25 Feb 2008 05:13:17 -0800

I am pleased to announce the availability of parabix-0.40, a high-performance
XML parsing engine prototype that can parse text-oriented XML documents
on commodity processors at over 200MB/sec per processor GHz and
data-oriented XML documents at speeds approaching that.  At this point,
this includes correct parsing of correct documents and dispatch to markup
action routines using an in-line API for XML (ilax).  As the parabix stack
is built out to incorporate validation and object creation, I am expecting
overall performance above 100MB/sec/GHz.  With linear speed-up on
multicore processors and other improvements, 1000MB/sec/GHz is
foreseeable.

By way of comparison, XML Screamer (Kostoulas et al., WWW 2006) performs
parsing, validation and business object creation on commodity processors at
the rate of 23-46 MB/sec per processor GHz (MB/sec/GHz), a substantial
increase over the cited rate of 2.5-6 MB/sec/GHz for traditional validating
parsers.

This is very good performance for traditional character-at-a-time parsing,
taking advantage of a collection of techniques such as optimization
across layers and schema-based customization.  As a benchmark, 
100 MB/sec/GHz is cited as the limit on throughput achievable for a
simple character-at-a-time scanning loop.
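
To make the MB/sec/GHz unit concrete, it can be read as a budget of
processor cycles per input byte.  The following is only an illustrative
conversion, assuming 1 MB = 10^6 bytes and 1 GHz = 10^9 cycles/sec; the
function name cycles_per_byte is hypothetical and not part of any of the
systems mentioned.

```python
def cycles_per_byte(mb_per_sec_per_ghz: float) -> float:
    """Convert a throughput in MB/sec/GHz to processor cycles per byte.

    Assumption: 1 MB = 10**6 bytes and 1 GHz = 10**9 cycles per second.
    """
    bytes_per_sec_per_ghz = mb_per_sec_per_ghz * 1e6
    cycles_per_sec_per_ghz = 1e9
    return cycles_per_sec_per_ghz / bytes_per_sec_per_ghz


if __name__ == "__main__":
    # Rates quoted in this message, in MB/sec/GHz.
    for rate in (2.5, 23, 46, 100, 200, 1000):
        print(f"{rate:>6} MB/sec/GHz = {cycles_per_byte(rate):.1f} cycles/byte")
```

Under these assumptions, the 100 MB/sec/GHz benchmark corresponds to about
10 cycles per byte, and 1000 MB/sec/GHz to about one cycle per byte.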

My research is investigating the development of very high-speed text
processing based on a fundamentally new approach:  using parallel bit
streams to represent character data and the SIMD processor capabilities
of commodity CPUs to process these bit streams.
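
The idea of parallel bit streams can be sketched in a few lines.  This is
a toy illustration and not the parabix code: arbitrary-precision Python
integers stand in for SIMD registers, and the function names
(transpose_to_bit_streams, match_byte, positions) are hypothetical.  The
input bytes are transposed into eight bit streams, one per bit position,
and then every occurrence of a given character is located with a handful
of bitwise operations over whole streams rather than one comparison per
character.

```python
def transpose_to_bit_streams(data: bytes) -> list[int]:
    """Return 8 bit streams; bit i of streams[k] is bit k of data[i].

    k = 0 is the most significant bit of each byte.  Each stream is a
    Python int used as a long bit vector (a stand-in for SIMD registers).
    """
    streams = [0] * 8
    for i, byte in enumerate(data):
        for k in range(8):
            if (byte >> (7 - k)) & 1:
                streams[k] |= 1 << i
    return streams


def match_byte(streams: list[int], target: int, length: int) -> int:
    """Return a bit stream with a 1 at each position whose byte == target.

    Uses one AND (plus a NOT where the target bit is 0) per bit stream,
    so all `length` positions are tested together.
    """
    mask = (1 << length) - 1
    result = mask
    for k in range(8):
        if (target >> (7 - k)) & 1:
            result &= streams[k]
        else:
            result &= ~streams[k] & mask
    return result


def positions(stream: int) -> list[int]:
    """List the indices of the 1 bits in a bit stream."""
    out, i = [], 0
    while stream:
        if stream & 1:
            out.append(i)
        stream >>= 1
        i += 1
    return out


if __name__ == "__main__":
    text = b"<a><b>hi</b></a>"
    streams = transpose_to_bit_streams(text)
    left_angles = match_byte(streams, ord("<"), len(text))
    print(positions(left_angles))  # prints [0, 3, 8, 12]
```

In this sketch the transposition itself is still done a byte at a time;
a SIMD implementation would perform that step, as well as the matching,
on many bytes per instruction.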

I have first applied these techniques to the problem of UTF-8 to
UTF-16 transcoding, to achieve end-to-end speed-up of 3X to 25X
compared with standard iconv and similar implementations.  The
open source implementation of u8u16 is available at
http://u8u16.costar.sfu.ca/ and the results have just been
presented to ACM PPoPP 2008 in Salt Lake City.

Parabix (parallel bit streams for XML) is a research prototype that is
nevertheless being designed to become the basis for a full XML
processing stack.  The working code repository is now available
as an open source code base under OSL 3.0:
http://parabix.costar.sfu.ca/

I am hoping to accelerate development of parabix technology through the
open source model as well as continuing the academic research project
with a team of graduate students who are coming up to speed.  I have
also created a spin-off company to oversee commercial development
of the technology.

However, in the context of discussion of XML performance issues and
the next ten years of development of XML technology, I think that
the work is sufficiently well advanced to support the following advice:
Do not assume that XML processing performance is inherently limited
by the nature of present-day character-at-a-time parsing technology.
Intraregister and intrachip parallelism hold out a realistic promise of
dramatic performance improvement on commodity processors.
-- 
Robert D. Cameron, Ph.D.
Professor of Computing Science, Simon Fraser University
President and CTO, International Characters, Inc.


