Re: No XML Binaries? Buy Hardware
On Fri, 23 Feb 2007 noah_mendelsohn@u... wrote:
> Elliotte Harold writes:
>
> > > I don't think we've hit the limits of parser performance yet,
>
> I think that asking "what are the theoretical limits" is very important,
> and it's not a question I've seen discussed often enough. From our paper:
>
> > No parser can process input faster than its supporting hardware
> > accesses data, but the additional cost of parsing and
> > validation should be minimized. On a 1 GHz Pentium processor a
> > simple character-scanning loop runs at about 100 Mbytes/second,
> > which is 10 cycles/byte.
>
> > [..]
>
> > On the tests reported in this paper, using the business object
> > API typical of Web Services applications, XML Screamer parses
> > and schema-validates XML at between 23 and 46 Mbytes/sec/GHz;
> > XML Screamer can thus process XML at speeds of roughly 100–200
> > Mbytes/sec on the 4 GHz processors now becoming available.
>
> > [...]
>
> > Using its business object APIs, XML Screamer scans, parses,
> > validates and deserializes at between 22% and 44% of the tested
> > processor's raw character scanning speed. Except insofar as
> > ways can be found to use such processors more efficiently, e.g.
> > by exploiting hardware string test instructions or on chip SIMD
> > accelerators, gains from further tuning or alternative
> > approaches are likely to be modest. XML Screamer's performance
> > is probably not far from the maximum achievable.
>
> In short, we observed that to check well formedness, a parser must at
> least touch each input character. You can benchmark various
> processor/memory combinations using their most optimized forms of
> character and string comparison and find out how fast they can inspect
> each byte of an input buffer, doing the sorts of character comparisons
> necessary for well formedness checking.
> There may be ways to do better
> than we did on particular processors, but I think it's interesting that
> one can set a pretty good bound on how fast XML processing can go.

If we relax the definition of XML well-formedness checking, we can get a
better bound on performance. In the extreme case, if we regard the XML
document as equivalent to a sequence of matched parentheses, and our goal
is only to find the scope of each pair of tags and its relationship to
the other pairs in the document, a lot of work can be saved. Our tests
show that this kind of "pre-parsing" is 6-7x faster than the SAX API
parsing provided by libxml2. The result of "pre-parsing" can be used to
guide parallel XML processing, lazy XML processing, or other forms of
optimization. So I think the bound on XML processing performance depends
on the minimum that the XML processing needs, and we really have more
room to push the bound.

http://www.cs.indiana.edu/~welu/pxp_grid06.pdf

Wei Lu
Indiana University

> Furthermore, I think our work shows that it is possible to get not too far
> from that bound, for some definition of "not too far" :-).
>
> Noah
>
> http://www2006.org/programme/item.php?id=5011
>
> --------------------------------------
> Noah Mendelsohn
> IBM Corporation
> One Rogers Street
> Cambridge, MA 02142
> 1-617-693-4036
> --------------------------------------
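
To make the "pre-parsing" idea in the reply above concrete, here is a minimal sketch in Python of a scan that treats tags as matched parentheses and records only each element's extent and parent. It is an illustration under stated assumptions, not the implementation from the linked paper: the function name `prescan` and the tuple layout are hypothetical, attribute values are assumed not to contain ">", and a DOCTYPE is assumed to have no internal subset.

```python
def prescan(doc):
    """Return (name, start, end, parent) for every element in doc.

    start/end are character offsets of the element's full extent and
    parent is an index into the returned list (-1 for the root).
    Only tag boundaries are inspected; attributes, entities and
    character data are never examined.
    """
    spans = []  # [name, start, end, parent], in start-tag order
    stack = []  # indices of elements that are still open
    # Constructs skipped whole: (opening marker, closing marker).
    skips = (("<!--", "-->"), ("<![CDATA[", "]]>"), ("<?", "?>"))
    pos = 0
    while True:
        lt = doc.find("<", pos)
        if lt == -1:
            break
        for opener, closer in skips:
            if doc.startswith(opener, lt):
                end = doc.find(closer, lt + len(opener))
                if end == -1:
                    raise ValueError("unterminated " + opener)
                pos = end + len(closer)
                break
        else:
            # Assumption: no '>' inside attribute values.
            gt = doc.find(">", lt + 1)
            if gt == -1:
                raise ValueError("unterminated tag")
            pos = gt + 1
            if doc.startswith("<!", lt):
                continue  # declaration such as a simple DOCTYPE
            if doc[lt + 1] == "/":
                # End tag: must match the innermost open element.
                name = doc[lt + 2:gt].strip()
                if not stack or spans[stack[-1]][0] != name:
                    raise ValueError("mismatched end tag: " + name)
                spans[stack.pop()][2] = pos
            else:
                self_closing = doc[gt - 1] == "/"
                body = doc[lt + 1:gt - 1 if self_closing else gt]
                tokens = body.split(None, 1)
                if not tokens:
                    raise ValueError("empty tag")
                parent = stack[-1] if stack else -1
                spans.append([tokens[0], lt,
                              pos if self_closing else -1, parent])
                if not self_closing:
                    stack.append(len(spans) - 1)
    if stack:
        raise ValueError("unclosed element: " + spans[stack[-1]][0])
    return [tuple(s) for s in spans]
```

For the input `<a><b x="1">t</b><c/></a>` this returns `[('a', 0, 25, -1), ('b', 3, 17, 0), ('c', 17, 21, 0)]`. Offsets like these are the kind of information that could be used to hand sibling subtrees to separate workers or to defer parsing a subtree until it is needed.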