XML-DEV Mailing List Archive
Re: Exploiting multi-core CPUs during XML parsing
On Sat, Apr 01, 2006 at 11:00:24AM +0100, Andrew S. Townley wrote:
> On Sat, 2006-04-01 at 10:28, Elliotte Harold wrote:
> > What about memory mapped files? If you can treat the file as an array,
> > it's just as easy to move backwards through the array as forwards.
>
> I thought about that (specifically the use of the facilities in the nio
> package), but I figured that the original request was targeted at
> environments other than Java. Right now, Java's direct support for
> multi-core CPUs is a little lacking (ref the parser thread from last
> week or so). If it was Java, I agree, you could use the approach you
> suggest.

Actually memory mapped files are supported in many (most?) modern
operating systems, including Linux, *BSD, Microsoft Windows, MacOS,
OS X, Solaris, RSX11 [1], etc. They are used at the C (or assembly or
C++) level.

Writing an efficient XML parser that's as fast as possible on a given
platform generally requires platform-specific techniques, because you
need to know things like file system throughput compared with CPU
speed. I've used systems where the network was faster than the local
hard drive, too.

But one could target a wide range of systems and still get something
faster than most of today's parsers. For example, you could have a
namespace manager thread, a read-ahead thread (for memory mapped files
with mmap this involves accessing a byte or word in the next block),
and a main worker thread.

Reading files backwards is actually reasonably efficient on most
Unix-like systems, by the way -- they have had a block-level file
system cache for the better part of 30 years.

Multiple cooperative threads reading forwards is probably easier to
write, and since a single page fault is likely to last far longer than
the time to parse a block (e.g. 512 bytes or 4K, depending on the
system) of XML, readahead is more effective. Some systems will do
readahead for you automatically when you access a file sequentially.
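To make the read-ahead idea concrete, here is a minimal sketch in Python rather than C, purely for brevity. It is an illustration under stated assumptions, not a real parser: the function name count_angle_brackets, the distance parameter, and the use of "count the '<' bytes" as a stand-in for parsing are all hypothetical. One thread touches a single byte in each upcoming page of the mapping so that the page fault happens there, while the main worker thread scans the file block by block. In CPython the global interpreter lock probably prevents much real overlap between the two threads, so treat this as a picture of the structure; the benefit described above would be expected from the same arrangement written at the C level.

```python
import mmap
import os
import threading

PAGE = mmap.PAGESIZE  # block size is system-dependent, as noted above


def count_angle_brackets(path, distance=8):
    """Scan a memory-mapped file page by page while a helper thread
    touches pages ahead of the scan. Returns the number of '<' bytes
    (a hypothetical stand-in for real XML parsing work)."""
    size = os.path.getsize(path)
    if size == 0:
        return 0  # an empty file cannot be memory mapped

    with open(path, "rb") as f, \
            mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        cond = threading.Condition()
        state = {"worker_page": 0, "done": False}
        last_page = (size - 1) // PAGE

        def read_ahead():
            page = 0
            while page <= last_page:
                with cond:
                    # Stay at most `distance` pages ahead of the worker.
                    cond.wait_for(
                        lambda: state["done"]
                        or page <= state["worker_page"] + distance
                    )
                    if state["done"]:
                        return
                _ = mm[page * PAGE]  # touching one byte pages the block in
                page += 1

        helper = threading.Thread(target=read_ahead, daemon=True)
        helper.start()

        count = 0
        try:
            for start in range(0, size, PAGE):
                with cond:
                    state["worker_page"] = start // PAGE
                    cond.notify()
                block = mm[start:start + PAGE]  # main worker reads a block
                count += block.count(b"<")
        finally:
            # Stop the helper before the mapping is closed.
            with cond:
                state["done"] = True
                cond.notify()
            helper.join()
        return count
```

Calling count_angle_brackets("some.xml") on any non-empty file should give the same number as reading the whole file and counting '<' bytes directly; only the access pattern differs. A namespace manager thread, as suggested above, is left out to keep the sketch short.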
> The information in this email is confidential and may be legally
> privileged. Access to this email by anyone other than the intended
> addressee is unauthorized. If you are not the intended recipient
> of this message, any review, disclosure, copying, distribution,
> retention, or any action taken or omitted to be taken in reliance on
> it is prohibited and may be unlawful. If you are not the intended
> recipient, please reply to or forward a copy of this message to the
> sender and delete the message, any attachments, and any copies thereof
> from your system.

I usually don't reply to personal messages containing these
disclaimers, since I have in fact no way to know if I am the intended
recipient, or if the sender typed my name but was really thinking of
someone else.

But on a public list, either this message should be deleted from the
archives or the terms are meaningless. I tend towards the latter.

If you think it might contain confidential information, don't post it
to a public list.

Liam

-- 
Liam Quin, W3C XML Activity Lead, http://www.w3.org/People/Quin/
http://www.holoweb.net/~liam/