Re: XML tools and big documents
Nigel Kerr <nigelk@u...> writes:

> "what's the most immediate containing element of offset X in
> file Y?"
>
> "traverse up the logical structure from offset X until a DIV
> element with a HEAD is found, and return me the offsets of
> that HEAD"
>
> Exact expression language is, uh, gee.  These are the kinds of
> questions we could ask with "some XML query language", but if i have a
> gigabyte or so of variously-structured English text marked up this
> way, i really don't want to have to parse the document entity just to
> answer these kinds of simple questions.  This is a weak specification
> of what I'm trying to do, i realize.  (this all largely because i am

Our LT XML tool set and API were designed for precisely this sort of
application (we regularly work with >1GB language SGML-encoded corpora
such as the BNC).  We get good performance because

 1) Our parser is written in C, our search and retrieval tools use it
    directly via a stream-based API, only custom UI tends to get
    written in a scripting language which looks at whole trees;

 2) We only produce tree fragments when we get to the interesting
    bits: our query processor is optimised to avoid building large
    amounts of tree unnecessarily;

 3) For REALLY big datasets, we do produce and use offset-based
    indices.

For more information, see http://www.ltg.ed.ac.uk/software/xml/.

ht
--
Henry S. Thompson, HCRC Language Technology Group, University of Edinburgh
2 Buccleuch Place, Edinburgh EH8 9LW, SCOTLAND -- (44) 131 650-4440
Fax: (44) 131 650-4587, e-mail: ht@c...
URL: http://www.ltg.ed.ac.uk/~ht/

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@i...
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To (un)subscribe, mailto:majordomo@i... the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo@i... the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa@i...)
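
As a rough illustration of the stream-based approach described in the message above, here is a minimal sketch of answering "what's the most immediate containing element of offset X" without building a tree. It does not use the LT XML API; it assumes Python's standard expat binding instead, and the function name `open_elements_at` and the sample document are hypothetical. It keeps only a stack of currently open elements and stops parsing as soon as the answer is known. One caveat of this sketch: an offset that falls inside an end tag is attributed to the parent element.

```python
import io
import xml.parsers.expat


class _Found(Exception):
    """Raised inside a handler to stop parsing once the answer is known."""


def open_elements_at(stream, offset, chunk_size=64 * 1024):
    """Return [(name, start_offset), ...] from the root down to the innermost
    element open at byte `offset`, or [] if the offset is past the last tag.

    Hypothetical helper: only a stack of open elements is kept in memory.
    """
    parser = xml.parsers.expat.ParserCreate()
    stack = []

    def stop_if_past_offset():
        # The first tag that begins after `offset` means the elements
        # currently on the stack are exactly those containing `offset`.
        if parser.CurrentByteIndex > offset:
            raise _Found

    def start(name, attrs):
        stop_if_past_offset()
        stack.append((name, parser.CurrentByteIndex))

    def end(name):
        stop_if_past_offset()
        stack.pop()

    parser.StartElementHandler = start
    parser.EndElementHandler = end
    try:
        while True:
            chunk = stream.read(chunk_size)
            if not chunk:
                parser.Parse(b"", True)  # signal end of input
                break
            parser.Parse(chunk, False)
    except _Found:
        return list(stack)
    return []


doc = b"<doc><div><head>Title</head><p>Some text</p></div></doc>"
chain = open_elements_at(io.BytesIO(doc), 33)  # byte 33 lies inside <p>
innermost_name, innermost_start = chain[-1]
```

The returned chain also covers the second question in the quoted text: walking it from the innermost element outwards finds the nearest enclosing DIV. Locating that DIV's HEAD would need either a second pass starting at the DIV's offset or an offset-based index of the kind mentioned in point 3.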