Re: Searching XML
Phil Ruelle wrote:

> Each document will be an order containing a list of items. The most
> frequent type of search would be to select orders that:
> a) contain item X
> b) contain item X and item Y
> c) contain item X but do not contain item Y

Given that searches are this simple (as I had suspected), I would recommend bypassing XML parsing altogether and just doing plain string searches. You will have a huge gain in speed, space, and simplicity. As I have posted here before, XML is a document format, not a processing model.

> I am intrigued by your question about search complexity - I hadn't
> really considered its effect. Could you point me to some references
> that explain what the options are and how they alter according to
> complexity or could you possibly expand a little on this yourself?

If you want searches that involve context ("Find all foo's with content bar that are inside baz elements") then you pretty much need an XPath implementation. But if you don't care about context, the aforementioned string search is very reasonable.

We process a lot of XML documents here by comparing them to about 100 files containing about 200-5000 search terms each. The simplest approach is just the string search, and it saves us from having to parse a single one of those documents.

>>Also, what operating environment are you using?
>
> I'm using a desktop with Windows 98 and NT on it. Also I intend to
> program in Java if only due to the vast amount of XML code
> available.

Grab yourself an implementation of "fgrep" for your platform, and don't program anything at all!

--
There is / one art       || John Cowan <jcowan@r...>
no more / no less        || http://www.reutershealth.com
to do / all things       || http://www.ccil.org/~cowan
with art- / lessness     \\ -- Piet Hein
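
For anyone who does want to do this from Java rather than with fgrep, the plain string search recommended above might look roughly like the sketch below. It covers the three cases from the question: (a) and (b) are lists of required terms, and (c) adds an excluded term. The class name OrderSearch, the method names, and the <item> element used in the usage note are hypothetical; nothing in the thread specifies the order format. It also assumes each order is serialized consistently, since a raw substring match will miss an item written with different whitespace or attributes.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: matches order documents by raw substring, no XML parsing.
class OrderSearch {

    // True if the text contains every required term and none of the excluded ones.
    static boolean matches(String text, List<String> required, List<String> excluded) {
        for (String term : required) {
            if (!text.contains(term)) {
                return false;
            }
        }
        for (String term : excluded) {
            if (text.contains(term)) {
                return false;
            }
        }
        return true;
    }

    // Reads each file as UTF-8 text (an assumption) and returns the ones that match.
    static List<Path> search(List<Path> files, List<String> required, List<String> excluded)
            throws IOException {
        List<Path> hits = new ArrayList<>();
        for (Path file : files) {
            String text = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
            if (matches(text, required, excluded)) {
                hits.add(file);
            }
        }
        return hits;
    }
}
```

Searching for the whole element text, for example "<item>X</item>" rather than just "X", keeps a term from matching inside some longer item name. If the searches ever need context, this approach stops being adequate and an XPath implementation is the better tool.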