Re: What is a good database for very large collections?
On Mon, Feb 01, 1999 at 12:33:53PM -0500, Borden, Jonathan wrote:
> > Can I try to shift it back to a vital question asked earlier, but not
> > answered?
> >
> > What is a good database for XML?

SIM (http://www.simdb.com/sim_2.1/

> > The criteria are:
> >  * over 20,000,000 document fragments, each less than 256
> >    characters, each with some flat metadata, able to be incrementally
> >    reloaded onto the live system
> >  * about simultaneous 30 users accessing about 10 fragments a minute
> >    each, grouped together (along with other dynamic data) and transformed,
> >    with a high need for immediate response

We can load about 200 MB per hour while live (actually I think we can load 400-500 MB/hr but we claim 200 MB to add a safety factor). We handle small documents quite well through DTD caching techniques (we also plan to include expat in the near future for unvalidated XML. We do currently support unvalidated XML, but through SP, which is not as fast as we'd like). Queries are fast (we queried "to be or not to be" across 55 GB in 74 seconds on a 2x336 MHz UltraSPARC with 1 GB RAM--note that this was a word position query using several stop words).

> How are the fragments selected? By query? If you can easily
> represent the 20M fragments in tabular form, and if you can easily
> represent the queries in SQL then a relational db is the way to go.
> this is not a particularly large, nor high-volume application for
> RDBMS.

And if you can't represent them in tabular form, try SIM.

> Ought you store the 20m fragments each in its own file ... probably
> not (a big waste). Ought you employ an ODBMS? not unless SQL
> wouldn't work well (you could always load it into say Oracle/SQL
> Server/DB2 etc vs. ODI/Poet etc and test it out). My expectation
> would be that if you need to run queries, the RDB will win.

For content queries (e.g. summary CONTAINS "stock option*") SIM will easily outperform an RDBMS. Customers have chosen our product above RDBMS's for this very reason.
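As a rough sanity check on the figures quoted above, the stated requirements can be turned into a back-of-envelope estimate. This is only a sketch: it assumes one byte per character, takes the 256-character limit as the fragment size (so it is an upper bound), and ignores metadata and index overhead. The function names are made up for illustration and have nothing to do with SIM itself.

```python
# Back-of-envelope sizing from the numbers quoted in this thread.
# Assumptions (not from the original post): 1 character = 1 byte,
# every fragment is at the 256-character limit, no metadata or index overhead.

FRAGMENTS = 20_000_000            # "over 20,000,000 document fragments"
MAX_FRAGMENT_BYTES = 256          # "each less than 256 characters"
LOAD_MB_PER_HOUR = 200            # claimed live load rate
USERS = 30                        # simultaneous users
FRAGMENTS_PER_USER_PER_MIN = 10   # fragments fetched per user per minute


def corpus_size_mb(fragments, fragment_bytes):
    """Upper bound on raw text size, in decimal megabytes."""
    return fragments * fragment_bytes / 1_000_000


def full_load_hours(size_mb, rate_mb_per_hour):
    """Hours needed to load size_mb at the given rate."""
    return size_mb / rate_mb_per_hour


def fetches_per_second(users, per_user_per_min):
    """Aggregate fragment fetch rate across all users."""
    return users * per_user_per_min / 60


size_mb = corpus_size_mb(FRAGMENTS, MAX_FRAGMENT_BYTES)
hours = full_load_hours(size_mb, LOAD_MB_PER_HOUR)
rate = fetches_per_second(USERS, FRAGMENTS_PER_USER_PER_MIN)

print(f"raw text upper bound: {size_mb:.0f} MB")
print(f"full reload at {LOAD_MB_PER_HOUR} MB/hr: {hours:.1f} hours")
print(f"aggregate fetch rate: {rate:.1f} fragments/second")
```

Under these assumptions the whole collection is at most about 5 GB of raw text, a complete reload at the claimed rate takes roughly a day, and the read load is about five fragment fetches per second.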
> >  * constant data-mining tools using various adhoc AI and linguistic
> >    retrieval software augmenting the metadata in the background.

We support stored queries and scheduled queries with filters to exclude previously returned records. I'm not sure if this meets the above requirement.

To say there are no scalable solutions (as someone did recently on xml-dev) is simply false. There may be no scalable solutions that do everything you want--and I'm certainly not touting SIM as the be-all and end-all (we have yet to support XQL, full path indexing, transactions, etc.; all are pending with varying levels of priority)--but there are products available right now that scale and solve people's problems.

SIM has been used in law (http://www.thelaw.tas.gov.au is the world's first legislation to officially go online), taxation (http://www.ato.gov.au/general/advanced/adv.htm), other government (libraries, NSA--no URL, sorry :-), aviation (Boeing), etc.

Moreover, our customers don't go away dissatisfied. We are quite proud of the fact that every SIM site is a reference site. We are also pleased that in some instances, project managers have been promoted as a result of using SIM!

Cheers,
Marcelo Cantos
SIM developer
--
http://www.simdb.com/~marcelo/

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@i...
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ and on CD-ROM/ISBN 981-02-3594-1
To (un)subscribe, mailto:majordomo@i... the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo@i... the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa@i...)