
Re: What is a good database for very large collections?

  • From: Marcelo Cantos <marcelo@m...>
  • To: xml-dev-digest@i...
  • Date: Thu, 4 Feb 1999 14:49:09 +1100

On Mon, Feb 01, 1999 at 12:33:53PM -0500, Borden, Jonathan wrote:
> >
> > Can I try to shift it back to a vital question asked earlier, but not
> > answered?
> >
> > What is a good database for XML?


SIM (http://www.simdb.com/sim_2.1/)


> > The criteria are:
> >     * over 20,000,000 document fragments, each less than 256
> > characters, each with some flat metadata, able to be incrementally
> > reloaded onto the live system
> >     * about 30 simultaneous users accessing about 10 fragments a minute
> > each, grouped together (along with other dynamic data) and transformed,
> > with a high need for immediate response
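For scale, the stated workload works out to a modest aggregate request rate. A quick calculation, assuming the "10 fragments a minute" figure is per user and that requests are spread evenly over the minute (real traffic will be burstier):

```python
# Back-of-envelope request rate implied by the stated criteria.
# Assumption: "about 10 fragments a minute" applies to each user.
USERS = 30
FRAGMENTS_PER_USER_PER_MIN = 10


def fragment_rate_per_second(users: int, per_user_per_min: int) -> float:
    """Total fragment retrievals per second across all users."""
    return users * per_user_per_min / 60.0


rate = fragment_rate_per_second(USERS, FRAGMENTS_PER_USER_PER_MIN)
print(f"{USERS * FRAGMENTS_PER_USER_PER_MIN} fragments/min = {rate:.1f} fragments/s")
```

The per-request cost of grouping the fragments with other dynamic data and transforming them is not modelled here.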

We can load about 200 MB per hour while live (actually I think we can
load 400-500 MB/hr but we claim 200 MB to add a safety factor).  We
handle small documents quite well through DTD caching techniques (we
also plan to include expat in the near future for unvalidated XML. We
do currently support unvalidated XML, but through SP, which is not as
fast as we'd like).  Queries are fast (we queried "to be or not to be"
across 55 GB in 74 seconds on a 2x336 MHz UltraSPARC with 1 GB
RAM--note that this was a word position query using several stop
words).
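Setting the figures in this thread side by side gives a rough sense of fit. The sketch below assumes one byte per character, every fragment at its 256-character maximum, and no allowance for metadata, markup, or index overhead; the query figure is an effective rate over an indexed collection, not raw scan speed.

```python
# Rough capacity arithmetic using only numbers quoted in this thread.
# Assumptions: 1 byte per character, fragments at the 256-char upper bound,
# decimal megabytes, and no metadata/markup/index overhead.
FRAGMENTS = 20_000_000
MAX_FRAGMENT_BYTES = 256
LOAD_RATE_MB_PER_HOUR = 200  # the conservative live-load figure claimed above
MB = 1_000_000


def corpus_mb(fragments: int, bytes_each: int) -> float:
    """Upper bound on raw fragment text, in megabytes."""
    return fragments * bytes_each / MB


def reload_hours(total_mb: float, rate_mb_per_hour: float) -> float:
    """Hours needed to load total_mb at the given live-load rate."""
    return total_mb / rate_mb_per_hour


def effective_gb_per_second(gb: float, seconds: float) -> float:
    """Collection size divided by query time (an indexed query, not a scan)."""
    return gb / seconds


size = corpus_mb(FRAGMENTS, MAX_FRAGMENT_BYTES)
hours = reload_hours(size, LOAD_RATE_MB_PER_HOUR)
print(f"corpus <= {size:.0f} MB; full load ~{hours:.1f} h at {LOAD_RATE_MB_PER_HOUR} MB/h")
print(f"phrase query: {effective_gb_per_second(55, 74):.2f} GB/s effective over 55 GB")
```

At the higher 400-500 MB/hr rate mentioned above, the same full load would take roughly 10 to 13 hours; incremental reloads of only the changed fragments would be far smaller.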

> How are the fragments selected? By query? If you can easily
> represent the 20M fragments in tabular form, and if you can easily
> represent the queries in SQL then a relational db is the way to go.
> This is not a particularly large, nor high-volume application for
> RDBMS.
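A minimal sketch of the relational approach, using Python's built-in `sqlite3` module. The `fragment` table, its columns, and the `fragments_for` helper are hypothetical; a production system on Oracle, SQL Server, or DB2 would use its own DDL and bulk-load tools. The `CHECK` constraint encodes the under-256-characters rule, and a batch of changed fragments is applied inside one transaction as a stand-in for incremental reloading.

```python
import sqlite3

# Hypothetical schema: one row per fragment, with flat metadata as columns.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE fragment (
    id     INTEGER PRIMARY KEY,
    doc_id TEXT NOT NULL,                            -- source document
    author TEXT,                                     -- example metadata field
    body   TEXT NOT NULL CHECK (length(body) < 256)  -- fragment under 256 chars
);
CREATE INDEX fragment_doc ON fragment(doc_id);
""")

# Initial load, committed as one transaction by the connection context manager.
with conn:
    conn.executemany(
        "INSERT INTO fragment VALUES (?, ?, ?, ?)",
        [
            (1, "doc-a", "smith", "<p>First fragment</p>"),
            (2, "doc-a", "smith", "<p>Second fragment</p>"),
            (3, "doc-b", "jones", "<p>Other document</p>"),
        ],
    )

# Incremental reload: changed fragments are committed together,
# or rolled back together if any row fails (e.g. the CHECK constraint).
changed = [(2, "doc-a", "smith", "<p>Revised second fragment</p>")]
with conn:
    conn.executemany("INSERT OR REPLACE INTO fragment VALUES (?, ?, ?, ?)", changed)


def fragments_for(doc_id):
    """Fetch all fragments of one document, filtered on the indexed doc_id."""
    cur = conn.execute(
        "SELECT id, body FROM fragment WHERE doc_id = ? ORDER BY id", (doc_id,)
    )
    return cur.fetchall()


print(fragments_for("doc-a"))
```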

And if you can't represent them in tabular form, try SIM.

> Ought you store the 20m fragments each in its own file ... probably
> not (a big waste). Ought you employ an ODBMS? Not unless SQL
> wouldn't work well (you could always load it into say Oracle/SQL
> Server/DB2 etc vs. ODI/Poet etc and test it out). My expectation
> would be that if you need to run queries, the RDB will win.

For content queries (e.g. summary CONTAINS "stock option*") SIM will
easily outperform an RDBMS.  Customers have chosen our product over
RDBMSs for this very reason.
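Content queries of this kind are usually answered from an inverted index rather than by scanning rows. The sketch below is a generic positional index in Python, not SIM's implementation; it assumes simple lowercase word tokenisation and interprets `"stock option*"` as a phrase whose last word may be truncated. Because stop words are indexed too, the same structure also answers a phrase such as "to be or not to be".

```python
import re
from collections import defaultdict


class PositionalIndex:
    """Generic positional inverted index: term -> {doc_id: [positions]}."""

    def __init__(self):
        self.postings = defaultdict(lambda: defaultdict(list))

    @staticmethod
    def tokenize(text):
        # Assumed tokenisation: lowercase alphanumeric runs; stop words are kept.
        return re.findall(r"[a-z0-9]+", text.lower())

    def add(self, doc_id, text):
        for pos, term in enumerate(self.tokenize(text)):
            self.postings[term][doc_id].append(pos)

    def _expand(self, term):
        # A trailing '*' matches every indexed term with that prefix.
        if term.endswith("*"):
            prefix = term[:-1]
            return [t for t in self.postings if t.startswith(prefix)]
        return [term] if term in self.postings else []

    def _positions(self, term):
        # Merge positions of all expansions into doc_id -> set of positions.
        merged = defaultdict(set)
        for t in self._expand(term):
            for doc_id, positions in self.postings[t].items():
                merged[doc_id].update(positions)
        return merged

    def phrase(self, query):
        """Return the doc_ids containing the query terms consecutively."""
        terms = [t.lower() for t in query.split()]
        if not terms:
            return set()
        # Candidate phrase start positions, from the first term.
        starts = {d: set(p) for d, p in self._positions(terms[0]).items()}
        for offset, term in enumerate(terms[1:], start=1):
            nxt = self._positions(term)
            # Keep a start only if this term sits exactly `offset` words later.
            starts = {d: {s for s in ss if s + offset in nxt.get(d, ())}
                      for d, ss in starts.items()}
            starts = {d: ss for d, ss in starts.items() if ss}
        return set(starts)


idx = PositionalIndex()
idx.add("r1", "Summary: new stock options for staff")
idx.add("r2", "Stock prices and option pricing")
idx.add("r3", "To be or not to be")
print(sorted(idx.phrase("stock option*")))       # ['r1']
print(sorted(idx.phrase("to be or not to be")))  # ['r3']
```

Positions are what make phrase matching possible: `r2` contains both "stock" and "option", but not adjacently, so it does not match.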

> >     * constant data-mining tools using various ad hoc AI and linguistic
> > retrieval software augmenting the metadata in the background.

We support stored queries and scheduled queries with filters to exclude
previously returned records.  I'm not sure if this meets the above
requirement.
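One way to read "stored queries with filters to exclude previously returned records" is sketched below: the query remembers the IDs it has already returned and drops them on later runs, so a background job re-running it on a schedule sees only new material. `StoredQuery` and its fields are hypothetical names for illustration, not SIM's API, and record IDs are assumed to be unique within a batch.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List, Set

Record = Dict[str, str]


@dataclass
class StoredQuery:
    """Hypothetical stored query that suppresses records from earlier runs."""
    name: str
    predicate: Callable[[Record], bool]
    seen_ids: Set[str] = field(default_factory=set)

    def run(self, records: Iterable[Record]) -> List[Record]:
        """Return matching records not returned by any earlier run."""
        fresh = [r for r in records
                 if self.predicate(r) and r["id"] not in self.seen_ids]
        self.seen_ids.update(r["id"] for r in fresh)
        return fresh


q = StoredQuery("stock options", lambda r: "stock option" in r["summary"].lower())
batch1 = [{"id": "1", "summary": "Stock option plan"},
          {"id": "2", "summary": "Weather report"}]
print([r["id"] for r in q.run(batch1)])  # ['1']
batch2 = batch1 + [{"id": "3", "summary": "New stock options"}]
print([r["id"] for r in q.run(batch2)])  # ['3']
```

A real scheduler (cron or similar) would call `run` periodically and persist `seen_ids` between runs.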


To say there are no scalable solutions (as someone did recently on
xml-dev) is simply false.  There may be no scalable solutions that do
everything you want--and I'm certainly not touting SIM as the be-all
and end-all (we have yet to support XQL, full path indexing,
transactions, etc.; all are pending with varying levels of
priority)--but there are products available right now that scale and
solve people's problems.

SIM has been used in law (http://www.thelaw.tas.gov.au: the world's
first legislation to go online officially), taxation
(http://www.ato.gov.au/general/advanced/adv.htm), other government
(libraries, NSA--no URL, sorry :-), aviation (Boeing), etc.  Moreover,
our customers don't go away dissatisfied.  We are quite proud of the
fact that every SIM site is a reference site.  We are also pleased
that in some instances, project managers have been promoted as a
result of using SIM!


Cheers,
Marcelo Cantos
SIM developer

-- 
http://www.simdb.com/~marcelo/

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@i...
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ and on CD-ROM/ISBN 981-02-3594-1
To (un)subscribe, mailto:majordomo@i... the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo@i... the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa@i...)

