
RE: Storing Lots of Fiddly Bits (was Re: What is XML for?)

  • From: "W. Eliot Kimber" <eliot@d...>
  • To: "XML Developers' List" <xml-dev@i...>
  • Date: Sat, 30 Jan 1999 17:33:28 -0600

At 12:55 PM 1/30/99 -0500, Borden, Jonathan wrote:

>	In general, object databases have been designed to efficiently store lots
>of c++ (or java) objects which contain embedded pointers (or references) and
>they provide a mechanism to navigate the database using the semantics of a
>pointer dereference. They are not designed to *efficiently* perform complex
>queries, especially those that SQL databases excel at.

If this is the definition of object database, then I don't think it
qualifies as a "database" at all--it's just persistent object storage,
which is useful, but not very interesting.  At least my layman's idea of a
"database" is that it is both general and supports queries.

Of course, this has always been one of my problems with object-oriented
programming in general: it tends to cause people to conflate the data with
the processing to the degree that the objects end up becoming primary,
rather than things that serve the data.  Persistent objects are useful as
an optimization technique but they should never be a substitute for
standards-based data repositories.

As a Certified SGML Paranoid Nutcase (CSPN) I distrust all software
implicitly and therefore always prefer solutions in which the data,
represented using SGML or XML, is the primary data store, with any other
representations being merely transient reflections of that data for
purposes of optimization. Even then, you are sometimes forced to trust your
software not to screw up your data too badly. Of course I realize that this
extreme view can't work for some use scenarios, but it turns out to work
really well for a lot of them, especially high-volume *publishing*
scenarios, where the input to the publishing system is the SGML or XML--the
cost of reserializing documents stored as objects at production time is
orders of magnitude higher than the cost of objectizing them at indexing or
editing time, largely because the throughput requirements are different for
these different processes.  In other words, if the SGML data wasn't the
primary format, it would be impossible to meet the production throughput
requirements.  For one particular customer, even the cost of not having the
files directly on the file system is too high, so they have to go around
behind the back of their storage manager (which provides access control and
file-level versioning).

Or said another way: optimizing for one part of the process usually, if not
always, deoptimizes for another part. Not news, but it bears repeating once
in a while.
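
A minimal sketch of the files-as-primary-store arrangement described above, in Python. Everything in it is a hypothetical illustration rather than any customer's actual system: the function names build_index and publish, the flat directory of .xml files, and the element-name index are all assumptions. The point is only that the index is derived and disposable, while production reads the stored bytes untouched.

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path
from typing import Dict, List


def build_index(doc_dir: Path) -> Dict[str, List[Path]]:
    """Map each element name to the files that contain it.

    The index is a transient, in-memory view (hypothetical design);
    the XML files on disk remain the authoritative copy.
    """
    index: Dict[str, List[Path]] = {}
    for path in sorted(doc_dir.glob("*.xml")):
        # "Objectize" the document only long enough to collect element names.
        names = {elem.tag for elem in ET.parse(path).iter()}
        for name in names:
            index.setdefault(name, []).append(path)
    return index


def publish(doc_dir: Path) -> List[bytes]:
    """Hand the stored bytes to production as-is, with no reserialization."""
    return [path.read_bytes() for path in sorted(doc_dir.glob("*.xml"))]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        docs = Path(tmp)
        (docs / "a.xml").write_text("<doc><title>A</title></doc>", encoding="utf-8")
        index = build_index(docs)   # can be thrown away and rebuilt at any time
        print(sorted(index))        # element names found in the stored files
        print(len(publish(docs)))   # number of documents fed to production
```

If the index is lost or corrupted, it can be rebuilt from the files; the reverse is not true, which is the reason for treating the files as primary.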

As an example of the cost of deserialization, we have a client with about
80 Meg of SGML data organized into about 15000 small documents (most
documents are less than 2K in length).  On a 400 MHz Pentium II with 128 Meg
of memory (running Windows NT) and gigs of free disk space, it takes 21
hours to load this data into the repository (one of the leading SGML
element manager databases, implemented on top of a leading object database)
and 8-10 hours to export it.  And, unless we're doing something wrong, the
import process does not include indexing of the data, only objectizing it.
This seems a little extreme to me. It may be that this product is
particularly poorly implemented or that we have failed to perform some
essential tuning action, but still, 21 hours?  I hope that this anecdotal
evidence is not indicative of other, similar systems, but it's not very
encouraging.
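
To put the figures above in perspective, here is a rough back-of-the-envelope calculation in Python. It assumes "80 Meg" means 80 * 2**20 bytes and treats the load and export as running at a constant rate; the function name rates is made up for this illustration.

```python
from typing import Tuple

# Figures quoted in the message; the byte interpretation is an assumption.
TOTAL_BYTES = 80 * 2**20   # "about 80 Meg of SGML data"
DOC_COUNT = 15_000         # "about 15000 small documents"


def rates(hours: float) -> Tuple[float, float]:
    """Return (bytes per second, seconds per document) for a run of `hours`."""
    seconds = hours * 3600
    return TOTAL_BYTES / seconds, seconds / DOC_COUNT


for label, hours in [("load", 21), ("export, best case", 8), ("export, worst case", 10)]:
    bytes_per_sec, secs_per_doc = rates(hours)
    print(f"{label}: {bytes_per_sec:,.0f} bytes/s, {secs_per_doc:.2f} s per document")
```

Under those assumptions the 21-hour load works out to a little over 1 KB per second, or roughly five seconds per document, and the export to about two seconds per document.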

Cheers,

E.
--
<Address HyTime=bibloc>
W. Eliot Kimber, Senior Consulting SGML Engineer
ISOGEN International Corp.
2200 N. Lamar St., Suite 230, Dallas, TX 75202.  214.953.0004
www.isogen.com
</Address>

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@i...
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To (un)subscribe, mailto:majordomo@i... the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo@i... the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa@i...)

