|
[XML-DEV Mailing List Archive Home]
[By Thread]
[By Date]
[Recent Entries]
[Reply To This Message]
RE: storing XML files
- From: Dave Simmonds <daves@t...>
- To: xml-dev <xml-dev@l...>
- Date: Wed, 10 Oct 2001 09:56:38 -0600
Title:
I honestly don't know if someone has come up with this before,
so please enlighten me as appropriate. I've been given the task of mapping
XML into an Access database format. The main requirement is that the
database structure must not change significantly as new previously unknown types
of XML documents are added. This requirement in place so that we can
create a relatively simple client application that can count on a certain db
structure.
In solving this problem I've come up with a base of
approximantly 5 tables that are needed. In order to accomplish this task -
each XML document must be able to validate against a Schema (or DTD) and the db
must support crosstab/pivot type queries.
Here are the five
tables.

The basic concept
is the create a unique id for each possible piece of data that can be stored
relative to any given Schema. Since an XPath can provide this mapping we
simply parse the Schema to get our XPath's and map it to our id. Each
XPath can have any set of constraints specified and a set of possible values
associated (enumeration) with it. With this setup, a document simply
becomes a simple 1 to many relationship - 1 document has many XPath's - each
XPath with a correspnding value.
With this structure, when a new type of document comes along all
we need to do is parse it's Schema and then add the document. The
underlying db structure won't need to change.
The example above doesn't show all the possible constraints that
can be put on any given XPath, but could easily be extended to do so.
As far as the SQL is concerned, if we want to get a recordset of
all objects of 1 particular type (Schema) we perform a crosstab/pivot query that
will give us columns of XPaths filled will the corresponding values. In
other words, 1 row in the recordset becomes 1 document.
With this method, I believe it's possible to map any well formed
xml with a DTD/Schema to an RDBMS.
Dave
-----Original Message----- From: Ronald
Bourret [mailto:rpbourret@r...] Sent:
Wednesday, October 10, 2001 1:13 AM To: Albena Georgieva; xml-dev Subject:
Re: storing XML files
Albena Georgieva wrote: >
Ronald: What do you mean by "semi-structured"_ness of the
data?
Structured data is very rigid. That is, all records have the
same fields. An example of structured data is a telephone book -- every
entry has a name, an address, and a telephone number.
Semi-structured
data is data that has some structure, but is not rigidly structured. An
example of semi-structured data is a health record. For example, one patient
might have a list of vaccinations, another might have height and weight,
another might have a list of operations they have had. Other examples of
semi-structured data are legal documents and geneological records.
A
common example in industry is data that has been retrieved from
many different sources. For example, if you ask for all the data
about customer X, you might get a sales history, emails, stock profile, and
so on. However, the set of data will differ greatly from customer
to customer.
Semi-structured data is difficult to store in a
relational database because it means you either have many different tables
(which means many joins and slow retrieval time) or a single table with many
null columns. Semi-structured data is very easy to store as XML and is a good
fit for a native XML database.
> <!--
EXAMPLE --> > > <?xml version="1.0" encoding="ISO-8859-1"
?> > <!DOCTYPE nitf SYSTEM "nitf-adjusted-13c.dtd"> >
<nitf> > <head> > <meta name="onlinefolder"
content="Entertainment"></meta> > <docdata> >
<date.issue norm="20010302 150629+0100"></date.issue> >
<key-list> >
<keyword
key="Maxima"></keyword> >
<keyword
key="Princess"></keyword> >
</key-list> > </docdata> > </head> >
<body> > <body.head> > <hedline><hl1>Maxima
speaks perfect dutch</hl1></hedline> >
</body.head> > <body.content> > <p>Princess Maxima
presented the Princess Collection 2000 in an estate with > a very royal
ambiance. The dazzling collection is also presented in the new > catalogue
full of show, glitter and glamour.</p> >
</body.content> > </body> >
</nitf> > > <!-- /EXAMPLE
--> > > All I want to do with them for now, is to save them in a
RDBMS. The rest of > the applications (ASPs or servlets) will access that
data through ODBC or > JDBC. ... > If the applications access the
datafeeds through ODBC or JDBC and they don't > ask for XML format, ( no
need for XML retrieval ) I see no reason for > introducing a XML native
database at this point, but please correct me I am > wrong
...
These two requirements -- storing NITF in a relational database and
not returning any XML -- conflict with each other. If you don't want
to return XML, that means no XML can be stored in the database. But if
you map NITF to a relational database using the most widely accepted
mapping that doesn't store XML in the database (an object relational
mapping), you will end up with an almost useless set of tables.
The
problem is that NITF uses mixed content and mixed content doesn't map well
with an object-relational mapping. (I won't go into the details here. If you
want to read more about this, see sections 3.3 and 3.4 of http://www.xml.com/pub/a/2001/05/09/dtdtodbs.html.)
Furthermore, as far as I know, the only product that supports
mixed content in an object-relational mapping is mine (XML-DBMS) and
I recommend that people don't use XML-DBMS with mixed content because
it is so inefficient. (XML-DBMS started out as a research project and
I supported mixed content for completeness. Were I to do it over again,
I probably wouldn't support it.)
Your choices therefore are:
1)
Store your documents in a native XML database.
2) Store your documents as
BLOBs in a relational database and index them as Soumitra suggests. (For an
example of this technique, see the discussion of side tables in section 6.2
of http://www.rpbourret.com/xml/XMLAndDatabases.htm.)
3) Store the documents in a relational database with
object-relational middleware. However, instead of storing mixed content in
multiple tables, store it as XML in a single column. For example, you would
store the content of the <body.content> element in a single column
as:
<p>Princess Maxima presented the Princess
Collection 2000 in an estate with a very royal ambiance. The
dazzling collection is also presented in the new catalogue
full of show, glitter and glamour.</p>
Notice the
<p> elements. Not all middleware products support this technique, but
some do. Another problem with this choice is that it is not clear if
non-mixed-content parts of the NITF DTD can even be mapped to a useful set of
tables. With some transformations, it might be possible, but I'm not
sure.
In any case, your applications are almost certainly going to have
to deal with some XML. Otherwise, there doesn't seem to be any way to
deal with the mixed content.
> I just need a good way to transform
a XML data feeds into relational > database model. That is why, I asked
for something like DTD2SQL or XML2SQL > tool. So, I imagined applying XSLT
transformations to every different > datafeed to transform it into some
DTD for SQL and then just run that SQL.
This is possible. That is, you
could write an XSLT transformation that transformed your documents into a
series of tables. For example, transform a document with the
form:
<table1.row>
<table1.column1>...</table1.column1>
<table1.column2>...</table1.column2>
<table2.row>
<table2.column1>...</table2.column1>
<table2.column2>...</table2.column2>
<table2.column3>...</table2.column3>
</table2.row> </table1.row>
to the
form:
<table1.row>
<table1.column1>...</table1.column1>
<table1.column2>...</table1.column2>
</table1.row> <table2.row
table1.key="...">
<table2.column1>...</table2.column1>
<table2.column2>...</table2.column2>
<table2.column3>...</table2.column3>
</table2.row>
It would then be easy to insert these rows into
tables. (Note that there is no need for a standard DTD here. The
correspondence of elements to tables and columns is inherent in the
structure. To be truly generic, all that would be needed is a simple mapping
file that mapped element and attribute names to table and column
names.)
The data transfer code is easy to write, although the
stylesheets probably aren't. (I'm actually surprised that nobody has written
a product that takes an XML-to-DBMS map and generates the
above stylesheets. It wouldn't be the most efficient way to transfer data
due to the XSLT transform, but it has the advantage of simplicity.)
In
any case, the code and the stylesheets aren't the problem. The problem is the
one mentioned above -- that the NITF DTD simply doesn't map well to
relational schema due to its mixed content.
--
Ron
----------------------------------------------------------------- The
xml-dev list is sponsored by XML.org <http://www.xml.org>,
an initiative of OASIS <http://www.oasis-open.org>
The list archives are at http://lists.xml.org/archives/xml-dev/
To
subscribe or unsubscribe from this elist use the subscription manager:
<http://lists.xml.org/ob/adm.pl>
Outlook.bmp
|
PURCHASE STYLUS STUDIO ONLINE TODAY!
Purchasing Stylus Studio from our online shop is Easy, Secure and Value Priced!
Download The World's Best XML IDE!
Accelerate XML development with our award-winning XML IDE - Download a free trial today!
Subscribe in XML format
| RSS 2.0 |
 |
| Atom 0.3 |
 |
| |
Stylus Studio has published XML-DEV in RSS and ATOM formats,
enabling users to easily subcribe to the list from their preferred news reader application.
|
Stylus Studio Sponsored Links are added links designed to provide related and additional information to the visitors of this website.
they were not included by the author in the initial post. To view the content without the Sponsor Links please
click here.
|
|