
Re: Shredding XML

  • From: Johannes Lichtenberger <Johannes.Lichtenberger@uni-konstanz.de>
  • To: Fraser Goffin <goffinf@googlemail.com>
  • Date: Sun, 01 Nov 2009 13:51:37 +0100

On Sat, 2009-10-31 at 01:05 +0000, Fraser Goffin wrote:
> Thanks for the great comments thus far from everyone.
> 
> Several people have mentioned using BLOB or CLOB and indeed this is
> something we have done in the recent past. However, one of the key
> issues is that at least some of the applications that will access the
> data are not XML capable and/or the programmers using them are
> not really that familiar with XML. Whilst it's possible to process
> XML data natively in Cobol, most of the time this is not the approach
> that's taken, and resource constraints and project deadlines often
> militate towards existing skills, technologies and practices.
> 
> So I'm really interested in experience of shredding moderately complex
> XML content models into relational tables (for example structures that
> might produce 30-50 even possibly more tables when decomposed). And
> also some arguments for and against that approach (I would like to be
> able to make a compelling case for moving towards treating XML as a
> first class type system rather than one which just provides a format
> for data exchange).
> 
> One of the suggestions from one of our solution designers was to
> 'flatten' the XML structure and represent relationships using
> keys/ids, that is, make the XML more like the database. Personally I
> like the contextual relationships implicit in the hierarchical content
> model and am not really keen to navigate around the document using ID
> values as opposed to simply walking the tree ... but maybe other
> people's experience could provide some use cases where that approach
> has merit?

You should definitely have a look at the XPath Accelerator scheme
originally developed by Torsten Grust (which can even be enhanced),
the Staircase Join, and various tree-aware enhancements for rewriting
XQuery queries into (very) efficient SQL queries.
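
To illustrate the idea: the scheme labels every node with its preorder
and postorder rank, and each major XPath axis then becomes a simple
range condition on those two numbers. A minimal sketch in Python,
assuming a tiny hand-labelled tree a(b(c), d); the names NODES, AXES and
axis are made up for this example and are not part of any library:

```python
# Hypothetical hand-labelled tree a(b(c), d): name -> (pre, post) ranks.
NODES = {"a": (0, 3), "b": (1, 1), "c": (2, 0), "d": (3, 2)}

# Each major axis is a rectangular region of the pre/post plane,
# taken relative to the context node (cpre, cpost).
AXES = {
    "descendant": lambda pre, post, cpre, cpost: pre > cpre and post < cpost,
    "ancestor": lambda pre, post, cpre, cpost: pre < cpre and post > cpost,
    "following": lambda pre, post, cpre, cpost: pre > cpre and post > cpost,
    "preceding": lambda pre, post, cpre, cpost: pre < cpre and post < cpost,
}


def axis(nodes, context, name):
    """Return the names of all nodes on the given axis, in document order."""
    cpre, cpost = nodes[context]
    inside = AXES[name]
    hits = [n for n, (pre, post) in nodes.items()
            if inside(pre, post, cpre, cpost)]
    # Sorting by preorder rank restores document order.
    return sorted(hits, key=lambda n: nodes[n][0])
```

For example, axis(NODES, "c", "ancestor") selects every node with a
smaller pre and a larger post than c. The same comparisons carry over
directly to a SQL WHERE clause on two integer columns.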

> I am mainly interested in the process of LOADING XML data to a
> database rather than extracting (at least for the purposes of this
> discussion). So another key issue (excuse the pun) is that I will be
> processing the XML data and at various points constructing SQL INSERT
> statements including gathering together all of the [primary] key
> values that identify each entity and their [foreign] key
> relationship(s). Not all the data to support those relationships is
> inherent in the source XML data, so I also need to think about
> generating key values either from the database or as part of the XML
> processing. Of course use of stored procedures is another aspect in
> terms of positioning of the business/transformation logic.

You can parse XML (possibly very large XML files) with, for instance, SAX
and generate the appropriate relational encoding. With the XPath
Accelerator scheme you preserve the tree relationships, document order
and so on, _and_ the encoding has inherent knowledge of each XPath axis,
so one can rewrite XQuery statements in SQL. With that tree knowledge you
can also avoid duplicates efficiently, skip certain nodes, etc., and with
the Staircase Join you have a very efficient join operator.
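
A rough sketch of the loading side, using Python's xml.sax and sqlite3
modules. The table name node, its columns, and the helpers shred, load
and descendants are assumptions made up for illustration. The sketch
ignores attributes and text nodes, and it uses a plain SQL range join,
not the Staircase Join itself:

```python
import sqlite3
import xml.sax


class PrePostHandler(xml.sax.ContentHandler):
    """Assigns each element a preorder and postorder rank while streaming."""

    def __init__(self):
        super().__init__()
        self.pre = 0      # next preorder rank, handed out on start tags
        self.post = 0     # next postorder rank, handed out on end tags
        self.stack = []   # pre ranks of the currently open elements
        self.rows = []    # rows[pre] == [pre, post, parent_pre, tag]

    def startElement(self, name, attrs):
        parent = self.stack[-1] if self.stack else None
        self.stack.append(self.pre)
        self.rows.append([self.pre, None, parent, name])
        self.pre += 1

    def endElement(self, name):
        # The element being closed is the one on top of the stack.
        self.rows[self.stack.pop()][1] = self.post
        self.post += 1


def shred(xml_text):
    """Parse the document and return (pre, post, parent, tag) tuples."""
    handler = PrePostHandler()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return [tuple(row) for row in handler.rows]


def load(conn, rows):
    """Create the (hypothetical) node table and insert the encoding."""
    conn.execute(
        "CREATE TABLE node (pre INTEGER PRIMARY KEY, post INTEGER, "
        "parent INTEGER REFERENCES node(pre), tag TEXT)"
    )
    conn.executemany("INSERT INTO node VALUES (?, ?, ?, ?)", rows)


def descendants(conn, tag):
    """Descendant axis as a range join: c is the context node, d the result."""
    sql = """
        SELECT DISTINCT d.pre, d.tag
        FROM node AS c
        JOIN node AS d ON d.pre > c.pre AND d.post < c.post
        WHERE c.tag = ?
        ORDER BY d.pre
    """
    return conn.execute(sql, (tag,)).fetchall()
```

Note that pre doubles as a generated primary key and parent as the
foreign key to the enclosing element, so the key values come out of the
XML processing itself and do not have to be present in the source data.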

HTH,
Johannes


