[Fwd: Re: storing XML files]
Soumitra Sengupta wrote: > This is a valid way of solving XML storage, indexing and retrieval problem if > you have: > > a. you have pretty well defined DTD or schema for all the incoming documents and > they are unlikely to change. You do not expect to add new DTD or schemas in the > future often. > b. the hierarchical relationship expressed in these documents are not very rich > or are not too nested. Agreed. However, I would caution that "not very rich" and "not too nested" are not well-defined terms. That is, what is acceptable performance in one application is not acceptable in another. Furthermore, native XML databases are not guaranteed to outperform XML-enabled relational databases in all cases. As a general rule: 1) Native XML databases should outperform XML-enabled relational databases on queries along the XML hierarchy, such as XPath queries. 2) It's anybody's guess which will perform better on queries not along the hierarchy, such as a query that inverts an XML document. Theoretically, it seems that if the queried fields are indexed in both types of database and similarly clever people implement the query and storage engines, performance should be comparable. However, I don't know enough about storage and query engines to know if this is true. 3) Native XML databases could encounter serious problems querying unindexed data. This is because they have to search many more nodes to find the data. That is, first they have to determine if a node is the correct node ("Are you a foo element?"), then they have to look at the data for the node ("Is your value 'bar'?"). Relational databases only have to look at a given set of column values. > If on the other hand you do not have control on the structure of the incoming > documents, they vary a lot and they are very rich, you should look at using a > native XML database. Agreed, with the caveat that "very rich" is an ill-defined term. > > There is a slight possibility that the customer moves to Oracle, so my > > solution shouldn't be database provider dependent. > > > > I recommend that you look at our Xfinity Server 2.0 > (http://www.b-bop.com/products_xfinity_server.htm) as it will allow you the > benefits of a native XML database but protect your investment in RDMS technology > like MS-SQL server or Oracle. It meets your requirement of "database provider" > independent. Note that Xfinity provides database independence because it is built on top of a relational database. This allows you to switch the underlying database. This is not true of all native XML databases, many of which have proprietary database engines. > > Is it smart to develop a generalized format into which all those data feeds > > will be transformed (I hope an XSLT is a good tool to do that)? Or, I should > > deal with them individually? > > Depends on whether the structure is pretty static, how many different structure > are you dealing with, are they likely to change a lot, are they very rich? Also > how much time and money do you want to spend on development and do you want to > protect your existing investment? Personally, I don't think this is a good idea. The types of data you mentioned (stock quotes, weather report, events information, daily news, horoscope, etc.) are all radically different from each other. It makes little sense to convert them to a common XML schema. > > Can I apply XSLT on my XML data feeds and directly produce SQL statements? > > I have not seen anything that does this. Anyone with an answer to this? I haven't either. However, there are a number of middleware products that will store the data for you. (These create and execute the SQL statements internally.) See the middleware section of the product list mentioned by Chris. > > Has someone already developed an XML2SQL transformation tool? Yes, although they actually transfer the data instead of generating SQL statements. Again, see the middleware section of the product list, as well as the XML-enabled databases section. > > Just a flat > > table solution or I can also capture relationship information? Many of the middleware solutions, as well as all of the XML-enabled relational databases, use an object-relational mapping that captures relationship information. > We have solved this problem in Xfinity. All the rich relationships are captured > and stored. I think it is important to understand a fundamental difference between what a native XML database built on top of a relational database (like Xfinity) and an XML-enabled relational database (like SQL Server). For example, consider the following document: <SalesOrder SONumber="12345"> <CustomerNumber="543" /> <OrderDate>981215</OrderDate> <Item ItemNumber="1"> <PartNumber="123" /> <Quantity>10</Quantity> </Item> <Item ItemNumber="2"> <PartNumber="456" /> <Quantity>5</Quantity> </Item> </SalesOrder> In an XML-enabled relational database, this will be mapped to two tables: Orders and Items. In a native XML database built on a relational database, this will be mapped to tables such as Elements, Attributes, and PCDATA. (In fact, the database structure is probably more complex for performance reasons, but this gives you the idea.) Each structure is appropriate for different needs. The main advantage of an XML-enabled relational database is that the data can be used by legacy (non-XML) applications. The main drawback is that you have to map the data at design time. This doesn't matter if you're working with a fixed set of schemas (as was mentioned above), but is clearly a problem if you want to store arbitrary XML. The main advantages of native XML databases are that you can store arbitrary XML documents at runtime and that they handle the richness of XML documents well. (Relational databases do not handle semi-structured data well.) The main drawback is that non-XML applications cannot access the data. This obviously doesn't matter if you don't have any such applications. > > Is it the right thing to save those XMLs in to a relational database or > > shall I explore the native XML databases? > > Depends on incoming XML and the needs of your application. Agreed. > > Is there an existing DTD for SQL or that makes no sense to you? These exist, but I don't think anybody actually uses them. This is usually handled internally by the products that transfer data between XML documents and relational databases. > > My system should be flexible enough to accept new kinds of data feeds ... > > > > This I think is the key requirement that you have. Unless there are other > requirements that I do not understand or know of, you really should look at > native XML databases. Agreed. This is not something that XML-enabled relational databases do well, so it's worth your time to look at native XML databases. -- Ron
PURCHASE STYLUS STUDIO ONLINE TODAY!
Purchasing Stylus Studio from our online shop is Easy, Secure and Value Priced!
Download The World's Best XML IDE!
Accelerate XML development with our award-winning XML IDE - Download a free trial today!
Subscribe in XML format