RE: Indexing solution for native XML database
I would recommend that you take a look at some of the VLDB and SIGMOD papers that we have published on how we implement XML and XQuery in SQL Server. You can find the links to our papers on my weblog at http://sqljunkies.com/weblog/mrys. IBM and Oracle also have papers at recent VLDBs and SIGMODs describing their approaches. While all of us drive it deeper into the engine than what is possible for you, you may find some ideas of tackling your performance issues. Best regards Michael > -----Original Message----- > From: Timo Hildén [mailto:castaway@d...] > Sent: Monday, November 28, 2005 12:16 AM > To: xml-dev@l... > Subject: Indexing solution for native XML database > > Hello all, > > I'm searching for an indexing solution for my native XML database project, > which I'm writing as a learning project. > > I use C++ as development language and a relational database as backend. > I'm using a data model based on the XQuery 1.0 / XPath 2.0 Data Model > (W3C), and I have tables in the relational database corresponding to the > seven node kinds mentioned in that specification. I've implemented simple > XPath support and everything seems to work fine so far, but performance is > a gruesome problem. The more data I store into the database, the slower > expression processing gets. No surprise there. > > The expressions are processed using the schema and instance data in the > underlying relational database in compilation and execution phases, > respectively. I think I've indexed the columns in the relational database > as optimally as possible, so no further speedups can be achieved at that > level. > > So, some sort of indexing scheme is clearly required at the data model > level. Frankly, I'm quite unfamiliar with indexing when it comes to native > XML databases, especially cases that use a relational database as backend, > as mine does. I'd like to have some kind of automatic structural indexing > at least. > > I'll provide some background information about my project, simplified for > clarity. Let's say I have the following schema: > > <xs:element name="Patrol" maxOccurs="unbounded"> > <xs:complexType> > <xs:sequence> > <xs:element name="Officer" maxOccurs="2"> > <xs:complexType> > <xs:sequence> > <xs:element name="FullName" type="xs:string" /> > </xs:sequence> > </xs:complexType> > </xs:element> > <xs:element name="Arrests" type="xs:integer" /> > </xs:sequence> > </xs:complexType> > </xs:element> > > I also have several instance documents based upon it: > > <Patrol> > <Officer> > <FullName>Carey Mahoney</FullName> > </Officer> > <Officer> > <FullName>Larvell Jones</FullName> > </Officer> > <Arrests>17</Arrests> > </Patrol> > > ... etc ... > > Here is an excerpt of the SchemaNodeElement table that represents schema > information of the elements in the data model: > > Table: SchemaNodeElement > +----+----------+----------+----------+ > | Id | ParentId | Name | DataType | > +----+----------+----------+----------+ > | 1 | NULL | Patrol | 0 | > | 2 | 1 | Officer | 0 | > | 3 | 2 | FullName | 1 | > | 4 | 2 | Arrests | 2 | > > ... where: > > DataType 0 = complex type > 1 = string > 2 = integer > > Here is an excerpt the element node table, representing instance data: > > Table: NodeElement > +----+----------+----------+--------------+ > | Id | ParentId | Name | TypedValueId | > +----+----------+----------+--------------+ > | 1 | NULL | Patrol | NULL | > | 2 | 1 | Officer | NULL | > | 3 | 2 | FullName | 1 | > | 4 | 2 | FullName | 2 | > | 5 | 1 | Arrests | 1 | > ... > > Typed-value for complex types is NULL. > > I also have separate tables for each datatype, as follows: > > Table: TypedValueInt > +----+-------+ > | Id | Value | > +----+-------+ > | 1 | 17 | > ... > > Table: TypedValueText > +----+---------------+ > | Id | Value | > +----+---------------+ > | 1 | Carey Mahoney | > | 2 | Larvell Jones | > ... > > XPath expression processing is divided into two phases: compilation and > execution. As an example, consider the following expression: > > /Patrol[Arrests>10]/Officer > > The compiler does the schema validation by executing SQL statements > against the schema data in the underlying database, testing that the > structure is valid and retrieving the datatypes for values needed in > comparisons (in this example, datatype for the Arrests element is > retrieved). In addition, the compilation base generates the SQL for the > execution phase. > > After the compilation succeeds, the expression can be executed. Suppose > something like the following is generated by the compiler (this may or may > not be correct, but I know you get the idea): > > SELECT Id > FROM NodeElement > WHERE Name = 'Officer' > AND ParentId IN ( > SELECT Id > FROM NodeElement > WHERE Name = 'Patrol' > AND Id IN ( > SELECT ParentId > FROM NodeElement e, TypedValueInt i > WHERE e.Name = 'Arrests' > AND e.TypedValueId = i.Id > AND i.Value > 10 > ) > ) > > As you can see, the expressions are processed against the schema data > (compilation phase) or instance data (execution phase) in the underlying > relational database. > > So far so good. But then, as the number of document instances increases in > the database, the performance degrades linearly, if not worse. This is > where I guess some kind of indexing solution is essential. > > What kind of efficient indexing solutions are there, considering this kind > of native XML database? Structural indexing would be the most important > issue I suppose, followed by value indexing and perhaps also full-text > indexing. > > Any help is appreciated. > > -- > Regards, > > Timo > > > ----------------------------------------------------------------- > The xml-dev list is sponsored by XML.org <http://www.xml.org>, an > initiative of OASIS <http://www.oasis-open.org> > > The list archives are at http://lists.xml.org/archives/xml-dev/ > > To subscribe or unsubscribe from this list use the subscription > manager: <http://www.oasis-open.org/mlmanage/index.php>
PURCHASE STYLUS STUDIO ONLINE TODAY!
Purchasing Stylus Studio from our online shop is Easy, Secure and Value Priced!
Download The World's Best XML IDE!
Accelerate XML development with our award-winning XML IDE - Download a free trial today!
Subscribe in XML format