|
[XML-DEV Mailing List Archive Home] [By Thread] [By Date] [Recent Entries] [Reply To This Message] Indexing solution for native XML database
Hello all, I'm searching for an indexing solution for my native XML database project, which I'm writing as a learning project. I use C++ as development language and a relational database as backend. I'm using a data model based on the XQuery 1.0 / XPath 2.0 Data Model (W3C), and I have tables in the relational database corresponding to the seven node kinds mentioned in that specification. I've implemented simple XPath support and everything seems to work fine so far, but performance is a gruesome problem. The more data I store into the database, the slower expression processing gets. No surprise there. The expressions are processed using the schema and instance data in the underlying relational database in compilation and execution phases, respectively. I think I've indexed the columns in the relational database as optimally as possible, so no further speedups can be achieved at that level. So, some sort of indexing scheme is clearly required at the data model level. Frankly, I'm quite unfamiliar with indexing when it comes to native XML databases, especially cases that use a relational database as backend, as mine does. I'd like to have some kind of automatic structural indexing at least. I'll provide some background information about my project, simplified for clarity. Let's say I have the following schema: <xs:element name="Patrol" maxOccurs="unbounded"> <xs:complexType> <xs:sequence> <xs:element name="Officer" maxOccurs="2"> <xs:complexType> <xs:sequence> <xs:element name="FullName" type="xs:string" /> </xs:sequence> </xs:complexType> </xs:element> <xs:element name="Arrests" type="xs:integer" /> </xs:sequence> </xs:complexType> </xs:element> I also have several instance documents based upon it: <Patrol> <Officer> <FullName>Carey Mahoney</FullName> </Officer> <Officer> <FullName>Larvell Jones</FullName> </Officer> <Arrests>17</Arrests> </Patrol> ... etc ... Here is an excerpt of the SchemaNodeElement table that represents schema information of the elements in the data model: Table: SchemaNodeElement +----+----------+----------+----------+ | Id | ParentId | Name | DataType | +----+----------+----------+----------+ | 1 | NULL | Patrol | 0 | | 2 | 1 | Officer | 0 | | 3 | 2 | FullName | 1 | | 4 | 2 | Arrests | 2 | ... where: DataType 0 = complex type 1 = string 2 = integer Here is an excerpt the element node table, representing instance data: Table: NodeElement +----+----------+----------+--------------+ | Id | ParentId | Name | TypedValueId | +----+----------+----------+--------------+ | 1 | NULL | Patrol | NULL | | 2 | 1 | Officer | NULL | | 3 | 2 | FullName | 1 | | 4 | 2 | FullName | 2 | | 5 | 1 | Arrests | 1 | ... Typed-value for complex types is NULL. I also have separate tables for each datatype, as follows: Table: TypedValueInt +----+-------+ | Id | Value | +----+-------+ | 1 | 17 | ... Table: TypedValueText +----+---------------+ | Id | Value | +----+---------------+ | 1 | Carey Mahoney | | 2 | Larvell Jones | ... XPath expression processing is divided into two phases: compilation and execution. As an example, consider the following expression: /Patrol[Arrests>10]/Officer The compiler does the schema validation by executing SQL statements against the schema data in the underlying database, testing that the structure is valid and retrieving the datatypes for values needed in comparisons (in this example, datatype for the Arrests element is retrieved). In addition, the compilation base generates the SQL for the execution phase. After the compilation succeeds, the expression can be executed. Suppose something like the following is generated by the compiler (this may or may not be correct, but I know you get the idea): SELECT Id FROM NodeElement WHERE Name = 'Officer' AND ParentId IN ( SELECT Id FROM NodeElement WHERE Name = 'Patrol' AND Id IN ( SELECT ParentId FROM NodeElement e, TypedValueInt i WHERE e.Name = 'Arrests' AND e.TypedValueId = i.Id AND i.Value > 10 ) ) As you can see, the expressions are processed against the schema data (compilation phase) or instance data (execution phase) in the underlying relational database. So far so good. But then, as the number of document instances increases in the database, the performance degrades linearly, if not worse. This is where I guess some kind of indexing solution is essential. What kind of efficient indexing solutions are there, considering this kind of native XML database? Structural indexing would be the most important issue I suppose, followed by value indexing and perhaps also full-text indexing. Any help is appreciated. -- Regards, Timo
|
PURCHASE STYLUS STUDIO ONLINE TODAY!Purchasing Stylus Studio from our online shop is Easy, Secure and Value Priced! Cast Your Vote
We need your help – Vote for DataDirect XML Products!
Winners and finalists announced at SOA World Conference in November. Download The World's Best XML IDE!Accelerate XML development with our award-winning XML IDE - Download a free trial today! Subscribe in XML format
|
|||||||||







