Re: are native XML databases needed?
Rick,

WRT triples, RDF, and queries - I recommend you look at SWI-Prolog and the predicate libraries they have available - all open source.

Enjoy,
DW

Quoting Rick Marshall <rjm@z...>:

> ok, can't stay out of this any longer....
>
> relational database refers to the storage of relations - n-tuples
> (that's what a relation is). there is nothing inherently fast or slow
> about a relational database.
>
> what is fast or slow are the management systems around them. and sql is a
> classic example of something that is slow because at its heart it is a
> procedural language - the verbs, like join, often imply large amounts of
> work before any optimisation. reason - lack of semantics. object and
> other so-called database designs are really management systems that try
> to use semantics to match how we think of the data and/or improve
> performance.
>
> triples are interesting because they imply some form of ultimate 5th
> normal form. each datum stored separately. some sort of semantics is
> implied by the structure of rdf.
>
> the big difference between triples and 5th normal form is the regularity
> of a relational database. alternatively you can think of triples as 5th
> normal form with missing columns as implied null values (something i'm
> looking into at the moment).
>
> i think we could move forward a lot faster by recognising that a) the
> storage and maths of relational databases is one thing b) the semantics
> is another.
>
> using this model, sql is a semantic layer, so is the network database,
> so is object oriented, and so is rdf.
>
> we get very high performance by making this distinction with all data
> stored in easy to access relations and semantic tools to do all the
> things we talk about - retrieve, store, validate, format, publish etc.
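
The remark above about treating triples as 5th normal form with missing columns as implied nulls can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sample data and the function names `triples_to_rows` and `rows_to_triples` are made up here, and each subject is assumed to carry at most one value per predicate.

```python
from collections import defaultdict

# Hypothetical sample data: (subject, predicate, object) triples.
triples = [
    ("emp1", "name", "Ann"),
    ("emp1", "branch", "Cairo"),
    ("emp2", "name", "Bob"),  # no "branch" triple for emp2
]


def triples_to_rows(triples):
    """Pivot triples into one row per subject.

    A predicate that is absent for a subject becomes None,
    i.e. the "implied null" column value.
    """
    columns = sorted({p for _, p, _ in triples})
    by_subject = defaultdict(dict)
    for s, p, o in triples:
        by_subject[s][p] = o  # assumes single-valued predicates
    rows = {s: tuple(vals.get(c) for c in columns)
            for s, vals in by_subject.items()}
    return columns, rows


def rows_to_triples(columns, rows):
    """Inverse direction: emit one triple per non-null cell."""
    return [(s, c, v)
            for s, row in rows.items()
            for c, v in zip(columns, row)
            if v is not None]


columns, rows = triples_to_rows(triples)
```

Going from rows back to triples drops the nulls again, so under the single-value assumption the two forms carry the same information.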
>
> then one of the things you can do is make validation constraints that
> are temporal - apply only as required, and apply across the entire
> database, not just the table, relation, document, etc that is being
> looked at.
>
> eg the underage Egyptian employee could be solved by a table of branch
> offices with minimum employment age as an attribute and reference to
> that table when deciding on the validity of a candidate. or it could be
> used for post employment checking that company policy is being followed.
> or it might be applied to data entry, but because circumstances change
> you don't want the constraint applied to existing employees or when
> moving records between tables, or when rebuilding a table.
>
> so after many months now watching the discussions on this list closely
> i've concluded, for myself at least, that xml wrt data is a semantic
> layer. i've also realised through my brief study of rdf that we can
> design a new (non-xml) storage mechanism that supports triples as easily
> as it does relations and that seen in this light there is a unifying
> theory of data storage.
>
> putting this together will i guess be the last big project of my career,
> and it is exciting looking forward to the new applications i can now tackle.
>
> rick
>
> ps thanks for the inspiration.
> pps for those who asked, we are still debating internally about
> releasing our data technology as open source.
>
> Hunsberger, Peter wrote:
>
> >Bullard, Claude L (Len) <len.bullard@i...> asks:
> >
> >>Off topic, but since data warehousing comes up from
> >>time to time: what is the advantage of using
> >>an OLAP design vs a relational design? Is this
> >>advantage better or worse than a triple design?
> >
> >Now you've done it, you've gone and imported a perm thread from the
> >database world into xml-dev...
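
Rick's branch-office example (a minimum employment age stored per branch and consulted only when wanted) might look roughly like this with Python's standard `sqlite3` module. The table names, column names, ages, and the helpers `is_valid_candidate` and `policy_violations` are all assumptions made for illustration; the ages are not real legal limits.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE branch (
    name TEXT PRIMARY KEY,
    min_employment_age INTEGER NOT NULL  -- illustrative values only
);
CREATE TABLE employee (
    name TEXT,
    age INTEGER,
    branch TEXT REFERENCES branch(name)
);
INSERT INTO branch VALUES ('Cairo', 15), ('London', 16);
-- Stored without any age check: the rule lives outside the table.
INSERT INTO employee VALUES ('Existing', 15, 'London');
""")


def is_valid_candidate(conn, age, branch):
    """Check applied at hiring time only, by looking up the branch rule."""
    row = conn.execute(
        "SELECT min_employment_age FROM branch WHERE name = ?", (branch,)
    ).fetchone()
    return row is not None and age >= row[0]


def policy_violations(conn):
    """After-the-fact audit: the same rule run across the stored data."""
    return conn.execute("""
        SELECT e.name
        FROM employee e
        JOIN branch b ON b.name = e.branch
        WHERE e.age < b.min_employment_age
        ORDER BY e.name
    """).fetchall()
```

Because the rule is not a table constraint, existing rows can be moved or rebuilt without tripping it, while the hiring check and the audit both read the same `branch` table.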
> >
> >With the exception of the specialized spatial, null compressed, database
> >designs, for the most part, OLAP designs are relational designs just
> >highly denormalized. I can't really see a significant relationship to
> >triple stores. Your prototypical warehouse "star" schema puts a single
> >large table at the center of a bunch of smaller tables (snowflake
> >schemas normalize a bit). Most of the many to many relationships are
> >denormalized. Relationships are hard coded in the center tables and your
> >standard relationship traversal goes away (that's the whole point, avoid
> >join processing costs at the cost of higher storage utilization).
> >
> >Now you could just plop an entire triple store into a single table but I
> >can't see how that approach would work at all, all relationships would
> >be via procedural value look up and comparison. To put it another way,
> >triples are all about relationship management as opposed to value
> >management which is what a data warehouse schema is for.
> >
> >Having said that I'll note that if you go to 5th normal form you end up
> >with a sort of inverted star; tiny little tables connected to a bunch of
> >larger tables. This is because you've used a single table (with
> >possibly a single column) to normalize out a bunch of relationships.
> >This pattern does have something to do with triple stores (since that's
> >what we're using it for). Given my statements above I'd guess it has
> >something to do with ending up with a single key for relationship
> >traversal across multiple dimensions/perspectives and thus being able to
> >annotate the relationships. I'd postulate that there are some formal
> >properties shared between graphs and 5th normal form databases.
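
To make Peter's point about a triple store held in one table concrete, here is a small sketch using Python's standard `sqlite3` module. The `triple` table, the sample rows, and the `employee_cities` helper are hypothetical. It shows that following a relationship turns into a self-join that compares stored values, one join per hop.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- The whole (hypothetical) triple store lives in this one table.
CREATE TABLE triple (subject TEXT, predicate TEXT, object TEXT);
INSERT INTO triple VALUES
    ('emp1',    'works_at',   'branch1'),
    ('branch1', 'located_in', 'Cairo'),
    ('emp2',    'works_at',   'branch2'),
    ('branch2', 'located_in', 'London');
""")


def employee_cities(conn):
    """Follow works_at then located_in.

    Each hop is a self-join matching one row's object
    against another row's subject by value.
    """
    return conn.execute("""
        SELECT t1.subject, t2.object
        FROM triple t1
        JOIN triple t2 ON t2.subject = t1.object
        WHERE t1.predicate = 'works_at'
          AND t2.predicate = 'located_in'
        ORDER BY t1.subject
    """).fetchall()
```

A star schema would instead store the city directly on the central fact row, trading extra storage for the join that this layout needs on every traversal.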
> >
> >-----------------------------------------------------------------
> >The xml-dev list is sponsored by XML.org <http://www.xml.org>, an
> >initiative of OASIS <http://www.oasis-open.org>
> >
> >The list archives are at http://lists.xml.org/archives/xml-dev/
> >
> >To subscribe or unsubscribe from this list use the subscription
> >manager: <http://www.oasis-open.org/mlmanage/index.php>