RE: Designing XML to Support Information Evolution
Rick Marshall <rjm@z...> writes:
>
> i found one way to fix the performance problem is with associative
> structures. these are heavily indexed tables and associative lists to
> work out navigation issues. and then it's very fast - much faster than
> existing techniques. i worked out how to do it with relational databases
> and now i'm building code for xml. but normalisation is important to
> make this work.
>
> the "secret" is being able to traverse lists very quickly

Could you go into a little more detail about what you're doing? List
traversal is the one thing that relational databases do very well... I
also don't find a lot of problems with list traversal in XSLT. However,
for building a hierarchical view of data (for presentation purposes) I
find that gluing together lists doesn't perform; you really have to
stick to a hierarchical representation of your data from end to end.

The trick for doing this with databases takes a little work but is not
extremely difficult; I've talked about it here and on the cocoon-dev
list a couple of times and can do so again if people want.

I get the feeling that your work is all pure data manipulation and no
hierarchical presentation, so I don't think comparing the two approaches
(hierarchical trees vs. associative lists) is meaningful, but perhaps
I'm missing something?

> rick
>
> Hunsberger, Peter wrote:
>
> >Rick Marshall <rjm@z...> writes:
> >
> >>hierarchies fail, and this is my struggle with xml at the moment, when
> >>they have to support multiple hierarchies simultaneously. and they
> >>largely fail because of a) the update problem, and b) the new hierarchy
> >>problem. reverse bill of materials is a case in point.
> >>
> >>having said that xml works really well where neither of these are an
> >>issue - documents where the "semantics" don't change only the contents;
> >>and as i said before moving transactions between systems.
> >>
> >>even relational systems have problems because the semantics is embedded
> >>in the sql select statements. most so called post relational systems
> >>(not really sure that's a legitimate term, even though it's used a lot)
> >>basically embed semantics back into the structure.
> >>
> >>things like owl and to a lesser extent name spaces try to express the
> >>semantics as a meta model. imho a far superior approach. i just don't
> >>like naming relationships - prefer to acknowledge they exist and what it
> >>takes to define them, but not necessarily name them.
> >>
> >>now to xml and the cinderella id tag. the same effect as the
> >>hierarchical xml could be achieved by allowing a name/value pairing to
> >>store the structure as attributes in the xml tag and they should be
> >>treated as elements as well.
> >>
> >>the id tag is the required unique key, while special associate elements
> >>store structure. this has the advantage of flattening the xml and
> >>allowing the parsers to create structure on the fly to suit
> >>the translators.
> >>
> >><home id="456"><home_elements/></home>
> >><person id="123"><associate type="home">456</associate><other_elements/></person>
> >>
> >>which would be approximately
> >>
> >><home id="456">
> >>  <home_elements/>
> >>  </home>
> >><person id="123">
> >>  <home>456</home>
> >>  <other_elements/>
> >>  </person>
> >>
> >>early days, but something like this would be much better for data
> >>modelling. perhaps we can have post-xml? ;)
> >
> >Interesting, this is essentially the structure I was comparing to a
> >structured hierarchy in the "Parallel tree traversal" thread. Turns
> >out that once I fixed up all my XSLT bugs and cleaned up the code,
> >the version that used the structured hierarchy runs about an order of
> >magnitude faster than the version that attempts to stitch the hierarchy
> >together from flat data using id/idref.
> >
> >I need a little more testing on the insert/update side, but I expect
> >I'm going to proceed with a version of our code that can spit out
> >multiple hierarchies cutting across our relationship lattice on demand
> >instead of trying to glue this together on the XML side. More XML
> >output (redundant trees), but at least in our case normalization costs
> >too much in terms of performance and the extra space consumption can be
> >handled: the redundant data is generated only as needed from a
> >normalized database and not persisted anywhere. It chews up app server
> >memory, but we're talking at most maybe 100 MB (if every model gets
> >cached, which in our case will happen over time). A GB of memory is
> >cheap enough that once more, throwing hardware at an XML problem trumps
> >trying to spend too much time optimizing it.
> >
> >More and more, I'm seeing that XML application optimization comes down
> >to explicitly exploiting the known algorithms for fast tree traversal
> >and generation and not trying to re-invent normalization from within
> >XSLT (or Java transforms for that matter)...
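To make the two shapes under discussion concrete, here is a minimal
sketch in Python of stitching a hierarchy out of flat id/associate
records like the ones quoted above. It is not the code either poster
used: the <data> wrapper element, the function names, and the choice of
a plain dictionary as the index are all assumptions made for
illustration.

```python
import copy
import xml.etree.ElementTree as ET

# Flat records in the style of the quoted id/associate example.
# The <data> wrapper is an assumption so the fragment parses as one document.
FLAT = """
<data>
  <home id="456"><home_elements/></home>
  <person id="123"><associate type="home">456</associate><other_elements/></person>
</data>
"""


def build_index(root):
    """Map each id attribute value to its element in one pass over the records."""
    return {el.get("id"): el for el in root if el.get("id") is not None}


def stitch(root):
    """Return a new tree where each <associate> is replaced by a copy of its target."""
    index = build_index(root)
    out = ET.Element(root.tag)
    for record in root:
        new = ET.SubElement(out, record.tag, record.attrib)
        for child in record:
            if child.tag == "associate":
                # Resolve the reference through the index instead of scanning.
                target = index[child.text.strip()]
                new.append(copy.deepcopy(target))
            else:
                new.append(copy.deepcopy(child))
    return out


if __name__ == "__main__":
    stitched = stitch(ET.fromstring(FLAT))
    print(ET.tostring(stitched, encoding="unicode"))
```

The stitched output nests a copy of the home record inside the person
record, which is the "redundant trees" shape: the same data appears once
per hierarchy that needs it. The sketch says nothing about the measured
order-of-magnitude difference; it only shows where the extra lookup and
copying work sits when the hierarchy is assembled from flat data.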