RE: need for defining standard APIs for xml storage
Hi Dongwood,

Dongwood said:
Very close, but not exactly. Most of the literature categorizes queries into two or three kinds: a) structural queries, b) content queries, c) attribute queries (which can be considered a subset of a).

Didier replies:
By element queries I meant something closer to structural queries. I have to admit that the term "structural query" is better chosen than "element query". However, by content query I meant queries on content that is not structured with XML, for instance an HTML document, a PDF or a Word document, or content that is structured with XML but not meaningfully enough to facilitate information retrieval. Of course I can create an element and include the whole document as data content for this element, as you suggested, but the structure does not help me retrieve the content here. So the classical text indexing tools are more useful in that case.

So let's reduce the query family to two kinds: a) structured, b) unstructured.

(a) is available when the document has an XML structure, though not in every such case. Do not forget the case of an XHTML document whose content is packaged as <p> elements. A big bunch of <p> elements does not help me retrieve the content. So in that case we may consider the document unstructured and access its information with unstructured document query techniques (i.e. indexing). In some cases, where the data content is big enough, I can further index the data content even if it is enclosed by a meaningful element.

So, to be more precise, we can have queries based on the structural constructs of XML (elements, attributes, etc.) and queries based on classical indexing techniques. The main difference from classical indexing is that the unit of information retrieval becomes an element instead of a document. So the index points to an element, not to a document. This is especially useful in the case of a permanent information set (i.e. a GROVE).
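To make the "index points to an element, not a document" idea concrete, here is a minimal sketch in Python using the standard xml.etree.ElementTree module. The sample XML, the function name build_element_index and the simple word tokenizer are all assumptions made for illustration; a real store would use a persistent index rather than an in-memory dictionary.

```python
import re
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical sample data: a "library" holding two documents made of <p> elements.
SAMPLE = """<library>
  <doc id="a"><p>Groves and property sets</p><p>Indexing XML elements</p></doc>
  <doc id="b"><p>Style sheets for XML</p></doc>
</library>"""

def build_element_index(root):
    """Map each lowercased word to the elements whose own text contains it."""
    index = defaultdict(list)
    for elem in root.iter():  # document order
        text = elem.text or ""
        for word in set(re.findall(r"\w+", text.lower())):
            index[word].append(elem)  # the posting is an element, not a document
    return index

root = ET.fromstring(SAMPLE)
index = build_element_index(root)

# A word lookup returns the matching elements directly, with no tree scan.
for elem in index["xml"]:
    print(elem.tag, "->", elem.text)
```

The lookup returns the individual <p> elements that contain the word, so the retrieval unit is the element even though the markup itself carries no useful structure.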
Dongwood said:
I mean whatever you extend to the DOM, you get into the same situation. Basically, DOM is a representation of the whole XML document. On the other hand, the index is a small set of pointers to actual data. If you have a query like "find a SPEECH whose SPEAKER contains 'hamlet'", you have to search the whole DOM, which does not scale to large documents. On the other hand, if you have the inverted index for the document, you can get the elements having "hamlet" immediately.

Didier replies:
Not necessarily. Just isolate the interfaces from the category name and disregard the name "Document Object Model", which is a very bad frame for any evolution of these interfaces. What we have in our hands is objects having particular interfaces. All objects inherit from the same base interface, and thus we can say that all objects inherit from the same base class: the node. Further value is then added by adding new interfaces to the basic interface. This is the inheritance mechanism. Just take the case of an SVG element, which adds a lot more value to the basic node interface.

Now, let's say that these objects are simply an API used to access elements stored in a permanent information set (i.e. a grove). In this case we deal with node objects, and therefore we can use patterns like the observer, or another pattern, to navigate the whole tree, or we can provide a member to access any element. This is what's behind the SelectNodes function. Whatever object you obtained from the permanent information set, you can obtain a new one with this function, without having to permanently keep a root object. You do not even need the notion of a document, just the notion of an entry point. This also removes the dependency on the document object, which does not make sense for a permanent information set that has been composed from several XML text documents.
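As a rough sketch of the "entry point" idea, the Python below uses ElementTree's findall as a stand-in for the SelectNodes function. The sample data and the helper names select_nodes and speeches_by are hypothetical. Note that this version still scans the subtree under the entry point; the point is only that the caller talks to a node interface, so a permanent store could answer the same call from an index.

```python
import xml.etree.ElementTree as ET

# Hypothetical information set composed from more than one source document.
LIBRARY = """<library>
  <PLAY><TITLE>Hamlet</TITLE>
    <SPEECH><SPEAKER>HAMLET</SPEAKER><LINE>To be, or not to be</LINE></SPEECH>
    <SPEECH><SPEAKER>OPHELIA</SPEAKER><LINE>Good my lord</LINE></SPEECH>
  </PLAY>
  <PLAY><TITLE>Macbeth</TITLE>
    <SPEECH><SPEAKER>MACBETH</SPEAKER><LINE>Is this a dagger</LINE></SPEECH>
  </PLAY>
</library>"""

def select_nodes(entry_point, path):
    """Run a path query relative to whatever node the caller holds."""
    return entry_point.findall(path)

def speeches_by(entry_point, name):
    """Find SPEECH nodes under the entry point whose SPEAKER contains name."""
    return [s for s in select_nodes(entry_point, ".//SPEECH")
            if name.lower() in (s.findtext("SPEAKER") or "").lower()]

library = ET.fromstring(LIBRARY)

# Any node works as an entry point; no document or root object is needed.
first_play = select_nodes(library, "PLAY")[0]
hamlet_speeches = speeches_by(first_play, "hamlet")
print(hamlet_speeches[0].findtext("LINE"))
```

The same speeches_by call works whether it is handed the whole library or a single PLAY node, which is the sense in which the notion of a document drops out.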
However, if the comfort level with this abstraction is acceptable, then we can perceive the whole library as a single document. I agree, though, that we have to be very careful with our metaphors. So, to make things clearer, I should probably talk about a new model that reuses the same interfaces the DOM offers. The context here is no longer that of a single document but that of the library :-))

Dongwood said:
It seems to me that the notion of "permanent information set" looks like a data repository. The first issue here is how you store the data and refer to it elsewhere. Another is what the query space (the document space a query should look at) should be: should it be limited to the current XML fragment, or extended to follow links? Your GROVE seems to be one solution for that.

Didier replies:
Yes it is. If we pay attention to what the DOM is in the context of a browser, we can say that it is a transient data repository. The browser obtains a version of the information set in a serialized format. This serialized format takes the form of an XML document. The browser rebuilds the information set from the serialized version. Then, locally, several agents such as scripts or a style sheet engine can use this information set.

Now, imagine that on one end I have not a human-driven agent like a browser but a computer or an automatic agent. This agent sends a query to obtain an information set. The information set could be a tiny fragment of a bigger one stored on the provider side. The provider replies by sending a serialized version of the information set: an XML document. The receiver deserializes it (i.e. parses it) and either a) builds a transient information set or b) inserts the received information set into another one, a permanent one. Thus what these computers transmitted is an information set representing a fragment of the provider's information set, which can become part of the receiver's.
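The exchange just described can be sketched in a few lines of Python with xml.etree.ElementTree. Everything named here (the catalog data, answer_query, receive) is a hypothetical illustration, and the "wire" is just a string passed between two functions rather than a real network protocol.

```python
import xml.etree.ElementTree as ET

# Provider side: a larger information set (hypothetical data).
provider_set = ET.fromstring(
    "<catalog><item id='1'><name>grove</name></item>"
    "<item id='2'><name>index</name></item></catalog>")

def answer_query(info_set, item_id):
    """Provider: pick a fragment and serialize it as an XML document."""
    fragment = info_set.find(f"item[@id='{item_id}']")
    return ET.tostring(fragment, encoding="unicode")

def receive(xml_text, permanent_set=None):
    """Receiver: parse the document, then either keep the result as a
    transient information set or insert it into a permanent one."""
    fragment = ET.fromstring(xml_text)
    if permanent_set is not None:
        permanent_set.append(fragment)
    return fragment

wire = answer_query(provider_set, "2")     # serialized fragment on the wire
receiver_set = ET.fromstring("<library/>")  # receiver's own information set
receive(wire, receiver_set)                 # case b): insert into it
print(ET.tostring(receiver_set, encoding="unicode"))
```

Calling receive(wire) without a second argument corresponds to case a), where the parsed fragment is only a transient information set.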
So, when we speak of permanent information sets, we are no longer in the same space as the browsers are, and the universe is not restricted to the single transmitted document. In fact, the producer extracts a fragment of its own information set and sends this fragment to the receiver. The receiver can include this information set fragment in its own. We have just created an abstract model representing the information space between these two agents.

So, what if this information set fragment is stored in a directory service? It then becomes a directory service node, and inside that node I have other nodes that represent the received information set. But you can always consider the directory service as a huge document, obtain from it a serialized version in the form of an XML document, and later apply a style sheet to it to make it fit for consumption by our senses.

Cheers
Didier PH Martin
----------------------------------------------
Email: martind@n...
Conferences: Web Chicago (http://www.mfweb.com)
XML Europe (http://www.gca.org)
Book: XML Professional (http://www.wrox.com)
Column: Style Matters (http://www.xml.com)
Products: http://www.netfolder.com

***************************************************************************
This is xml-dev, the mailing list for XML developers.
To unsubscribe, mailto:majordomo@x...&BODY=unsubscribe%20xml-dev
List archives are available at http://xml.org/archives/xml-dev/
***************************************************************************