Re: xml search engine?
This is an area I am very involved in, and I hope that the following answer is of some use.

At 11:43 AM 3/29/00 +0200, Reinout van Rees wrote:
>On Tue, 28 Mar 2000, Jean Marc VANEL wrote:
> >
>There is a problem I see for xml search engines. How are they going to
>cope with all the various DTD's? They ARE going to cope, but what will
>be the result? Will we have lots of small search engines searching for
>information in all reinforced_concrete_supplier.dtd xml files it can
>find and another for all medicine.dtd info? Will there be a few
>standard elements in most DTD's to comply to some emerging behaviour
>of all search engines? There are so many ways this could work out. Any
>opinions?

We have sometimes discussed on XML-DEV whether it is possible to have schemas that are completely machine-interpretable. [I use this to mean: "If my machine gets a *.xml + *.xsd from some other machine and there is no prior agreement, can my machine do something useful with the *.xml (other than print it out for a human to read)?"] The general consensus was that this was an ultimate goal but probably beyond most XML-ers' immediate vision. Therefore there has to be some prior agreement about semantics and ontology.

I am intimately involved with "medicine.dtd" on two fronts. [I don't suggest the discussion wanders into the details - I use it as an example.] I have been compiling XML versions of drugs and diseases in conjunction with expert centres in the field. There is no universal "medicine.dtd" and there is unlikely to be one. It is more likely that there will be several approaches, including HL7, MEDLINE, the UMLS metathesaurus and others. These will probably all evolve to have an XML interface. How they do so will depend largely on how the systems are currently deployed, and users will need to know the details of the organisation of each. [XML isn't magic; it can be a useful wrapper for existing approaches.]
In general these resources consist of human-readable information, and the search engine will have to know how this is organised.

I have also developed CML (Chemical Markup Language), which is now starting to become standard. I am working on making portable semantics, especially through a Java-based CML-DOM. The attraction of this is that it formalises the semantics in a non-arbitrary way - no-one can argue that a DOM is a non-standard approach. IOW, having developed the DTD for a technical discipline, implementing a DOM is IMO almost mandatory. It is also extremely good discipline, because it makes it clear that every element and every attribute in the DTD may need some code written for it. Because of the labour of doing this I would hope that people collaborate on a communal DOM (mine will be OpenSource), so that we do not get mutant versions.

The DOM necessarily defines the semantics and sometimes hardcodes ontology (through behaviour). In this way we move towards a standard way of doing things in a discipline. This may not be the "best" way, but it is likely to fly. Therefore I would expect most chemical semantics to depend on the DOM. It may be that the DOM exposes a "search" interface, and I hope it does or will (DOM3 people? aren't we discussing this at present :-).

This means that CML will become a component wherever standard chemistry is involved. CML is an 80/20 solution to chemistry, has been submitted to the governing body of chemistry (IUPAC) and hopefully will be used in a wide range of documents. I envisage at least patents, safety, drugs, publications, bioinformatics, medicine, materials, etc. I will be able to ask a question like: "does this document contain any elements in a namespace mapped onto the URI http://www.xml-cml.org?" If so, they can only be *valid* if they conform to the CML DTD.
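The namespace question above can be sketched in a few lines of Python using only the standard library. This is an illustration under stated assumptions, not part of CML or its DOM: the function name contains_cml and the sample document (report, title, the cml prefix) are invented for the example. It only detects the presence of elements in the namespace; it does not validate them against the CML DTD, which ElementTree cannot do.

```python
import xml.etree.ElementTree as ET

# Namespace URI quoted in the message.
CML_NS = "http://www.xml-cml.org"


def contains_cml(xml_text: str) -> bool:
    """Return True if any element in the document is in the CML namespace.

    Hypothetical helper: it checks namespace membership only and does
    NOT check validity against the CML DTD.
    """
    root = ET.fromstring(xml_text)
    # ElementTree expands qualified names to "{namespace-uri}localname".
    prefix = "{" + CML_NS + "}"
    return any(
        isinstance(el.tag, str) and el.tag.startswith(prefix)
        for el in root.iter()
    )


# Invented sample: a non-chemical document embedding one CML element.
sample = """<report xmlns:cml="http://www.xml-cml.org">
  <title>Safety sheet</title>
  <cml:molecule id="m1"/>
</report>"""

print(contains_cml(sample))         # True
print(contains_cml("<a><b/></a>"))  # False
```

A search engine with no knowledge of the host document's DTD could use a test like this to decide whether to hand the chemical parts to CML-aware code.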
In that case we could ask a query (in XQL-like syntax), "find all molecules with more than 20 carbon atoms":

    //molecule/atom/builtin[@elementType='C'][position()=21]

This is incredibly powerful. If, however, we want even more power we could write extension functions. Of course the community has to know what these are, and they could find all molecules with aromatic rings, or all those with electrons calculated to have particular energies (this could be done on the fly as part of the search!).

I have the honour of having been asked to help on the construction of the Materials Markup Language (MatML) run by Ed Begley at NIST. This may well involve concrete. In any case we are tackling exactly these questions - creating a *simple* approach, linked to terminologies and interoperable with several other MLs.

P.

***************************************************************************
This is xml-dev, the mailing list for XML developers.
To unsubscribe, mailto:majordomo@x...&BODY=unsubscribe%20xml-dev
List archives are available at http://xml.org/archives/xml-dev/
***************************************************************************
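The "more than 20 carbon atoms" query in the message can also be sketched procedurally, roughly as DOM-style code might do it. The sketch below makes loud assumptions: it uses a simplified, hypothetical layout in which each molecule element holds atom children carrying an elementType attribute, which is not necessarily how the CML DTD represents atoms, and the names molecules_with_more_carbons and make_molecule are invented. It counts carbons explicitly rather than testing for a 21st match as the XQL-like expression does.

```python
import xml.etree.ElementTree as ET


def molecules_with_more_carbons(xml_text, threshold=20):
    """Return ids of molecules with more than `threshold` carbon atoms.

    Assumes a hypothetical layout <molecule id=".."><atom elementType="C"/>...
    which is a simplification, not the actual CML DTD.
    """
    root = ET.fromstring(xml_text)
    hits = []
    for mol in root.iter("molecule"):
        carbons = sum(
            1 for atom in mol.findall("atom")
            if atom.get("elementType") == "C"
        )
        if carbons > threshold:
            hits.append(mol.get("id"))
    return hits


def make_molecule(mol_id, n_carbon):
    """Build an invented test molecule with n_carbon carbons and one oxygen."""
    atoms = "".join('<atom elementType="C"/>' for _ in range(n_carbon))
    return f'<molecule id="{mol_id}">{atoms}<atom elementType="O"/></molecule>'


doc = "<list>" + make_molecule("small", 6) + make_molecule("large", 21) + "</list>"
print(molecules_with_more_carbons(doc))  # ['large']
```

An extension function of the kind mentioned in the message (aromatic rings, calculated energies) would slot in where the carbon count is computed, which is why the community needs to agree on what such functions mean.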