Assisted Search of XML document collections
Hi, all,

There is a modest effort being assembled to look at this prototypical problem:

PROBLEM - multitudes of XML documents. The collection is not necessarily static, but if dynamic, only incrementally so.

The business case that would apply is that it makes sense to mark up the original documents using XML; it also makes sense to search a file which is a description of the document collection rather than the whole document collection. The derivative file is the "index" - we are not assuming that it itself need be XML.

I am choosing my language carefully, as there seems to be an equal mixture of enthusiasm and coolness displayed towards an XML document collection indexing scheme. The fact of the matter is that so far we have identified a number of problems which are amenable to assisted search. We are not particularly concerned, at this point, with breaking any new ground in XML - rather, this is a project designed to address a subset of XML "usage" problems.

Although I have announced this project on the perl-xml list, and it will concentrate on Perl, with and without XS, Java and/or C/C++ viewpoints are just as welcome.

We are primarily interested in exploring issues pertaining to the construction of a file that describes a collection of XML documents in a succinct fashion, most likely with a moderate to high degree of application specificity - i.e. there may not be a lot of defaults that make sense. We also wish to supply a useful API that search engine writers can use.

This is really at an early stage. I'm announcing it here to get some feedback.
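The index idea described above can be sketched roughly as follows. This is only an illustration, written in Python rather than the Perl the project will concentrate on, and every name in it (`build_index`, `search`, the sample documents) is hypothetical. It assumes that the "desired knowledge" to summarize is simply which element names occur in which file; a real index would likely be far more application-specific.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def build_index(documents):
    """Map each element tag name to the set of document paths containing it.

    `documents` is a dict of {path: xml_text}. Both this input shape and the
    choice of tag names as the indexed feature are assumptions made for
    illustration only.
    """
    index = defaultdict(set)
    for path, xml_text in documents.items():
        root = ET.fromstring(xml_text)
        # root.iter() walks the root element and all of its descendants.
        for element in root.iter():
            index[element.tag].add(path)
    return index


def search(index, tag):
    """Return the sorted paths of documents that contain an element `tag`."""
    return sorted(index.get(tag, set()))


# Two tiny in-memory documents stand in for a large collection on disk.
docs = {
    "a.xml": "<memo><to>Ann</to><body>Hi</body></memo>",
    "b.xml": "<invoice><to>Bob</to><total>10</total></invoice>",
}

index = build_index(docs)
print(search(index, "to"))     # ['a.xml', 'b.xml']
print(search(index, "total"))  # ['b.xml']
```

The search consults only the small index and returns pointers (here, file names) to the documents worth opening; the documents themselves are parsed once, when the index is built.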
If there is fundamental agreement about one thing, it's this: there are going to be cases to be made for collections of XML documents, the problem will involve searching them, and there will be too many files to be searched by brute force. We are proposing that a document can be constructed which summarizes some desired knowledge about the collection (we're not even saying - yet - that the index itself need be sorted). In the simplest sense, because it happens to be much smaller, it can be searched instead; it supplies fast pointers to individual XML files, and you take it from there.

Whew! :-) Anyway, feedback welcome. If you wish to contribute please contact me. I will be posting a formal note concerning folks involved within the week.

Thanks.
Arved Sandstrom

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@i...
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ and on CD-ROM/ISBN 981-02-3594-1
To (un)subscribe, mailto:majordomo@i... the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo@i... the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa@i...)