Re: XML Search Engine Holy War - Attributes vs. Elements
Duane Nickull wrote (re GoXML Context-based Search Engine):
>
>1. Ignore Attributes all together and index Elements and Character Data
>only.
>
>The feeling is that the use of attributes should be restricted (by
>authors).. [snip]
>
>2. Index attributes as text only and place the resulting text within the
>character data portion of the index.
>[snip]
>
>3. We should index attribute values as ____________? names as _____?

Robert DuCharme wrote [re #1]:
>
>This shouldn't even be considered. Attributes are used for far more than
>what the above paragraph describes. [snip]

I heartily agree with Robert here. We are a publisher of medical and
veterinary reference materials, and we often use XML attributes to qualify
their associated elements. We also use attributes extensively in
configuration files that consist largely of empty-element types (i.e., no
character data or element data to index!) -- and we want our indexing tools
to handle any of our XML data (yet another reason we're looking long and
hard at replacing/supplementing our DTDs with XML Schemas [..not intending
to re-start the 'DTD vs. Schema' flame-war] ;-).

Neither option 1 nor 2 is acceptable. If you are segregating indexed
content, then you need to add a section for attributes as well as character
data, without merging these two vitally different portions of the XML
structure.

To fill in the blanks in option 3, you could simply treat attribute names as
analogous to element type names and attribute values as the "character
data". Or you could treat these as a separate searchable category.

It would also be very nice to provide a mapping mechanism between
element-chardata and attribute name-value pairs to handle differences
between websites. For example, site A might be pure elements, whilst site B
uses elements with attributes -- yet I'd want to be able to do a title
search that would hit on both, que no?
A:
<book_catalog>
  <book>
    <title>Professional XML</title>
    <pub>Wrox Press Ltd.</pub>
  </book>
</book_catalog>

B:
<book_catalog>
  <book title="Professional XML" pub="Wrox Press Ltd."/>
</book_catalog>

Also heed Robert's mention of ID/IDREF attributes -- these will be critical
for serious XML apps!

As for the remark "..the use of attributes should be restricted (by
authors)..", I hope that you're not serious about this! IMHO, any XML
tool/product/whatever that attempts to narrow the use of XML features and/or
otherwise dictate structure to users of XML is doomed to a similarly narrow
market.

A related issue from GoXML's webpage "XML Meta Tags" @
http://www.goxml.com/about/xmeta.htm
>
>There is currently no standard for which we can index XML meta tags. We are
>working on a standard for XML meta tags which are actually comments:
>
><!--XMETA:KEYWORDS | keyword1 keyword2 keyword3-->
>[snip]
>
>Another proposed way of doing this is through the use of processor instructions,
>(PI's). [snip]
>
>This was a point recently brought up to us by Jacob Hammeken, and it looks like
>this approach would be a much cleaner way of placing meta markup in an XML
>document. Any comments?

It's not just cleaner -- you must use PIs for this purpose, since XML 1.0
specifically states in section "2.5 Comments" that "..an XML processor
[parser] may, but need not, make it possible for an application to retrieve
the text of comments". And as Tim Bray states in his annotations: "This
means that if you're building an XML application, you should never rely on
anything that shows up in a comment (this sleazy trick is far too common in
HTML)."

The parser used by your indexer may provide you the comments, but mine might
not -- and I'm not necessarily going to be happy to change my parser to use
your indexer, eh?

Regards and best wishes,
-Nik O, Teton Data Systems, Jackson, Wyo.

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@i...
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ and on CD-ROM/ISBN 981-02-3594-1
To unsubscribe, mailto:majordomo@i... the following message;
unsubscribe xml-dev
To subscribe to the digests, mailto:majordomo@i... the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa@i...)
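
The mapping idea raised in the message (a title search that hits both the
element-only site A and the attribute-based site B) could be sketched
roughly as follows. This is only an illustration under stated assumptions,
not GoXML's design: the function names index_document and search, the
"site-A"/"site-B" document ids, and the choice to lowercase values are all
hypothetical. Attribute names are treated as analogous to element type
names, while each posting still records whether it came from an element or
an attribute, so the two remain separately searchable.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# The two catalog layouts from the message (site A: elements, site B: attributes).
SITE_A = """<book_catalog>
  <book>
    <title>Professional XML</title>
    <pub>Wrox Press Ltd.</pub>
  </book>
</book_catalog>"""

SITE_B = """<book_catalog>
  <book title="Professional XML" pub="Wrox Press Ltd."/>
</book_catalog>"""


def index_document(doc_id, xml_text, index):
    """Hypothetical indexer: map (field, value) to a set of (doc_id, kind)."""
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        # Element character data: the field name is the element type name.
        text = (elem.text or "").strip()
        if text:
            index[(elem.tag, text.lower())].add((doc_id, "element"))
        # Attribute name-value pairs: the field name is the attribute name.
        for name, value in elem.attrib.items():
            index[(name, value.strip().lower())].add((doc_id, "attribute"))


def search(index, field, value, kind=None):
    """Return sorted doc ids; kind optionally restricts to one category."""
    postings = index.get((field, value.strip().lower()), set())
    return sorted({doc for doc, k in postings if kind is None or k == kind})


index = defaultdict(set)
index_document("site-A", SITE_A, index)
index_document("site-B", SITE_B, index)

print(search(index, "title", "Professional XML"))
print(search(index, "title", "Professional XML", kind="attribute"))
```

The first search finds both sites; the second, restricted to attributes,
finds only site B. A real indexer would also need tokenization, mixed
content handling, and a configurable name mapping for sites that use
different names for the same field.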