
Re: Indexing of XML documents

  • From: Peter@u... (Peter Murray-Rust)
  • To: xml-dev@i...
  • Date: Sat, 15 Mar 1997 17:44:06 GMT

In message <9703150224.AA05729@s...> lee@s... writes:
> > When we need to resolve a TEI pointer like (id a23) we may have to scan
> > the whole document.
> 
> This all depends on who "we" is taken to be.
> 
> A web indexing robot doesn't need to resolve tei pointers at all,
> except to identify the remote document -- it then indexes the whole thing.

I am guilty of imprecision (sorry :-). I meant an internal index of the
document tree, not an index used to locate the document.

> 
> > In general we will wish to cache (index) IDs since
> > we don't wish to rescan for another search.
> I don't follow this.  Under what circumstances is searching a document for
> an ID much more painful than using a cache?  Is this for 100 MByte documents?
> (which do exist, by the way, droves.  No, like elephants, in herds)

Yes - I was thinking of exactly that.  Particularly if the document contains
thousands of elements (e.g. large chunks of HTML-like material).  
> 
> > When validating a document the IDs, GIs and ATTNAMEs all have to be scanned
> > since they occur in VC's.
> Not sure what a VC is (validatable context??) but yes, they all have to
> be validated.

VC = 'validity constraint' - see XML-draft 1.4, which uses this abbreviation
in later places.  The point is that (say) in production 52 all IDs have to be
scanned for uniqueness.  Therefore at this stage it could be useful to
hash them so that they can be retrieved rapidly if they form part of a
later search, rather than going through the whole doc again.
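
A minimal sketch of the idea, in Python: the single pass that checks ID
uniqueness also fills a hash table, so a later pointer such as (id a23) is
resolved by lookup rather than by another scan. The function name
build_id_index and the sample document are hypothetical, and the sketch
assumes every attribute named "id" is an ID; a real validating parser
would take that from the DTD's attribute declarations.

```python
import xml.etree.ElementTree as ET


def build_id_index(root, id_attr="id"):
    """Walk the tree once, checking ID uniqueness and hashing each ID.

    Assumption: any attribute called `id_attr` is treated as an ID.
    """
    index = {}
    for elem in root.iter():
        value = elem.get(id_attr)
        if value is None:
            continue  # element carries no ID
        if value in index:
            # The uniqueness check falls out of the same pass.
            raise ValueError(f"duplicate ID: {value!r}")
        index[value] = elem
    return index


# Hypothetical sample document.
doc = ET.fromstring(
    '<text><div id="a22"><p id="a23">target</p></div><p>no id</p></text>'
)
ids = build_id_index(doc)
target = ids["a23"]  # lookup in the table, no rescan of the document
```

The table costs one entry per ID, which is likely to be worthwhile mainly
for the very large documents discussed above.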

It's no big deal - but since I found myself doing it for various 
searches, it seemed worth thinking about in the API.

	P.
-- 
Peter Murray-Rust, domestic net connection
Virtual School of Molecular Sciences
http://www.vsms.nottingham.ac.uk/

xml-dev: A list for W3C XML Developers
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To unsubscribe, send to majordomo@i... the following message;
unsubscribe xml-dev
List coordinator, Henry Rzepa (rzepa@i...)

