Subject: Re: How to efficiently obtain the first 10 records of a file with over 2 million records?
From: "Dimitre Novatchev dnovatchev@xxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Wed, 19 Jul 2023 21:25:41 -0000
And there are data structures, such as the Finger Tree (of course, not
XML-based), that guarantee O(log(N)) access when searching by key or by
position. Thus searching among 100 billion items in a Finger Tree will
be about as fast as the average linear search in a sequence of 66 items.
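
As a rough check of that comparison, the arithmetic can be sketched in Python. This assumes a base-2 logarithm and roughly one comparison per step, and it ignores constant factors; the function names are illustrative only and do not come from any Finger Tree library.

```python
import math


def logarithmic_steps(n: int) -> float:
    """Approximate number of steps for an O(log N) lookup, assuming base 2."""
    return math.log2(n)


def average_linear_probes(n: int) -> float:
    """Average comparisons for a successful linear search over n items,
    assuming the target is equally likely to be at any position."""
    return (n + 1) / 2


steps = logarithmic_steps(100_000_000_000)   # 100 billion items
probes = average_linear_probes(66)           # sequence of 66 items

print(f"log2(100 billion) is about {steps:.1f} steps")
print(f"average linear search over 66 items is {probes} comparisons")
```

Under these assumptions both figures land in the mid-thirties, which is the sense in which the two searches are comparable.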

On Wed, Jul 19, 2023 at 1:32 PM Dimitre Novatchev dnovatchev@xxxxxxxxx <
xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:

> >  I have an XML file containing over 2 million <record> elements. I want
> to obtain the first 10 <record> elements.
> In general, there is no guarantee of fast processing unless there has
> been some initial / additional preparation.
> Imagine that "the first 10 <record> elements" happen to be the last ten
> elements among the possibly millions of elements of the XML document.
> Whenever people have huge data and intend to retrieve and process small
> pieces of it, the usual solutions are:
>    a). Find meaningful "sub-structures" in the data and, based on these,
> split it into a multitude of smaller pieces of data, each containing a
> manageable number of such structures. For example, there are 100 billion
> stars in the Milky Way Galaxy (MWG). Instead of having one enormous XML
> document containing the data for all of them, it would make sense to
> create several smaller documents, say one for each spiral arm of the
> Galaxy. And indeed, the largest collection of such data today (just about
> 1% of all the stars in the Galaxy), produced by Gaia, consists of multiple
> compressed files, not a single one. Selecting which of these files to work
> with is similar to orienting your telescope within a particular angle
> within the Galactic plane, or choosing a particular telescope type that
> has the desired technical characteristics.
>    b). Create a (different) index (and I believe that at least some
> XQuery implementations do that) for every important, imaginable search.
> Using b). above, one could specify some complete processing, say starting
> with XQuery (using an existing index) and, once the wanted elements have
> been retrieved almost instantaneously, calling the standard XPath 3.1
> fn:transform() for further processing.
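
The indexing idea in b). can be sketched generically in Python. This is not how any XQuery implementation builds its indexes; it only shows the principle of paying once to build a lookup table and then answering each search without rescanning the document. The element names `arm` and `name`, the sample data, and the helper `build_index` are all hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def build_index(root, key_child, record_tag="record"):
    """Scan the document once and group records by the text of one child element."""
    index = defaultdict(list)
    for record in root.iter(record_tag):
        key = record.findtext(key_child)
        if key is not None:
            index[key].append(record)
    return index


# Hypothetical sample data: three records tagged with a spiral arm.
SAMPLE = """<Document>
  <record><arm>Orion</arm><name>Sol</name></record>
  <record><arm>Perseus</arm><name>Star-A</name></record>
  <record><arm>Orion</arm><name>Star-B</name></record>
</Document>"""

root = ET.fromstring(SAMPLE)
by_arm = build_index(root, "arm")   # one full pass over the data

# Later lookups are dictionary accesses, not scans of every record.
orion_names = [r.findtext("name") for r in by_arm.get("Orion", [])]
print(orion_names)
```

One index serves one kind of search, which is why the text above speaks of a different index for every important search.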
> If one doesn't know what kind of searches/processing they are going to
> perform, this most probably means that they haven't defined any use cases
> compelling enough to justify creating the huge document in the first
> place.
> Thanks,
> Dimitre
> On Wed, Jul 19, 2023 at 8:15 AM Roger L Costello costello@xxxxxxxxx <
> xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:
>> Hi Folks,
>> I have an XML file containing over 2 million <record> elements. I want to
>> obtain the first 10 <record> elements.
>> Here's how I did it:
>> <xsl:for-each select="/Document/record[position() le 10]">
>>     <xsl:sequence select="."/>
>> </xsl:for-each>
>> I ran it and it took a long time to complete. I am guessing that the XSLT
>> processor is iterating over all 2 million <record> elements. Yes?  How to
>> write the XSLT code so that the XSLT processor stops iterating upon
>> processing the first 10 <record> elements?
>> /Roger
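
To illustrate what "stop after the first 10 records" can look like outside XSLT, here is a minimal Python sketch using the standard library's incremental parser. It is only an illustration of early exit; it says nothing about how any particular XSLT processor evaluates the expression quoted above. The helper name `first_records` and the generated sample document are hypothetical.

```python
import io
import xml.etree.ElementTree as ET


def first_records(source, limit=10, tag="record"):
    """Return the first `limit` <record> elements as strings, then stop parsing."""
    found = []
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != tag:
            continue
        elem.tail = None                      # drop trailing whitespace, if any
        found.append(ET.tostring(elem, encoding="unicode"))
        elem.clear()                          # release the element's content
        if len(found) >= limit:
            break                             # the rest of the input is not processed
    return found


# Hypothetical sample: 1000 small records standing in for the 2 million.
body = "".join(f"<record>r{i}</record>" for i in range(1, 1001))
sample = io.BytesIO(f"<Document>{body}</Document>".encode("utf-8"))

first_ten = first_records(sample)
print(first_ten[0], "...", first_ten[-1])
```

The parser reads the input in chunks, so a little more than the tenth record may be read, but the remainder of the file is never turned into elements.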

Dimitre Novatchev
Truly great madness cannot be achieved without significant intelligence.
To invent, you need a good imagination and a pile of junk
Never fight an inanimate object
To avoid situations in which you might make mistakes may be the
biggest mistake of all
Quality means doing it right when no one is looking.
You've achieved success in your field when you don't know whether what
you're doing is work or play
To achieve the impossible dream, try going to sleep.
Facts do not cease to exist because they are ignored.
Typing monkeys will write all Shakespeare's works in 200 yrs. Will they write
all patents, too? :)
Sanity is madness put to good use.
I finally figured out the only reason to be alive is to enjoy it.
