
run time error

Michael Kay mhk at mhk.me.uk
Tue May 9 10:40:37 PDT 2006


First try increasing the memory available to the JVM. I generally use

java -Xms512M -Xmx512M

However, you may also need to rewrite your query so that it is less greedy
in its use of memory. I've changed the layout of your code below to make it
legible, and added comments prefixed MHK>


declare function local:pathOfNode($node) {
     if (empty($node/..)) 
     then "" 
     else concat(local:pathOfNode($node/..), "/", local-name($node))
};

let $j:= doc("test.XML")
let $paths := for $n in $j//* return local:pathOfNode($n)
let $childpaths:= (for $item in $paths 
                   where count(tokenize(substring-after(string($item),"/"),"/")) >1 
                   return $item)

MHK> Both $paths and $childpaths are referenced more than once, and their
values will therefore be stored in memory. If your 11MB document is
reasonably structure-rich, then it might well contain 200K elements, each
having an expanded path of say 100 characters, which is 200 bytes. At 200K
paths of 200 bytes each, each of these two variables is going to occupy
about 40MB of memory. And further on, $leafs is the same.
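
To check that estimate against your own document, a rough sketch along the following lines could be run first. It assumes the same local:pathOfNode function and test.XML file as the query above, and takes 2 bytes per character as a working figure. It ignores per-string object overhead, so the real footprint is likely to be somewhat higher. The estimate element and its attribute names are invented for this illustration.

```xquery
(: Sketch only: the <estimate> element and its attribute names are
   made up for this illustration. :)
declare function local:pathOfNode($node as node()) as xs:string {
  if (empty($node/..))
  then ""
  else concat(local:pathOfNode($node/..), "/", local-name($node))
};

let $j := doc("test.XML")
(: One integer per element: the length of its expanded path :)
let $lengths := for $n in $j//* return string-length(local:pathOfNode($n))
return
  <estimate elements="{count($lengths)}"
            path-characters="{sum($lengths)}"
            approx-bytes="{sum($lengths) * 2}"/>
```

Only the lengths are retained here, not the path strings themselves, so this should need far less memory than the original query.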

MHK>I'd suggest you start by only taking the distinct paths:

MHK>let $paths := distinct-values(for $n in $j//* return local:pathOfNode($n))

MHK>which means you will never need to hold the full set of paths in memory.


for $p in distinct-values($childpaths)
let $toks:= tokenize(string($p),"/")

MHK>It seems very wasteful to carefully build up the concatenated path, and
then split it up again by tokenizing. But I'm afraid, given the absence of
comments and the unhelpful variable names, I've lost the thread of what
you're trying to achieve here.
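
One way to avoid the round trip through tokenize is sketched below. It rests on the assumption that $papa is meant to be the path of the parent element: rather than splitting the string, it keeps hold of the element and asks for the path of its parent directly. The entry wrapper and its attribute names are hypothetical, chosen only to show the two values side by side.

```xquery
(: Sketch only: <entry>, @path and @parent are hypothetical names. :)
declare function local:pathOfNode($node as node()) as xs:string {
  if (empty($node/..))
  then ""
  else concat(local:pathOfNode($node/..), "/", local-name($node))
};

let $j := doc("test.XML")
(: Elements that have an element parent, i.e. everything below the
   outermost element. These correspond to the paths with more than
   one step that the original query keeps in $childpaths. :)
for $e in $j//*[exists(parent::*)]
return
  <entry path="{local:pathOfNode($e)}"
         parent="{local:pathOfNode($e/..)}"/>
```

For an element whose path is /a/b this gives a parent value of /a, which is the same string the tokenize, subsequence and string-join sequence produces.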

let $papa:= string-join(subsequence($toks, 1, count($toks) - 1), "/")
let $var:=substring-after(string($p),"/")
let $leafs:=$j//text()[normalize-space()][string-join(ancestor-or-self::element()/name(),'/') eq $var]

MHK>Any particular reason you used a recursive function to form the path for
elements, but are using string-join to form the paths for text nodes?

MHK>I suspect that the expression $j//text()[normalize-space()] is going to
be pulled out of the "for $p" loop, so it only needs to be evaluated once:
but that's another great chunk of memory gone. If you're using Saxon-SA then
it's also likely to be indexed to avoid an O(n^2) join.

return
  <STATISTICS>
    <PATH>
      {string($p)}
    </PATH>
    <RATIO>
      {string( round( count($childpaths[.=$p]) div
                       count($paths[.=$papa]) * 100 ) )}
    </RATIO>
      {for $val in distinct-values($leafs)
       return <value-per-path 
                      value='{normalize-space($val)}'
                      count='{count($leafs[. eq normalize-space($val)])}'/>}
  </STATISTICS>

MHK> I've been trying to find suggestions for improving this code but I have
difficulty seeing exactly what it's doing - it seems to be collecting some
basic statistics on the structures present in the document, but it's doing
so in a pretty heavy-handed way. 

To be quite honest, I'd suggest writing this in XSLT. It's a grouping
problem, and XSLT 2.0 has built-in grouping operators which XQuery 1.0
lacks. This is likely to give you a far more efficient solution, both in
space and time usage. For starters, if you do

<xsl:for-each-group select="$j//*" group-by="local:pathOfNode(.)">

then this gives you a group which is the set of nodes having the same path -
so the groups are sets of nodes, not sets of paths.

If you don't want to switch languages, you could consider using the
higher-order saxon:for-each-group() extension function which gives you the
same functionality in XQuery.

Michael Kay
http://www.saxonica.com/



