[XSL-LIST Mailing List Archive Home] [By Thread] [By Date] [Recent Entries] [Reply To This Message] Re: A super-efficient way to compute the sum of A[i] *
On Sat, 2020-05-09 at 12:00 +0000, Costello, Roger L. costello@xxxxxxxxx wrote: > Hi Folks, > > I need a super-efficient way to compute the sum of A[i] * B[i] for > i=1 to n. The most efficient way to do something in computing is often to avoid doing it. So if it's a performance bottleneck, see if you can avoid it altogether. I did some timings (see below) but found overall that building the XML tree dominated: a stylesheet that just produced a message took 15 elapsed seconds on a 700MByte input file, including JVM startup and compiling the stylesheet; the computation took another 4 or 5 seconds. Timings: For what it's worth, i tried with slightly different XML -- a v element having a and b children, <v><a>.438743</a><b>.4874343</b></v> <v>... Saxon 9 EE took 19 seconds to process 10,000,000 v elements, of which the first 1.5 seconds or so was JVM startup and stylesheet complilaion (based on a test file with only 3 v elements). <xsl:template match="/"> <r> <xsl:message select="'count: ' || count(//v)"/> <xsl:value-of select="sum( //v ! my:pair(./a, ./b))" /> </r> </xsl:template> <xsl:function name="my:pair"> <xsl:param name="a" as="xs:double" /> <xsl:param name="b" as="xs:double" /> <xsl:sequence select="$a * $b"/> </xsl:function> If the element names were longer, it'd take more time (my test file was half a gigabyte). Eliminating the function and using <xsl:value-of select="sum( //v ! (./a * ./b) )" /> was about the same time or a second slower, BUT the first time i had a typo, a , instead of a *, and this was not an error. The function version is more robust against errors. Using (./a treat as xs:double) * ./b treat as xs:double) produced a runtime error after 14.5 seconds (using an XML schema would have removed the error) showing that constructing the sequence of v elemnts is taking most of the time. Using cast instead of treat (xs:facepalm) worked and took 19 to 21 seconds (running it multiple times). Changing //v to /s/v made a small improvement (18 or 19 seconds instead of 20). Using a template to match v elements and return a * b, storing tha tin a variable of tytpe xs:double*, and returning sum() on it, was about the same speed. Using <double-value> instead of <v> to have a 763 MByte file took only slighty longer. I did notice, however, that the process was getting about twice as many CPU seconds as elapsed seconds, showing evidence of multi-threading. For what it's worth i tried constructing two sequences, instead of reading elements from an XML file; it took about 5 or 6 seconds, or, with the same input file as before, about 18 seconds - in other words it's not the XML part that's taking time. Liam -- Liam Quin, https://www.delightfulcomputing.com/ Available for XML/Document/Information Architecture/XSLT/ XSL/XQuery/Web/Text Processing/A11Y training, work & consulting. Barefoot Web-slave, antique illustrations: http://www.fromoldbooks.org
|
PURCHASE STYLUS STUDIO ONLINE TODAY!Purchasing Stylus Studio from our online shop is Easy, Secure and Value Priced! Download The World's Best XML IDE!Accelerate XML development with our award-winning XML IDE - Download a free trial today! Subscribe in XML format
|