Subject: XML Streaming not working Author: George Willis Date: 15 Jan 2009 09:01 PM
I have a 754MB file made up of a root node and a bazillion first-level nodes I call $listings. I want my XQuery to bring each $listing into memory, do an office file update using XUL, drop the current $listing from memory, and then move on to the next one. What I've got instead is a HEAP of problems. :) (Out of memory, so it is not streaming.)
This code works for smaller files (< 20MB), just not for large ones, which tells me the streaming is not working. I wrote the code specifically to iterate over each $listing so that the context needed for processing stays small (only one listing at a time should need to be in memory).
When I run smaller listing files, I can see the output files being updated in real time, so it appears to stream; but obviously something is wrong, because I get a nasty heap overflow on a large file.
How much more does a guy have to do to get streaming to work??? :b
Here is the code...
declare namespace o="http://www.nxs.com/StandardXML1.xsd";
declare namespace l="http://www.nxs.com/StandardXML1_0.xsd";
declare option ddtek:xml-streaming 'yes';
declare option ddtek:serialize "indent=yes, omit-xml-declaration=no";
let $inOffice := doc("file:///E:/Offices.xml")
let $mO := $inOffice/o:Offices/o:office
let $inListing := doc("file:///E:/resi.xml")
let $mL := $inListing/l:Listings/l:Residential
let $out_dir := "file:///e:/split-nxs/"
let $mode := "bulk-splitByOffice"
return
for $listing in $mL
let $office_id := $listing/l:LO/text()
let $filename := concat($out_dir,"NXS_office=",$office_id,".xml")
let $xml := doc($filename)
let $office := $mO[o:OfficeMLSID/text()=$office_id]
where exists($office)
return (: Insert the $listing to the correct office file created in Split_nxs_1?.xquery :)
(
insert node $listing
as last
into $xml/NWMLS,
put($xml,$filename)
(: ddtek:serialize-to-url($xml,$filename,"") :)
)
Subject: XML Streaming not working Author: Minollo I. Date: 16 Jan 2009 09:53 AM
That's an interesting XQuery. There are a few problems that are breaking streaming:
1) The presence of both doc("some literal URI") and doc($myComputedURI) stops streaming. DataDirect XQuery does this on the assumption that the two doc() calls might reference the same document, in which case streaming would break XQuery semantics. You could argue that most use cases don't actually run that risk, and you might be right; it's probably a rule we should think about relaxing. Anyway, a way to fool the XQuery engine in this particular case is to replace doc("some literal URI") with an external variable (see the short sketch after this list).
2) Even after taking care of 1), the insert expression still seems to prevent streaming; that could well be a problem in the engine (we are investigating). A workaround is to use a transform expression instead (also shown in the sketch below).
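To make the two changes concrete, here is a minimal, generic sketch of the pattern; the element names, the @id attribute and the output directory are just placeholders, not taken from your files:

declare option ddtek:xml-streaming 'yes';

(: point 1: the large input arrives through an external variable instead of a literal doc() call :)
declare variable $in as document-node(element(*,xs:untyped)) external;

for $item in $in/*/*   (: iterate the first-level elements one at a time :)
let $uri := concat("file:///e:/out/item-", string($item/@id), ".xml")
let $target := doc($uri)   (: the small per-item file; a computed doc() is fine :)
return
  (: point 2: a transform expression plus serialize-to-url instead of insert ... into + put() :)
  copy $t := $target
  modify insert node $item as last into $t/*
  return ddtek:serialize-to-url($t, $uri, "")

In your query $in plays the role of $inListing, and $uri/$target correspond to $filename/$xml.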
Applying both changes to your query, the following version should stream, although we haven't tested it on live data. To test it, you'll need to associate file:///E:/Offices.xml with $inOffice and file:///E:/resi.xml with $inListing in the Parameters tab of your XQuery Scenario settings.
declare namespace o="http://www.nxs.com/StandardXML1.xsd";
declare namespace l="http://www.nxs.com/StandardXML1_0.xsd";
declare option ddtek:xml-streaming 'yes';
declare option ddtek:serialize "indent=yes, omit-xml-declaration=no";
declare variable $inOffice as document-node(element(*,xs:untyped)) external;
declare variable $inListing as document-node(element(*,xs:untyped)) external;
let $mO := $inOffice/o:Offices/o:office
let $mL := $inListing/l:Listings/l:Residential
let $out_dir := "file:///e:/split-nxs/"
let $mode := "bulk-splitByOffice"
return
for $listing in $mL
let $office_id := $listing/l:LO/text()
let $filename := concat($out_dir,"NXS_office=",$office_id,".xml")
let $xml := doc($filename)
let $office := $mO[o:OfficeMLSID/text()=$office_id]
where exists($office)
return (: Insert the $listing to the correct office file created in Split_nxs_1?.xquery :)
copy $xml := $xml
modify insert node $listing as last into $xml/NWMLS
return ddtek:serialize-to-url($xml,$filename,"")