Re: Streaming XML (WAS: More on taming SAX (was Re: ANN
I wrote:

> When one writes:
>
>    f:foldl-tree($f:add, $f:add(), 0, /*)

Must be:

When one writes:

   f:foldl-tree(f:add(), f:add(), 0, /*)

Dimitre.

"Dimitre Novatchev" <dnovatchev@y...> wrote in message
news:cqsvbp$fbi$1@s......
> Why I think Daniela Florescu is right?
>
> Please, bear with my style, which has nothing to do with SAX and any kind
> of APIs mentioned in this thread. Just read on, I promise you'll agree
> that my message is relevant.
>
> This is the code of the f:foldl-tree() function, which is part of the FXSL
> library:
>
> <xsl:stylesheet version="2.0"
>  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
>  xmlns:f="http://fxsl.sf.net/"
>  xmlns:int="http://fxsl.sf.net/int/folfl-tree"
>  exclude-result-prefixes="f int">
>
>   <xsl:import href="func-apply.xsl"/>
>
>   <xsl:function name="f:foldl-tree">
>     <xsl:param name="pFuncNode" as="element()"/>
>     <xsl:param name="pFuncSubtrees" as="element()"/>
>     <xsl:param name="pA0"/>
>     <xsl:param name="pNode" as="element()"/>
>
>     <xsl:choose>
>       <xsl:when test="not($pNode)">
>         <xsl:copy-of select="$pA0"/>
>       </xsl:when>
>       <xsl:otherwise>
>         <xsl:variable name="vSubtrees" select="$pNode/*"/>
>
>         <xsl:sequence select=
>          "f:apply($pFuncNode,
>                   $pNode/@tree-nodeLabel,
>                   int:foldl-tree_($pFuncNode, $pFuncSubtrees, $pA0,
>                                   $vSubtrees)
>                   )"
>         />
>       </xsl:otherwise>
>     </xsl:choose>
>   </xsl:function>
>
>   <xsl:function name="int:foldl-tree_">
>     <xsl:param name="pFuncNode" as="element()"/>
>     <xsl:param name="pFuncSubtrees" as="element()"/>
>     <xsl:param name="pA0"/>
>     <xsl:param name="pSubTrees" as="element()*"/>
>
>     <xsl:sequence select=
>      "if(empty($pSubTrees))
>         then $pA0
>         else
>           f:apply($pFuncSubtrees,
>                   f:foldl-tree($pFuncNode, $pFuncSubtrees, $pA0,
>                                $pSubTrees[1]),
>                   int:foldl-tree_($pFuncNode, $pFuncSubtrees, $pA0,
>                                   $pSubTrees[position() > 1])
>                   )"
>     />
>   </xsl:function>
> </xsl:stylesheet>
>
> In a few words, this is a generic fold() but over a tree (not just over a
> list).
> As such, it needs two functions to be provided as parameters -- one
> for processing the current node and one for processing all subtrees of the
> current node.
>
> When one writes:
>
>    f:foldl-tree($f:add, $f:add(), 0, /*)
>
> the result of evaluating this is the sum of the values of all
> @tree-nodeLabel attributes of all nodes in the tree.
>
> If I pass as parameters other functions, I'll perform other processing on
> a (any!) tree.
>
> So, in case of XSLT/XQuery processing, we pass the necessary two functions
> as parameters to f:foldl-tree() and we have implemented an XSLT/XQuery
> processor.
>
> Why is this all relevant to the current discussion?
>
> Because: a fold() processing of any kind is essentially streaming.
>
> Therefore, let's just provide the required two functions and not worry how
> the function engine does streaming -- there could be reasonably efficient
> implementations. The most obvious example is a lazy implementation -- no
> subtrees are ever processed unless ultimately required.
>
> What is more, in a lazy implementation the source tree can itself be
> evaluated lazily -- only those nodes/subtrees will need to be parsed,
> which are ultimately required.
>
> Just as a side note -- streaming a tree implies linearization -- this may
> go against efficiency when opposed to parallelization (e.g. using a DVC
> (divide and conquer) approach), which is the ultimate strength of
> functional languages and will start to matter more and more as explained
> in the paper "The Free Lunch Is Over: A Fundamental Turn Toward
> Concurrency in Software"
> (http://www.gotw.ca/publications/concurrency-ddj.htm) by Herb Sutter.
>
> Parallelization may require that different threads share the same data,
> which will delay the possibility to discard this data from memory.
>
>
> Cheers,
>
> Dimitre Novatchev.
>
>
> "Daniela Florescu" <dflorescu@m...> wrote in message
> news:30291DBF-590E-11D9-A33A-000393DC762C@m......
>>> As someone who was until very recently "one of those implementers" I
>>> completely disagree with you. We had customers who want to process XML
>>> documents that are hundreds of megabytes to gigabytes in size who can't
>>> afford to materialize even a fraction of these documents in certain
>>> cases.
>>
>> Dare,
>>
>> what exactly are you disagreeing with?
>>
>> This discussion is going in zig-zag. Did you read my postings? Did I
>> ever tell you that XQuery was the solution for **everything**!? I don't
>> remember saying that.
>>
>> I was just reading this SAX/streaming/memory consumption discussion, and
>> being a person who actually designed and implemented such a streaming XML
>> query processor, I had a terrible sensation of deja vu. There are solid
>> solutions in the published and implemented state of the art already.
>>
>> I was just curious to know if there are deep technical issues why people
>> have to reinvent such techniques. I learned that there are cases where
>> indeed there is no point in using preexisting XML processors, simply
>> because they don't apply, and people have to do it by hand.
>>
>> But I also learned that a lot of reinventing the wheel is also for fun.
>> I'm not going to comment on that. Next time I take a plane I can only
>> cross fingers that the people who designed the air traffic control system
>> optimized for something different than their programmers' fun.
>>
>> So I reiterate my point: there are well known techniques to maximize
>> streaming and minimize memory consumption. Many of them are already
>> implemented in existing systems, and many will show up in the next
>> versions of various industrial strength products.
>>
>> In a big majority of the cases, people who need to process XML don't need
>> to understand the gory details of buffer management. And they shouldn't.
>> They should concentrate only on the logic of their application, and rely
>> on good XSLT/XQuery compilers and runtimes to do the right job concerning
>> the implementation strategy.
>>
>> As for the well known techniques for minimizing memory consumption, I am
>> afraid that I cannot point to any specific technique on this mailing
>> list, for the following reasons:
>>
>> (a) it's too much literature to be discussed in such a forum
>> (b) a lot of it is folklore
>> (c) a lot of it is simply inherited from streaming and lazy evaluation of
>>     SQL query processors, using the iterator model. (Goetz Graefe can
>>     tell you much more about that than me, and he's closer to you), and
>>     you can imagine how much folklore is there too after 30 years
>>
>> The best idea that comes to my mind is to encourage somebody to write a
>> survey of such techniques, that might be helpful.
>>
>> My conclusion: please rely on good compilers, good optimizers and good
>> runtimes instead of writing XML processors by hand if you don't *really*
>> have to (and few people really have to). And trust the vendors/open
>> source implementors that they will produce such good compilers,
>> optimizers and runtimes when time comes.
>>
>> As far as I am concerned, the horse is dead, I don't have much else to
>> add.
>>
>> Best regards, have a wonderful holiday season,
>> Dana
>>
>> -----------------------------------------------------------------
>> The xml-dev list is sponsored by XML.org <http://www.xml.org>, an
>> initiative of OASIS <http://www.oasis-open.org>
>>
>> The list archives are at http://lists.xml.org/archives/xml-dev/
>>
>> To subscribe or unsubscribe from this list use the subscription
>> manager: <http://www.oasis-open.org/mlmanage/index.php>
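
For readers who do not have FXSL at hand, the tree fold described in the message above can be approximated outside XSLT. The following is only a Python sketch of the same idea, not FXSL code and not how any XSLT/XQuery engine works internally. The names foldl_tree, _foldl_subtrees, streaming_sum and SAMPLE are hypothetical, and converting @tree-nodeLabel with int() (defaulting to 0 when the attribute is absent) is an assumption made for the example. The second function is a hand-written streaming version of the one "sum" case, using the standard library's incremental parser, to illustrate the claim that a fold can be computed without materializing the whole tree.

```python
import io
import operator
import xml.etree.ElementTree as ET

# Hypothetical sample document; labels are assumed to be integers.
SAMPLE = (
    b'<a tree-nodeLabel="1">'
    b'<b tree-nodeLabel="2"><c tree-nodeLabel="3"/></b>'
    b'<d tree-nodeLabel="4"/>'
    b'</a>'
)


def foldl_tree(func_node, func_subtrees, a0, node):
    """Combine a node's label with the folded result of its subtrees."""
    if node is None:
        return a0
    # Assumption: labels are integers; a missing label counts as 0.
    label = int(node.get("tree-nodeLabel", 0))
    children = list(node)
    return func_node(label, _foldl_subtrees(func_node, func_subtrees, a0, children))


def _foldl_subtrees(func_node, func_subtrees, a0, subtrees):
    """Fold a list of sibling subtrees, mirroring int:foldl-tree_ in the post."""
    if not subtrees:
        return a0
    return func_subtrees(
        foldl_tree(func_node, func_subtrees, a0, subtrees[0]),
        _foldl_subtrees(func_node, func_subtrees, a0, subtrees[1:]),
    )


def streaming_sum(xml_bytes):
    """Sum the labels incrementally, without keeping element content around."""
    total = 0
    # iterparse yields each element once its end tag has been read.
    for _event, elem in ET.iterparse(io.BytesIO(xml_bytes), events=("end",)):
        total += int(elem.get("tree-nodeLabel", 0))
        # Drop the element's attributes and children; the parent still
        # holds a reference to the (now empty) element itself.
        elem.clear()
    return total


if __name__ == "__main__":
    root = ET.fromstring(SAMPLE)
    print(foldl_tree(operator.add, operator.add, 0, root))  # 10
    print(streaming_sum(SAMPLE))                            # 10
```

Passing other functions changes what is computed over the same traversal: for instance, operator.mul with an initial value of 1 yields the product of the labels. Note that foldl_tree as written is strict and operates on a fully parsed tree; the laziness discussed in the post would have to come from the evaluating engine, which this sketch does not attempt to model.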