RE: Higher-Order Functions in XPath 2.0

Play the video
Subject: RE: Higher-Order Functions in XPath 2.0
From: "Michael Kay" <michael.h.kay@xxxxxxxxxxxx>
Date: Wed, 16 Jan 2002 15:27:45 -0000
Great stuff, Dimitre, please lob it over to www-xpath-comments.

> -----Original Message-----
> From: owner-xsl-list@xxxxxxxxxxxxxxxxxxxxxx
> [mailto:owner-xsl-list@xxxxxxxxxxxxxxxxxxxxxx]On Behalf Of Dimitre
> Novatchev
> Sent: 16 January 2002 10:07
> To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
> Subject:  Higher-Order Functions in XPath 2.0
>
>
> Hi,
>
> Bellow is a draft proposal for implementing higher-order functions in
> XPath 2.0.
>
> I'd greatly appreciate receiving any comments about the
> correctness and
> validity of this text, as well as for possible improvements.
>
> Cheers,
> Dimitre Novatchev.
> -------------------------------------------------------------------
>
> Higher-Order Functions in XPath 2.0
>
> Contents
> --------
> Part I. Problems with XPath 2.0
>
>        1. Using the "for" expression for incremental processing
>        2. Difficulties in returning aggregate (nodes) value
>        3. Combining two sequences in producing a sequence
>        4. Text processing beyond the limits of regular expressions
>        5. XPath language complexity has grown considerably
>        6. Inflexibility, where equality and comparison-returning
>           functions are needed
>        7. Little or no reusability is possible for "for" expressions
>
> Part II. Higher-Order Functions Solutions
>
> III. Conclusion
>
> IV. Recommendation
>
>
>
>
> Part I. Problems with XPath 2.0
> -------------------------------
>
> A general problem with XPath 2.0, as specified by the current working
> draft, is that some tasks exist, which cannot be solved within the
> language, and another group of tasks can be accomplished in XPath 2.0,
> but in a rather inefficient way. In all of these cases programmers can
> solve the task by using recursive XSLT templates and this is
> a powerful
> method, but at the same writing and testing different recursive
> templates for every particular case is a time-consuming and
> error-prone
> process and often the result has low reusability.
>
> Listed bellow are some examples of such problems:
>
> 1. Using the "for" expression for incremental processing
>    The "for" expression in XPath 2.0  cannot be used when the results
> of processing every item of a sequence depend on the processing of the
> previous items. Or in some cases it can be used, but in an obviously
> inefficient way:
>
>    - Find the product of all numbers in a sequence.
>    - Reverse a sequence
>    - Concat all string items of a sequence.
>    - Reverse a string or a sequence of items.
>
> A typical use-case, for which the XPath 2.0 solution is difficult and
> inefficient, is the following:
>
> Given a sequence of "book" nodes, calculate and display the amount of
> money received from the sales of each book (price * sales), but also
> obtain and display a running total, as each book node from
> the sequence
> is processed. To achieve this in XPath, one would write:
>
> for $i in (1 to count($items))
>    return ($items[$i],
>            sum( for $j in (sublist($items, 1, $i))
>                    return (@price * @sales) )
>
> In the above expression, if N is the number of items in $items, sum()
> is performed N * (N + 1) / 2 times.
>
> While the above may seem to be just a textbook example (and really can
> be found in Mike Kays book), there are real-world examples, where a
> running total must be calculated and even several results must be
> accumulated in parallel. I am deeply obliged to Mark Nahabedian
> (naha@xxxxxxxxxx ) for allowing me to quote his work which has to deal
> with exactly this problem -- a complete example can be found at:
>
> http://www.ai.mit.edu/people/naha/itrack/about.html
>
>
> 2. Difficulties in returning aggregate (nodes) value from a sequence,
> especially when returning those nodes depends in a non-trivial way on
> the other nodes of the sequence:
>
>    - Obtain all nodes with "maximum value" from a sequence, especially
> in the case when the node "value" is computed by a very complex
> expression.
>    - Obtain the nodes with "distinct values" from a sequence,
> especially in the case when the node "value" is computed by a very
> complex way.
>    - There's no general way to "filter" elements of a
> sequence based on
> a predicate.
>
> The reason is that a predicate (function) cannot be passed as a
> parameter to a general "filter" function. As a result programmers will
> write multitude of similar filtering expressions, without
> being able to
> re-use them. Such repetitions are time-consuming, error-prone, and
> generally result in un-maintainable and non-reusable code.
>
> Examples of problems in this group:
>
>    - Return the sum of squares of the numbers in a sequence.
>    - Return all items in a sequence, for which f(item) has minimal
> value
>    - For some function f() test whether all the values of f(item) on a
> sequence  are equal (> 0, etc.)
>    - For some function f() test whether all values f(item) on a
> sequence are in increasing order.
>
> Although a solution can be found in XPath, it will be difficult and
> inefficient.
>
> Also, for every different function f() another version of the same
> solution will have to be produced, because functions cannot be passed
> as parameters.
>
> 3. Combining two sequences in producing a sequence:
> Given (a1, a2, ..., aN) and (b1, b2, ..., bN) compute:
>
>      (a1 + b1, a2 + b2, ..., aN + bN)
>
>      (a1 * b1, a2 * b2, ..., aN * bN)
>
>      (a1 and b1, a2 and b2, ..., aN and bN)
>
>      (a1 or b1, a2 or b2, ..., aN or bN)
> etc.
>
>
> 4. Text processing beyond the limits of regular expressions is not
> possible.
>
> A real world problem was pointed out by David Carlisle -- he needs in
> his work to match strings, surrounded by  (unknown in advance number
> of) balanced parenthesis.
>
> For any such problem, it would be nice to have a general, table-driven
> parser() function. However this is not possible, because the parser()
> function will need to be passed as parameter a lex() function that it
> must call for obtaining the terminal symbols from the input text.
>
> 5. XPath language complexity has grown considerably and the language
> cannot continue to expand indefinitely:
>
> Already there are hardly any spare characters left for operators.
> Often there are (more than one) different ways of performing similar
> tasks.
>
> In contrast, a language that supports higher-order functions can be
> kept simple, small and elegant, while at the same time providing
> powerful means to produce any necessary new functionality.
>
> Thus the "standard" language features (e.g. operators and functions)
> can be kept to a minimum, while the language makes possible
> desired new
> functionality to be easily produced and accumulated into a library of
> general and reusable useful functions.
>
> Without support for higher-order functions such libraries are very
> limited in scope
> and usefulness.
>
> 6. Inflexibility, where equality and comparison-returning
> functions are
> needed to be passed to:
>    - sort
>    - distinct-values
>    - grouping, etc.
>
> 7. Little or no reusability is possible for "for" expressions
>
> In the expression bellow:
>
>  for $i in (1 to count($items))
>          return expression
>
> expression cannot be reused (simply copied and pasted) and
> will have to
> be modified every time it is used with differently named
> range variable
> so that references to the range variable are renamed.
>
> In contrast, with higher-order functions support one can have a map()
> function, so that in
>
> map(f, $sequence)
>
> the code of f() will never have to be modified.
>
>
> Part II. Higher-Order Functions Solutions
> -----------------------------------------
>
> Provided higher-order functions were available, the problems listed
> above have easy and natural solutions.
>
> To demonstrate the compactness and high degree of
> readability, the code
> bellow is written in Haskell. Haskell is used only for convenience, in
> no way should it be inferred that the same syntax is recommended for
> XPath 2.0. Some basic conventions from this language:
>
>  f x y = x * y
>
> This defines a function f(x,y) = x * y
>
>  [1, 2, 3]
>
> This is a list of elements 1, 2, and 3. The same (for our purposes) as
> (1, 2, 3) in XPath 2.0.
>
>  []
>
> This denotes the empty list -- the same as () in XPath 2.0.
>
>  x   -- denotes the name of a single element/function.
>
>  xs -- any name ending in 's' denotes a list of elements.
>
> Any operator can be used also as a function, when put in brackets.
> Thus:
>
> (+) 1 2 = 1 + 2 = 3
>
> The (:) operator is used to prepend an element to the start of a list:
>
>  x : xs
>
> defines a list with head x and tail xs.
>
> The flip() function takes as argument any function with two arguments,
> and produces as result a the same function, which takes these two
> arguments in reverse order.
>
>  flip f x y = f y x
>
> Primitive recursion over a list can be defined as follows:
>
>  foldl f z []      = z
>  foldl f z (x:xs)  = foldl f (f z x) xs
>
> The function "foldl" takes 3 arguments -- a function f(),
> which takes 2
> arguments, a value z, and a list.
>
> This is one of the most general functions over lists. It traverses the
> list from left to write, applying f() on each element and the
> currently
> accumulated result.
>
> There is a dual function (foldr), which behaves in a similar way, but
> traverses the list from right to left:
>
>  foldr f z []      = z
>  foldr f z (x:xs)  = f x (foldr f z xs)
>
>
> As can be easily seen:
>
>  foldl (+) 0 xs
>
> is the sum of all elements in a list xs. Therefore we could write:
>
>  sum xs = foldl (+) 0 xs
>
> Analogously:
>
>  product xs = foldl (*) 1 xs
>
> And this one liner is the solution to one of the problems in Part I,
> section 1.
>
> We can ommit the last operand(s) from an equation, in case it is the
> same and we still get a valid function definition. Therefore,
> the above
> function definitions could be simplified even further and written as:
>
>  sum   = foldl (+) 0
>
>  product  = foldl (*) 1
>
>
> Reversing the elements of a list (solution of another problem in Part
> I, section 1.) is simply defined as:
>
>  reverse    = foldl (flip (:)) []
>
>
> Concatenating all elements of a list (solution of the next problem) is
> simply:
>
>  concat            = foldr (++) []
>
> where (++) is the concatenation operator for lists.
>
>
> Combining two lists with equal length into one can be performed using
> the zipWith() function:
>
>  zipWith f (a:as) (b:bs)   = f a b : zipWith f as bs
>  zipWith _ _      _        = []
>
> The function f() is applied on every pair of elements at position N
> from the two lists, and the result forms the element at position N in
> the result list.
>
>  ZipWith() solves directly all the problems from Part I, section 3:
>
>    - (a1 + b1, a2 + b2, ..., aN + bN) is just:
>
>       zipWith (+) as bs
>
>    - (a1 * b1, a2 * b2, ..., aN * bN) is just:
>
>       zipWith (*) as bs
>
>    - (a1 and b1, a2 and b2, ..., aN and bN) is just:
>
>       zipWith  and as bs
>
>    - (a1 or b1, a2 or b2, ..., aN or bN) is just:
>
>       zipWith or as bs
>
>
>
> A very useful function is scanl:
>
>  scanl f q xs   = q : (case xs of
> 		         []   -> []
> 			 x:xs -> scanl f (f q x) xs)
>
> It is similar to foldl, but creates a list, every element of which
> contains the intermediate accumulated result. The first element of the
> result-list is q.
>
> In case the list is guaranteed to be non-empty, then the following
> function can be defined:
>
>  scanl1 f (x:xs)   = scanl f x xs
>
> It behaves like scanl(), but doesnt use a zero argument.
>
> As can be easily seen,
>
>  scanl1 `op` xs
>
> produces a list of the intermediate accumulated results of performing
> op() on the list xs. For example:
>
>  scanl1 (+) [1, 2, 3] = [1, 3, 6] = [1, 1+2, 3 + 3]
>
>
> scanl1() combined with zipWith() solve directly the problem of
> calculating the running total from Part I, section 1:
>
>  scanl1 (+) (zipWith (*) [1,2,3] [2,2,2])
>
> returns:
>         [2, 6, 12]
>
> A direct solution to the filtering problems defined in Part I, section
> 2, is provided by the function filter():
>
>  filter p xs       = [ x | x <- xs, p x ]
>
> it takes a function p() defined on the type of elements of its second
> argument - a list xs, and returns a list of those elements of xs, for
> which p(x) = true.
>
> Using it, we can write:
>
>  - Return all items in a sequence, for which f(item) has minimal value
>
>     minvals  f  xs     =   filter (= fmin)  ys
>
>                               where fmin =  minimum  ys,
>                                     ys   =  map f  xs
>
>
>  - For some function f() test whether all the values of f(item) on a
> sequence  are equal (> 0, etc.)
>
>    allFPositive  f  xs  =  foldl   and   ys
>
>                              where  ys    =  map  f  xs
>
>  - For some function f() test whether all values f(item) on a sequence
> are in increasing order.
>
>   allFIncreasing  f  xs   =  foldl   and   ys
>
>              where   ys   =  zipWith  (<)  (init  xs)  (tail  xs)
>
>
> In the last solution we used the init() function, which is the dual of
> tail() - from a list xs it produces another, containing all
> elements of
> the first list , but the last one:
>
>  init (x:xs)       = x : init xs
>
>
>
> Finally, heres an example how to keep a language small and simple:
>
> Whenever a programmer needs a mapping operator, she could produce it
> immediately herself, without having to ask a working group for
> including it in the standard language, as follows:
>
> 1. She defines the function map():
>
>  map  f   xs  =  foldl  ( (:) . f  )   [ ]  xs
>
> 2. Because she needs to apply the mapping operator repeatedly, for
> convenience she defines a multiMap() function:
>
>  multiMap   xs   fs    =    foldr   map   xs   fs
>
>  multiMap   [1, 2, 3]   [(1+), (2*), (5+)]
>
> produces:
>
>  [13,15,17]   ,  that is  [(1 + 5)*2 + 1, (2 + 5)*2 + 1, (3 +
> 5)*2 + 1]
>
> 3. The multiMap() function as defined above applies the functions
> starting from the last one in the list. The programmer wants them
> applied starting from the first function in the list. She re-defines
> multiMap  by changing foldr to foldl as follows:
>
>   multiMap   xs   fs    =    foldl   (flip  map)   xs   fs
>
>
> Now
>     multiMap   [1, 2, 3]   [(1+),  (2*),  (5+)]
>
> produces:
>
>     [9, 11, 13]
>
>  that is:  [(1 + 1)*2 + 5, (2 + 1)*2 + 5, (3 + 1)*2 + 5]
>
> 4. She is ready to use the new function. For example, she specifies a
> series of SVG coordinate transformations:
>
>   multiMap  ($coordinates,
>                     ((. *2),
>                      (if (position() mod 2) then . + 50 else .)
>                       ..
>                     )
>              )
>
>
>
> III. Conclusion
> ---------------
>
>  XPath 2.0 as specified in the current working draft has the
> problems described in Part I. A language with support for higher-order
> function is free of these problems.
>
> IV. Recommendation
> ------------------
>
>  Based on the above conclusion, I recommend that higher-order
> functions support be implemented in Xpath 2.0.
>
>
>
> __________________________________________________
> Do You Yahoo!?
> Send FREE video emails in Yahoo! Mail!
> http://promo.yahoo.com/videomail/
>
>  XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list
>
>


 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list
Current Thread
Higher-Order Functions in XPath 2.0 Dimitre Novatchev - Wed, 16 Jan 2002 05:05:05 -0500 (EST) Michael Kay - Wed, 16 Jan 2002 10:26:53 -0500 (EST) <= Jeni Tennison - Wed, 16 Jan 2002 11:40:51 -0500 (EST) Thomas B. Passin - Wed, 16 Jan 2002 12:33:35 -0500 (EST) <Possible follow-ups> Joerg Pietschmann - Thu, 17 Jan 2002 10:25:17 -0500 (EST) Michael Kay - Thu, 17 Jan 2002 11:13:29 -0500 (EST)
<- Previous	Index	Next ->
Higher-Order Functions in XPa, Dimitre Novatchev	Thread	Re: Higher-Order Functions in, Jeni Tennison
RE: Not a nested for loop., Michael Kay	Date	RE: Not a nested for loop., DPawson
	Month
PURCHASE STYLUS STUDIO ONLINE TODAY!

Purchasing Stylus Studio from our online shop is Easy, Secure and Value Priced!
Download The World's Best XML IDE!

Accelerate XML development with our award-winning XML IDE - Download a free trial today!
Subscribe in XML format
RSS 2.0
Atom 0.3
XML Editor - Download a 15 Day Free Trial Now >
See What's New in Stylus Studio >
Buy Stylus Studio - XML Editor - Now >