[XSLLIST Mailing List Archive Home] [By Thread] [By Date] [Recent Entries] [Reply To This Message] HigherOrder Functions in XPath 2.0
Hi, Bellow is a draft proposal for implementing higherorder functions in XPath 2.0. I'd greatly appreciate receiving any comments about the correctness and validity of this text, as well as for possible improvements. Cheers, Dimitre Novatchev.  HigherOrder Functions in XPath 2.0 Contents  Part I. Problems with XPath 2.0 1. Using the "for" expression for incremental processing 2. Difficulties in returning aggregate (nodes) value 3. Combining two sequences in producing a sequence 4. Text processing beyond the limits of regular expressions 5. XPath language complexity has grown considerably 6. Inflexibility, where equality and comparisonreturning functions are needed 7. Little or no reusability is possible for "for" expressions Part II. HigherOrder Functions Solutions III. Conclusion IV. Recommendation Part I. Problems with XPath 2.0  A general problem with XPath 2.0, as specified by the current working draft, is that some tasks exist, which cannot be solved within the language, and another group of tasks can be accomplished in XPath 2.0, but in a rather inefficient way. In all of these cases programmers can solve the task by using recursive XSLT templates and this is a powerful method, but at the same writing and testing different recursive templates for every particular case is a timeconsuming and errorprone process and often the result has low reusability. Listed bellow are some examples of such problems: 1. Using the "for" expression for incremental processing The "for" expression in XPath 2.0 cannot be used when the results of processing every item of a sequence depend on the processing of the previous items. Or in some cases it can be used, but in an obviously inefficient way:  Find the product of all numbers in a sequence.  Reverse a sequence  Concat all string items of a sequence.  Reverse a string or a sequence of items. A typical usecase, for which the XPath 2.0 solution is difficult and inefficient, is the following: Given a sequence of "book" nodes, calculate and display the amount of money received from the sales of each book (price * sales), but also obtain and display a running total, as each book node from the sequence is processed. To achieve this in XPath, one would write: for $i in (1 to count($items)) return ($items[$i], sum( for $j in (sublist($items, 1, $i)) return (@price * @sales) ) In the above expression, if N is the number of items in $items, sum() is performed N * (N + 1) / 2 times. While the above may seem to be just a textbook example (and really can be found in Mike Kay?s book), there are realworld examples, where a running total must be calculated and even several results must be accumulated in parallel. I am deeply obliged to Mark Nahabedian (naha@xxxxxxxxxx ) for allowing me to quote his work which has to deal with exactly this problem  a complete example can be found at: http://www.ai.mit.edu/people/naha/itrack/about.html 2. Difficulties in returning aggregate (nodes) value from a sequence, especially when returning those nodes depends in a nontrivial way on the other nodes of the sequence:  Obtain all nodes with "maximum value" from a sequence, especially in the case when the node "value" is computed by a very complex expression.  Obtain the nodes with "distinct values" from a sequence, especially in the case when the node "value" is computed by a very complex way.  There's no general way to "filter" elements of a sequence based on a predicate. The reason is that a predicate (function) cannot be passed as a parameter to a general "filter" function. As a result programmers will write multitude of similar filtering expressions, without being able to reuse them. Such repetitions are timeconsuming, errorprone, and generally result in unmaintainable and nonreusable code. Examples of problems in this group:  Return the sum of squares of the numbers in a sequence.  Return all items in a sequence, for which f(item) has minimal value  For some function f() test whether all the values of f(item) on a sequence are equal (> 0, etc.)  For some function f() test whether all values f(item) on a sequence are in increasing order. Although a solution can be found in XPath, it will be difficult and inefficient. Also, for every different function f() another version of the same solution will have to be produced, because functions cannot be passed as parameters. 3. Combining two sequences in producing a sequence: Given (a1, a2, ..., aN) and (b1, b2, ..., bN) compute: (a1 + b1, a2 + b2, ..., aN + bN) (a1 * b1, a2 * b2, ..., aN * bN) (a1 and b1, a2 and b2, ..., aN and bN) (a1 or b1, a2 or b2, ..., aN or bN) etc. 4. Text processing beyond the limits of regular expressions is not possible. A real world problem was pointed out by David Carlisle  he needs in his work to match strings, surrounded by (unknown in advance number of) balanced parenthesis. For any such problem, it would be nice to have a general, tabledriven parser() function. However this is not possible, because the parser() function will need to be passed as parameter a lex() function that it must call for obtaining the terminal symbols from the input text. 5. XPath language complexity has grown considerably and the language cannot continue to expand indefinitely: Already there are hardly any spare characters left for operators. Often there are (more than one) different ways of performing similar tasks. In contrast, a language that supports higherorder functions can be kept simple, small and elegant, while at the same time providing powerful means to produce any necessary new functionality. Thus the "standard" language features (e.g. operators and functions) can be kept to a minimum, while the language makes possible desired new functionality to be easily produced and accumulated into a library of general and reusable useful functions. Without support for higherorder functions such libraries are very limited in scope and usefulness. 6. Inflexibility, where equality and comparisonreturning functions are needed to be passed to:  sort  distinctvalues  grouping, etc. 7. Little or no reusability is possible for "for" expressions In the expression bellow: for $i in (1 to count($items)) return expression expression cannot be reused (simply copied and pasted) and will have to be modified every time it is used with differently named range variable so that references to the range variable are renamed. In contrast, with higherorder functions support one can have a map() function, so that in map(f, $sequence) the code of f() will never have to be modified. Part II. HigherOrder Functions Solutions  Provided higherorder functions were available, the problems listed above have easy and natural solutions. To demonstrate the compactness and high degree of readability, the code bellow is written in Haskell. Haskell is used only for convenience, in no way should it be inferred that the same syntax is recommended for XPath 2.0. Some basic conventions from this language: f x y = x * y This defines a function f(x,y) = x * y [1, 2, 3] This is a list of elements 1, 2, and 3. The same (for our purposes) as (1, 2, 3) in XPath 2.0. [] This denotes the empty list  the same as () in XPath 2.0. x  denotes the name of a single element/function. xs  any name ending in 's' denotes a list of elements. Any operator can be used also as a function, when put in brackets. Thus: (+) 1 2 = 1 + 2 = 3 The (:) operator is used to prepend an element to the start of a list: x : xs defines a list with head x and tail xs. The flip() function takes as argument any function with two arguments, and produces as result a the same function, which takes these two arguments in reverse order. flip f x y = f y x Primitive recursion over a list can be defined as follows: foldl f z [] = z foldl f z (x:xs) = foldl f (f z x) xs The function "foldl" takes 3 arguments  a function f(), which takes 2 arguments, a value z, and a list. This is one of the most general functions over lists. It traverses the list from left to write, applying f() on each element and the currently accumulated result. There is a dual function (foldr), which behaves in a similar way, but traverses the list from right to left: foldr f z [] = z foldr f z (x:xs) = f x (foldr f z xs) As can be easily seen: foldl (+) 0 xs is the sum of all elements in a list xs. Therefore we could write: sum xs = foldl (+) 0 xs Analogously: product xs = foldl (*) 1 xs And this one liner is the solution to one of the problems in Part I, section 1. We can ommit the last operand(s) from an equation, in case it is the same and we still get a valid function definition. Therefore, the above function definitions could be simplified even further and written as: sum = foldl (+) 0 product = foldl (*) 1 Reversing the elements of a list (solution of another problem in Part I, section 1.) is simply defined as: reverse = foldl (flip (:)) [] Concatenating all elements of a list (solution of the next problem) is simply: concat = foldr (++) [] where (++) is the concatenation operator for lists. Combining two lists with equal length into one can be performed using the zipWith() function: zipWith f (a:as) (b:bs) = f a b : zipWith f as bs zipWith _ _ _ = [] The function f() is applied on every pair of elements at position N from the two lists, and the result forms the element at position N in the result list. ZipWith() solves directly all the problems from Part I, section 3:  (a1 + b1, a2 + b2, ..., aN + bN) is just: zipWith (+) as bs  (a1 * b1, a2 * b2, ..., aN * bN) is just: zipWith (*) as bs  (a1 and b1, a2 and b2, ..., aN and bN) is just: zipWith and as bs  (a1 or b1, a2 or b2, ..., aN or bN) is just: zipWith or as bs A very useful function is scanl: scanl f q xs = q : (case xs of [] > [] x:xs > scanl f (f q x) xs) It is similar to foldl, but creates a list, every element of which contains the intermediate accumulated result. The first element of the resultlist is q. In case the list is guaranteed to be nonempty, then the following function can be defined: scanl1 f (x:xs) = scanl f x xs It behaves like scanl(), but doesn?t use a ?zero? argument. As can be easily seen, scanl1 `op` xs produces a list of the intermediate accumulated results of performing op() on the list xs. For example: scanl1 (+) [1, 2, 3] = [1, 3, 6] = [1, 1+2, 3 + 3] scanl1() combined with zipWith() solve directly the problem of calculating the running total from Part I, section 1: scanl1 (+) (zipWith (*) [1,2,3] [2,2,2]) returns: [2, 6, 12] A direct solution to the filtering problems defined in Part I, section 2, is provided by the function filter(): filter p xs = [ x  x < xs, p x ] it takes a function p() defined on the type of elements of its second argument  a list xs, and returns a list of those elements of xs, for which p(x) = true. Using it, we can write:  Return all items in a sequence, for which f(item) has minimal value minvals f xs = filter (= fmin) ys where fmin = minimum ys, ys = map f xs  For some function f() test whether all the values of f(item) on a sequence are equal (> 0, etc.) allFPositive f xs = foldl and ys where ys = map f xs  For some function f() test whether all values f(item) on a sequence are in increasing order. allFIncreasing f xs = foldl and ys where ys = zipWith (<) (init xs) (tail xs) In the last solution we used the init() function, which is the dual of tail()  from a list xs it produces another, containing all elements of the first list , but the last one: init (x:xs) = x : init xs Finally, here?s an example how to keep a language small and simple: Whenever a programmer needs a ?mapping operator?, she could produce it immediately herself, without having to ask a working group for including it in the ?standard language?, as follows: 1. She defines the function map(): map f xs = foldl ( (:) . f ) [ ] xs 2. Because she needs to apply the ?mapping operator? repeatedly, for convenience she defines a multiMap() function: multiMap xs fs = foldr map xs fs multiMap [1, 2, 3] [(1+), (2*), (5+)] produces: [13,15,17] , that is [(1 + 5)*2 + 1, (2 + 5)*2 + 1, (3 + 5)*2 + 1] 3. The multiMap() function as defined above applies the functions starting from the last one in the list. The programmer wants them applied starting from the first function in the list. She redefines multiMap by changing ?foldr? to ?foldl? as follows: multiMap xs fs = foldl (flip map) xs fs Now multiMap [1, 2, 3] [(1+), (2*), (5+)] produces: [9, 11, 13] that is: [(1 + 1)*2 + 5, (2 + 1)*2 + 5, (3 + 1)*2 + 5] 4. She is ready to use the new function. For example, she specifies a series of SVG coordinate transformations: multiMap ($coordinates, ((. *2), (if (position() mod 2) then . + 50 else .) ????????????.. ) ) III. Conclusion  XPath 2.0 as specified in the current working draft has the problems described in Part I. A language with support for higherorder function is free of these problems. IV. Recommendation  Based on the above conclusion, I recommend that higherorder functions support be implemented in Xpath 2.0. __________________________________________________ Do You Yahoo!? Send FREE video emails in Yahoo! Mail! http://promo.yahoo.com/videomail/ XSLList info and archive: http://www.mulberrytech.com/xsl/xsllist

PURCHASE STYLUS STUDIO ONLINE TODAY!Purchasing Stylus Studio from our online shop is Easy, Secure and Value Priced! Download The World's Best XML IDE!Accelerate XML development with our awardwinning XML IDE  Download a free trial today! Subscribe in XML format
