Re: Are XPath expressions parsed using compiler parsin

Play the video

Subject: Re: Are XPath expressions parsed using compiler parsing techniques?
From: "John Lumley john@xxxxxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Tue, 10 May 2022 10:07:17 -0000

On 08/05/2022 22:20, Roger L Costello costello@xxxxxxxxx wrote:

An XPath expression is a query. Not against a database, but against an XML document. Are XPath expressions parsed using compiler parsing algorithms? Is a syntax tree constructed for an XPath expression? Is the syntax tree traversed?

In my work with Saxonica I've implemented XPath expression parsers both by using a parser-generator and transcription/tweaking of an extant recursive descent/precedence parser originated by and described by Michael Kay earlier.

I used Gunther Rademacher's excellentB REx parser generator ( https://www.bottlecaps.de/rex/) first to generate an XPath parse tree forB streamability analysis (see https://www.balisage.net/Proceedings/vol13/html/Lumley01/BalisageVol13-Lumley01.html ) where I really just wanted a parse tree to analyse. REx made the process of building the parse tree very simple.

Later, when adding dynamic XPath support (xsl:evaluate) to SaxonJS (see https://archive.xmlprague.cz/2017/files/xmlprague-2017-proceedings.pdf#page=13 ) I again used the REx-generated parser, adding a 'compiler/analyser/code-generator' written in Javascript to generate the appropriate SaxonJS execution plan.

Eventually SaxonJS expanded to being able to support the fn:transform() function, which required the development of a complete XSLT compiler, written mostly in XSLT with a (Javascript-coded) extension function compile-XPath(). (See https://www.balisage.net/Proceedings/vol19/author-pkg/Lumley01/BalisageVol19-Lumley01.html and http://archive.xmlprague.cz/2019/files/xmlprague-2019-proceedings.pdf#page=235 )

At this stage we started again with the REx-based parser, but after getting the main compiler running rewrote the parser by transcribing the Java-based parser/compiler that Saxonica had developed over many years into a Javascript version, which was probably an order-of-magnitude more efficient than the REx-based predecessor.

The real differences in the two approaches are that:

 * Using a parser-generator can get you running much more quickly and
   able to adapt to changes in the grammar much more rapidly, but
   diagnostics at the parse stage are at most very generic ("Expecting
   'foo' at line 17, character 6, got 'bar'") and when you're running
   outside development you're running a much less time-efficient parser.
 * Writing a specific parser (assuming the grammar isn't full of
   ambiguity traps) takes much longer, but gives you much better
   control over the results and diagnostics. Error messages can be much
   better tailored ("The cardinality indicator for a sequence type must
   be one of *|+|? - provided '$') and concurrent type analysis,
   small-scale static error detection and small-scale optimisation
   (e.g. arithmetic constant expression reduction as Dimitre has shown)
   can be incorporated.
   But if the grammar changes drastically, you might have to do a lot
   of rework, dependent upon the architecture of your parser. (As an
   example adding some support for putative XPath4 operators (e.g.
   'otherwise') needed some changes to the tokenizer and additional
   cases in some of the parse function switches, but where the new
   operator fitted into the class-based parser was fairly straightforward.)

Having recently done some similar work on parsing InvisibleXML, these lessons still stand - a purpose-written parser will give much better control and performance than a generated one, at the cost of reduced 'flexibility'. But I should also finish with a caution, that most of us working in this area would echo strongly:

   *Don't build your parser without investing (heavily) in a
   test-driver and test-suite* - the more tests you can add, the
   better, even the really weird ones, which may eventually occur 'in
   nature'. (As Michael Kay hinted, 'or and not' is a valid XPath
   expression, only one of whose tokens is an operator - I think!)

--
*John Lumley* MA PhD CEng FIEE
john@xxxxxxxxxxxx

Current Thread
Are XPath expressions parsed using compiler parsing techniques? Roger L Costello costello@xxxxxxxxx - 8 May 2022 21:19:51 -0000 Dimitre Novatchev dnovatchev@xxxxxxxxx - 8 May 2022 22:00:42 -0000 Michael Kay mike@xxxxxxxxxxxx - 8 May 2022 23:10:27 -0000 Adam Retter adam.retter@xxxxxxxxxxxxxx - 10 May 2022 09:04:25 -0000 John Lumley john@xxxxxxxxxxxx - 10 May 2022 10:07:17 -0000 <=

<- Previous	Index	Next ->
Re: Are XPath expressions par, Adam Retter adam.ret	Thread	[no subject], Unknown
Re: Are XPath expressions par, Adam Retter adam.ret	Date	Re: Are XPath expressions par, Dimitre Novatchev dn
	Month

PURCHASE STYLUS STUDIO ONLINE TODAY!

Purchasing Stylus Studio from our online shop is Easy, Secure and Value Priced!

Download The World's Best XML IDE!

Accelerate XML development with our award-winning XML IDE - Download a free trial today!

Subscribe in XML format

RSS 2.0
Atom 0.3

XML Editor - Download a 15 Day Free Trial Now >

See What's New in Stylus Studio >

Buy Stylus Studio - XML Editor - Now >