
Re: Parsing pipeline, flow-based programming, grammars and par

  • From: Rick Jelliffe <rjelliffe@allette.com.au>
  • To: "Costello, Roger L." <costello@mitre.org>
  • Date: Tue, 5 Nov 2013 01:09:27 +1100

Re: Parsing pipeline

Sometimes people distinguish between a parser, whose job is to say which part of the grammar each part of the input belongs to, and a recognizer, whose job is to say whether the input conforms to the grammar.

Firewall yea/nay validation just needs a recognizer; contrast with the XSD PSVI (post-schema-validation infoset), which reports what each part of the input was matched against. (Schematron is more like a lot of parallel recognizers: the @role attribute is maybe the closest it gets to producing a parse.)
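To make the parser/recognizer distinction concrete, here is a minimal sketch in Python. The toy grammar (balanced parentheses, S -> '(' S ')' S | empty) and the function names `recognize` and `parse` are illustrative assumptions, not anything taken from XSD or Schematron. The recognizer only answers yes or no; the parser also reports which rule each piece of input was matched by.

```python
def recognize(text):
    """Recognizer: say only whether text conforms to the toy grammar."""
    depth = 0
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:          # a ')' with no matching '('
                return False
        else:
            return False           # character outside the alphabet
    return depth == 0


def _parse_s(text, pos):
    """Parse one S starting at pos; return (tree, next position)."""
    if pos < len(text) and text[pos] == "(":
        inner, pos = _parse_s(text, pos + 1)
        if pos >= len(text) or text[pos] != ")":
            raise ValueError(f"expected ')' at position {pos}")
        rest, pos = _parse_s(text, pos + 1)
        return ["S", "(", inner, ")", rest], pos
    return ["S"], pos              # the empty alternative


def parse(text):
    """Parser: return a tree saying which rule produced each part."""
    tree, pos = _parse_s(text, 0)
    if pos != len(text):
        raise ValueError(f"unexpected input at position {pos}")
    return tree
```

A firewall-style check would stop at `recognize(...)`; anything that needs to know what each part of the input is (the PSVI case) needs the tree that `parse(...)` returns.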

(Using grammars for validation was one of Charles Goldfarb and company's great innovations. But it is not the case that Chomsky-style grammars are the only game in town for parsing. A few years ago I had a blog post speculating on whether Zelig Harris' Operator Grammars could be used for XML validation. Google "Zelig Harris Schematron".)

Rick

On 03/11/2013 4:20 AM, "Costello, Roger L." <costello@mitre.org> wrote:

Hi Folks,

 

                Parsing is the process of structuring a linear representation in accordance with a given grammar. [Grune & Jacobs]

 

It just dawned on me that a "schema validator" is actually a parser.

 

An XML Schema is a grammar. A schema validator structures input in accordance with the XML Schema. Hey, that’s a parser!

 

A schema validator takes as input the output of another parser, the XML parser. An XML parser structures input in accordance with the XML grammar.

 

So there are two parsers that run, one following another:

 

 

A parsing pipeline!
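The two-stage idea can be sketched roughly as follows in Python. Stage 1 uses the standard library's `xml.etree.ElementTree`. Stage 2 is NOT a real XML Schema validator: `TOY_SCHEMA`, `schema_stage` and `pipeline` are hypothetical names for a deliberately tiny stand-in that only checks an exact sequence of child element names, just to show one parser consuming the output of another.

```python
import xml.etree.ElementTree as ET

# Hypothetical toy "schema": element name -> required sequence of child names.
TOY_SCHEMA = {
    "book": ["title", "author"],
    "title": [],
    "author": [],
}


def xml_stage(text):
    """Stage 1: structure a character string according to the XML grammar."""
    return ET.fromstring(text)     # raises ET.ParseError if not well-formed


def schema_stage(element, schema=TOY_SCHEMA):
    """Stage 2: structure the element tree according to the toy schema."""
    if element.tag not in schema:
        raise ValueError(f"undeclared element <{element.tag}>")
    expected = schema[element.tag]
    found = [child.tag for child in element]
    if found != expected:
        raise ValueError(
            f"<{element.tag}>: expected children {expected}, found {found}"
        )
    # Annotate each element with the rule it matched (a very loose PSVI analogue).
    return {
        "element": element.tag,
        "rule": expected,
        "children": [schema_stage(child, schema) for child in element],
    }


def pipeline(text):
    """Run the two parsers one after the other."""
    return schema_stage(xml_stage(text))


if __name__ == "__main__":
    print(pipeline("<book><title>T</title><author>A</author></book>"))
```

Each stage rejects a different class of input: text that is not well-formed fails in `xml_stage`, while well-formed text that breaks the toy schema fails in `schema_stage`.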

 

That’s pretty neat. And it is in line with Flow-Based Programming (FBP). (See Sean McGrath’s recent mention of FBP.)

 

I recently started reading the bible of parsing:

 

                Parsing Techniques, A Practical Guide [Grune & Jacobs]

 

Reading it has made me realize that grammars are cool, and so are parsers.

 

Here is a fantastic snippet from the book:

--------------------

Parsing is the process of structuring a linear representation in accordance with a given grammar. This definition has been kept abstract on purpose to allow as wide an interpretation as possible. The “linear representation” may be a sentence, a computer program, a knitting pattern, a sequence of geological strata, a piece of music, actions of ritual behavior, in short any linear sequence in which the preceding elements in some way restrict the next element. For some of the examples the grammar is well known, for some it is an object of research, and for some our notion of a grammar is only just beginning to take shape.

 

For each grammar, there are generally an infinite number of linear representations (“sentences”) that can be structured with it. That is, a finite-sized grammar can supply structure to an infinite number of sentences. This is the main strength of the grammar paradigm and indeed the main source of the importance of grammars: they summarize succinctly the structure of an infinite number of objects of a certain class.

 

There are several reasons to perform this structuring process called parsing. One reason derives from the fact that the obtained structure helps us to process the object further. When we know that a certain segment of a sentence is the subject, that information helps in understanding or translating the sentence. Once the structure of a document has been brought to the surface, it can be converted more easily.

 

A second reason is related to the fact that the grammar in a sense represents our understanding of the observed sentences: the better a grammar we can give for the movement of bees, the deeper our understanding of them. [Italics mine. I found this to be a fantastically profound statement.]

 

A third lies in the completion of missing information that parsers, and especially error-repairing parsers, can provide. Given a reasonable grammar of the language, an error-repairing parser can suggest possible word classes for missing or unknown words on clay tablets.

--------------------

 

This makes me want to start writing my own grammar languages and my own parsers!

 

/Roger

 

[Grune & Jacobs] http://www.amazon.com/Parsing-Techniques-Practical-Monographs-Computer/dp/1441919015/ref=sr_1_1?s=books&ie=UTF8&qid=1383331225&sr=1-1&keywords=parsing+techniques+a+practical+guide


