
Napkin grammar

From: Rick Jelliffe <rjelliffe@allette.com.au>
To: xml-dev <xml-dev@lists.xml.org>
Date: Thu, 22 Jul 2021 20:06:01 +1000


In case anyone is interested, I made up a little grammar, based on recent posts, to show the kind of thing I was thinking of as a starting point, not an end point. Maybe having something concrete helps.

So it is two parts:

First, a grammar, which is not made with parallel-parsing considerations particularly in mind. The capitalized names in the grammar are the non-terminals determined by the lexical processing. (The sub-rules for recognizing the types of undelimited data values are given in the grammar, not the lexer, which I think is easier to read if you are unfamiliar with this.)
Second, the lexical processing, specified as a series of logical passes. Each pass is amenable to being divided and run in parallel, or as a pipeline, or in some event system, or folded into the grammar; of course, a real implementation might coalesce or rearrange them with the same intent.

This uses some extensions:

- == means "if"
- --> $something means a data type conversion
- -> means a substitution (handling references)
- . means a look-up in the lexical context; it is just a shorthand.

GRAMMAR:

document = (element | comment | pi )+

element = start-tag ( CHARACTER+ | element | comment | pi)* end-tag

start-tag = name attribute* EOM

name = START-TAG.TOKEN

attribute = attname ( typeable-token | ATTRIBUTE-TEXT)

attname = TOKEN

typeable-token = boolean | year | number | symbol

boolean = TOKEN
    == ( "true" | "false" )
    --> $boolean

year = TOKEN
    == ( DECIMAL+ "-" CHARACTER* )
    --> $yearDate

number = TOKEN
    == ( "-" )? DECIMAL+ ( "." CHARACTER+ )?
    --> $integer or $decimal

symbol = TOKEN

end-tag = END-TAG.TOKEN EOM

comment = COMMENT-TAG.CHARACTER* EOM

pi = piname CHARACTER* EOM

piname = PI-TAG.TOKEN EOM
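To make the typeable-token rules concrete, here is a minimal Python sketch that tries them in the order boolean, year, number, symbol and applies the `-->` conversions. The function name `type_token` and the use of regexes are assumptions for illustration. `$yearDate` is stood in for by a hypothetical `("yearDate", year, rest)` tuple, and the fractional part of a number is read as digits (not arbitrary CHARACTERs) so that the decimal conversion can succeed.

```python
import re
from decimal import Decimal
from typing import Union

# Assumed patterns for the napkin rules, with DECIMAL taken to be an ASCII digit.
YEAR_RE = re.compile(r"([0-9]+)-(.*)", re.DOTALL)
NUMBER_RE = re.compile(r"-?[0-9]+(?:\.[0-9]+)?")  # assumption: digits after "."

Typed = Union[bool, int, Decimal, tuple, str]


def type_token(token: str) -> Typed:
    """Hypothetical helper: classify an undelimited attribute TOKEN."""
    if token in ("true", "false"):           # boolean --> $boolean
        return token == "true"
    m = YEAR_RE.fullmatch(token)             # year --> $yearDate (stand-in tuple)
    if m:
        return ("yearDate", int(m.group(1)), m.group(2))
    if NUMBER_RE.fullmatch(token):           # number --> $integer or $decimal
        return Decimal(token) if "." in token else int(token)
    return token                             # symbol: kept as a plain string
```

For example, `type_token("2021-07-22")` gives `("yearDate", 2021, "07-22")`, while `type_token("-42")` gives the integer `-42`, because a leading `-` cannot start a year.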

Each lexical pass can be thread-parallelized by section. The execution of the passes can also be parallelized, e.g. by queuing the results of one thread into another as needed. And the recognition itself can be parallelized using SIMD.
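As a rough illustration of section-wise parallelism, this hypothetical sketch splits the text into sections, scans each one in its own thread for the single-character delimiters `<` and `>`, and stitches the per-section results back together in order. It assumes that every literal `<` and `>` in the text is a delimiter (anything else would have to be a reference), so a delimiter can never straddle a section boundary. In CPython the GIL means this shows the structure rather than a real speedup; the gain would come from native threads or SIMD.

```python
from concurrent.futures import ThreadPoolExecutor


def scan_chunk(text: str, start: int, end: int) -> list[tuple[int, str]]:
    """Return (offset, char) for every '<' or '>' in text[start:end]."""
    return [(i, text[i]) for i in range(start, end) if text[i] in "<>"]


def find_delimiters(text: str, sections: int = 4) -> list[tuple[int, str]]:
    """Hypothetical helper: scan sections in parallel, then merge in order."""
    size = max(1, -(-len(text) // sections))  # ceiling division
    bounds = [(s, min(s + size, len(text))) for s in range(0, len(text), size)]
    with ThreadPoolExecutor(max_workers=sections) as pool:
        # pool.map yields results in submission order, so offsets stay sorted.
        parts = pool.map(lambda b: scan_chunk(text, *b), bounds)
    return [hit for part in parts for hit in part]
```

Because each section only reports absolute offsets, the merge step is a plain concatenation; no section needs to know what its neighbours found.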

LEXICAL PASS 1: TAG DEMARCATION

TEXT = ws* ("<" MARKUP EOM==">" DATA? )+

Note: A terminating "data" section should be marked as ws.

Note: EOM is the only delimiter signal the lexer needs to pass up; it is only actually needed for start-tags, and would not be part of an infoset.
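Pass 1 can be sketched as a single left-to-right scan. This is a minimal, hypothetical Python version (`demarcate_tags` is not from the post): it skips leading whitespace, emits `MARKUP`, `EOM` and `DATA` items, and marks a terminating data section as `ws`, as the first note says. The first `>` after a `<` is always taken as EOM, and a missing `>` is treated as an error; both are assumptions.

```python
def demarcate_tags(text: str) -> list[tuple[str, str]]:
    """Hypothetical pass 1: TEXT = ws* ("<" MARKUP EOM==">" DATA?)+"""
    items: list[tuple[str, str]] = []
    pos = len(text) - len(text.lstrip())  # skip leading ws*
    while pos < len(text):
        if text[pos] != "<":
            raise ValueError(f"expected '<' at offset {pos}")
        close = text.find(">", pos)
        if close == -1:
            raise ValueError("unterminated markup")
        items.append(("MARKUP", text[pos + 1:close]))
        items.append(("EOM", ">"))
        nxt = text.find("<", close + 1)
        data = text[close + 1:] if nxt == -1 else text[close + 1:nxt]
        if data:
            # A terminating data section (after the last tag) is marked as ws.
            items.append(("ws" if nxt == -1 else "DATA", data))
        pos = len(text) if nxt == -1 else nxt
    return items
```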

LEXICAL PASS 2: ATTRIBUTE DEMARCATION

MARKUP = (?=[^!/?]) START-TAG | COMPLEX-TAG

START-TAG = ( TAG-TEXT \" ATTRIBUTE-TEXT \"? )+

Note: the apostrophe (') is not supported as an attribute delimiter here.
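A minimal sketch of pass 2, assuming it receives the MARKUP string from pass 1 with `<` and `>` already removed: markup starting with `!`, `/` or `?` is passed through as a COMPLEX-TAG, and anything else is split on `"` so that the pieces alternate between TAG-TEXT and ATTRIBUTE-TEXT. The function name and the choice to reject unbalanced quotes are assumptions; only `"` is treated as a delimiter, matching the note above.

```python
def demarcate_attributes(markup: str) -> tuple[str, list[tuple[str, str]]]:
    """Hypothetical pass 2: classify MARKUP and split a start-tag on quotes."""
    if markup[:1] in ("!", "/", "?"):
        return "COMPLEX-TAG", [("COMPLEX-TAG", markup)]
    parts = markup.split('"')
    if len(parts) % 2 == 0:
        raise ValueError("unbalanced attribute quote")
    # Even-indexed pieces lie outside quotes (TAG-TEXT), odd ones inside.
    items = [("ATTRIBUTE-TEXT" if i % 2 else "TAG-TEXT", p)
             for i, p in enumerate(parts)]
    # Drop empty TAG-TEXT pieces but keep empty attribute values like a="".
    return "START-TAG", [it for it in items if it[1] or it[0] == "ATTRIBUTE-TEXT"]
```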

LEXICAL PASS 3: REFERENCE SUBSTITUTION

( DATA | ATTRIBUTE-TEXT | SIMPLE-TAG | COMPLEX-TAG )
    -> ( CHARACTER
       | NUMERIC-CHARACTER-REFERENCE -> CHARACTER
       | ENTITY-REFERENCE -> CHARACTER+ )*

Note: a numeric character reference is a hex numeric character reference to a Unicode code point, à la XML. No decimal references. I didn't bother to put the production in, but it looks for &.

Note:

- I didn't bother to put in the reference production: & is just the start. Lazy.
- Hex NCRs only?
- Entity references are to all the ISO/SGML/W3C/MathML entities, with the W3C (MathML) mappings. An implementation can override these, which may be good for some publishers?
- In SGML terms, all entities are CDATA: no markup or references are allowed in entity replacement text, and an entity must not expand to more characters than its reference.
- There is one MathML character that needs bold tagging: if it is used, it must be explicitly put into bold by tags, because the bold cannot be carried over by the entity.
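Pass 3 could look roughly like the following Python sketch. The `ENTITIES` table is a tiny hypothetical subset standing in for the full ISO/SGML/W3C/MathML set, decimal references are rejected, and the length check enforces the rule that an entity must not expand to more characters than its reference. A bare `&` that does not form a reference is left untouched, which is only one possible reading of the unwritten production.

```python
import re

# Hypothetical subset; the post assumes the full ISO/SGML/W3C/MathML set.
ENTITIES = {"amp": "&", "lt": "<", "gt": ">", "quot": '"',
            "eacute": "\u00e9", "alpha": "\u03b1"}

REF_RE = re.compile(r"&(#x[0-9A-Fa-f]+|#[0-9]+|[A-Za-z][A-Za-z0-9]*);")


def substitute_references(text: str) -> str:
    """Hypothetical pass 3: replace hex NCRs and entity references."""
    def expand(m: re.Match) -> str:
        body = m.group(1)
        if body.startswith("#x"):            # hex NCR -> one character
            return chr(int(body[2:], 16))
        if body.startswith("#"):             # decimal NCRs are not allowed
            raise ValueError(f"decimal reference not allowed: {m.group(0)}")
        if body not in ENTITIES:
            raise ValueError(f"unknown entity: {m.group(0)}")
        value = ENTITIES[body]
        # CDATA-only rule: the expansion may not be longer than the reference.
        if len(value) > len(m.group(0)):
            raise ValueError(f"expansion too long: {m.group(0)}")
        return value
    return REF_RE.sub(expand, text)
```

The length rule means a substitution can always be done in place, in the same buffer, which fits the idea of running this pass section by section.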

LEXICAL PASS 4: TOKENIZATION

TAG-TEXT = ( ws | "=" | TOKEN )+

COMPLEX-TAG = END-TAG | COMMENT-TAG | PI-TAG
COMMENT-TAG = "!--" CHARACTER* "--"

PI-TAG = "?" TOKEN ws* CHARACTER* "?"

END-TAG = "/" TOKEN ws*
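Finally, a hypothetical sketch of pass 4: TAG-TEXT is split into whitespace, `=` and TOKEN lexemes (whitespace is dropped), and COMPLEX-TAG content is classified by its leading and trailing characters. Treating a TOKEN as any run of characters other than whitespace and `=` is an assumption, since the post does not define TOKEN; the function names are also illustrative only.

```python
import re

# Assumed lexeme pattern: ws, "=", or a TOKEN of non-ws, non-"=" characters.
TAG_TEXT_RE = re.compile(r"\s+|=|[^\s=]+")


def tokenize_tag_text(tag_text: str) -> list[tuple[str, str]]:
    """Hypothetical: TAG-TEXT = ( ws | "=" | TOKEN )+, with ws discarded."""
    out = []
    for m in TAG_TEXT_RE.finditer(tag_text):
        lexeme = m.group(0)
        if lexeme.isspace():
            continue
        out.append(("EQ", lexeme) if lexeme == "=" else ("TOKEN", lexeme))
    return out


def classify_complex_tag(markup: str) -> tuple[str, str]:
    """Hypothetical: recognize END-TAG, COMMENT-TAG or PI-TAG."""
    if markup.startswith("/"):                               # "/" TOKEN ws*
        return "END-TAG", markup[1:].rstrip()
    if len(markup) >= 5 and markup.startswith("!--") and markup.endswith("--"):
        return "COMMENT-TAG", markup[3:-2]                   # "!--" ... "--"
    if len(markup) >= 3 and markup.startswith("?") and markup.endswith("?"):
        return "PI-TAG", markup[1:-1]                        # "?" TOKEN ... "?"
    raise ValueError(f"not a complex tag: {markup!r}")
```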

Follow-Ups:
- Re: Napkin grammar
  - From: Rick Jelliffe <rjelliffe@allette.com.au>
- Re: Napkin grammar
  - From: John Cowan <johnwcowan@gmail.com>
- Re: Napkin grammar
  - From: Tim Bray <tbray@textuality.com>
