Napkin grammar
In case anyone is interested, I made up a little grammar to show the kind of thing I was thinking of, as a starting point rather than an end point, based on recent posts. Maybe having something concrete helps.
So it is in two parts: a grammar, and the lexical passes that feed it.
This uses some extensions:

==  means "if"
--> $something  means a data type conversion
->  means a substitution (handling references)
.   means a look-up in the lexical context, just a shorthand.

GRAMMAR:

document = (element | comment | pi)+
element = start-tag (CHARACTER+ | element | comment | pi)* end-tag
start-tag = name attribute* EOM
name = START-TAG.TOKEN
attribute = attname (typeable-token | ATTRIBUTE-TEXT)
attname = TOKEN
typeable-token = boolean | year | number | symbol
boolean = TOKEN == ("true" | "false") --> $boolean
year = ... --> $yearDate
number = TOKEN --> $integer or $decimal
symbol = TOKEN
end-tag = END-TAG.TOKEN EOM
comment = COMMENT-TAG.CHARACTER* EOM
pi = piname CHAR* EOM
piname = PI-TAG.TOKEN EOM

Each lexical pass can be thread-parallelized by section. The passes themselves can run in parallel, e.g. by queuing the results of one thread into another as needed. And the recognition within a pass can be parallelized using SIMD.

LEXICAL PASS 1: TAG DEMARCATION

TEXT = ws* ("<" MARKUP EOM==">" DATA?)+

Note: A terminating "data" section should be marked as ws.

Note: EOM is the only delimiter signal the lexer needs to pass up. It is only actually needed for start-tags, and would not be part of an infoset.
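As a rough illustration of pass 1, here is a minimal Python sketch. It assumes that MARKUP never contains a literal ">" (the grammar above gives no escape for it), so the first ">" after a "<" is always EOM. The function name `demarcate_tags` and the pair representation are made up for this example.

```python
import re

# Assumption: MARKUP contains no literal ">", so the first ">" is EOM.
# One section is "<" MARKUP ">" followed by optional DATA (anything up to "<").
_SECTION = re.compile(r"<([^>]*)>([^<]*)")

def demarcate_tags(text):
    """Split TEXT into (MARKUP, DATA) pairs; DATA is '' when absent."""
    pos = len(text) - len(text.lstrip())  # skip the leading ws*
    sections = []
    while pos < len(text):
        m = _SECTION.match(text, pos)
        if m is None:
            raise ValueError(f"malformed section at offset {pos}")
        sections.append((m.group(1), m.group(2)))
        pos = m.end()
    return sections

if __name__ == "__main__":
    print(demarcate_tags(' <a x="1">hi</a>\n'))
```

Each pair can then be handed to the later passes independently of the others, which is the property the by-section parallelism relies on.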
LEXICAL PASS 2: ATTRIBUTE DEMARCATION

MARKUP = ((?=[^!/?]) START-TAG | COMPLEX-TAG)
START-TAG = (TAG-TEXT \" ATTRIBUTE-TAG \"?)+

Note: apos is not supported as an attribute delimiter here.
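Pass 2 might be sketched in Python roughly as follows. The function name `demarcate_attributes` and the tagged-tuple output are hypothetical. The sketch labels quoted spans ATTRIBUTE-TEXT, the name the grammar uses for them, and it is more lenient than the production in that it also accepts a start-tag with no quoted attribute at all.

```python
def demarcate_attributes(markup):
    """Classify MARKUP; for start-tags, separate TAG-TEXT from quoted text.

    Returns ("COMPLEX-TAG", markup) or ("START-TAG", [(kind, text), ...]).
    """
    if not markup:
        raise ValueError("empty MARKUP")
    # The lookahead (?=[^!/?]) : a start-tag may not begin with !, / or ?
    if markup[0] in "!/?":
        return ("COMPLEX-TAG", markup)
    pieces = []
    # Splitting on the double quote alternates TAG-TEXT and quoted text.
    for i, part in enumerate(markup.split('"')):
        kind = "TAG-TEXT" if i % 2 == 0 else "ATTRIBUTE-TEXT"
        if part or kind == "ATTRIBUTE-TEXT":  # keep empty values like x=""
            pieces.append((kind, part))
    return ("START-TAG", pieces)

if __name__ == "__main__":
    print(demarcate_attributes('a x="1" y="two words"'))
    print(demarcate_attributes('/a'))
```

Because only the double quote is a delimiter, a single scan for one byte value is enough here, which fits the SIMD remark earlier in the post.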
LEXICAL PASS 3: REFERENCE SUBSTITUTION

(DATA | ATTRIBUTE-TEXT | SIMPLE-TAG | COMPLEX-TAG) -> (CHARACTER | NUMERIC-CHARACTER-REFERENCE -> CHARACTER | ENTITY-REFERENCE -> CHARACTER+)*

Note: A numeric character reference is a hex numeric character reference to a Unicode number, a la XML. There is no decimal reference. I didn't bother to put the production in, but it looks for &.
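A minimal Python sketch of the substitution step, under two assumptions that are not in the post: references use the XML spelling (`&#xHEX;` and `&name;`), and the entity table is just XML's five predefined entities, chosen purely for illustration. The name `substitute_references` is likewise made up.

```python
import re

# Assumption: the post names no entity set; this table is illustrative only.
ENTITIES = {"lt": "<", "gt": ">", "amp": "&", "quot": '"', "apos": "'"}

# Hex numeric character reference, or a named entity reference.
# Decimal references such as &#65; deliberately do not match.
_REFERENCE = re.compile(r"&(?:#x([0-9A-Fa-f]+)|([A-Za-z_][\w.-]*));")

def substitute_references(text, entities=ENTITIES):
    """Replace each reference in text with the character(s) it stands for."""
    def replace(match):
        hex_digits, name = match.groups()
        if hex_digits is not None:
            return chr(int(hex_digits, 16))  # Unicode code point -> CHARACTER
        if name in entities:
            return entities[name]            # ENTITY-REFERENCE -> CHARACTER+
        raise KeyError(f"undefined entity: {name}")
    # re.sub does not rescan replacement text, so "&amp;lt;" becomes "&lt;".
    return _REFERENCE.sub(replace, text)

if __name__ == "__main__":
    print(substitute_references("a &lt; b, &#x41;, &amp;lt;"))
```

The substitution is single-pass: text produced by one replacement is never examined again for further references.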
LEXICAL PASS 4: TOKENIZATION

TAG-TEXT = (ws | "=" | TOKEN)+
COMPLEX-TAG = END-TAG | COMMENT-TAG | PI-TAG
PI-TAG = "?" TOKEN ws* CHARACTER* "?"
END-TAG = "/" TOKEN ws*
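To make the tokenization and the "--> $type" conversions concrete, here is a small Python sketch. The post does not define TOKEN, so the sketch assumes it is a maximal run of characters that are neither whitespace nor "=". The names `tokenize` and `convert` are hypothetical, and the year conversion is left out because its production is incomplete above.

```python
import re
from decimal import Decimal

# Assumption: TOKEN is a maximal run of non-whitespace, non-"=" characters.
_TAG_TEXT = re.compile(r"\s+|=|[^\s=]+")

def tokenize(tag_text):
    """Return the TOKENs and "=" signs of TAG-TEXT, dropping ws."""
    return [t for t in _TAG_TEXT.findall(tag_text) if not t.isspace()]

def convert(token):
    """Apply the typeable-token conversions; fall back to a plain symbol."""
    if token in ("true", "false"):                 # --> $boolean
        return token == "true"
    if re.fullmatch(r"[+-]?[0-9]+", token):        # --> $integer
        return int(token)
    if re.fullmatch(r"[+-]?[0-9]+\.[0-9]+", token):  # --> $decimal
        return Decimal(token)
    return token                                   # symbol

if __name__ == "__main__":
    tokens = tokenize("point x=42 y=3.5 visible=true colour=blue")
    print(tokens)
    print([convert(t) for t in tokens])
```

Only unquoted values reach `convert`; quoted values were already set aside as ATTRIBUTE-TEXT in pass 2 and stay plain text.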