[XML-DEV Mailing List Archive Home] [By Thread] [By Date] [Recent Entries] [Reply To This Message]

Re: RE: XML parsers use what computational power?

  • From: Bjoern Hoehrmann <derhoermi@gmx.net>
  • To: liam@w3.org
  • Date: Thu, 11 Apr 2013 02:54:27 +0200

Re:  RE: XML parsers use what computational power?
* Liam R E Quin wrote:
>On Wed, 2013-04-10 at 09:43 +0000, Costello, Roger L. wrote:
>> Resolving these entities:
>> 
>> <!ENTITY ha1 "ha">
>> <!ENTITY ha2 "&ha1;&ha1;"> 
>> ...
>> <!ENTITY ha128 "&ha127;&ha127;">
>> 
>> requires an amount of memory that is exponential to the number of entities.
>> 
>> Therefore an XML parser requires more computational power than a linear bounded automata [1].
>
>That does not follow at all.

The problem here is that an "XML parser" is not required to actually
produce any output on its own, it is assumed that there is some higher
level application it is interacting with. The problem above occurs if,
for instance, the machine writes out the input document with all the
entities expanded onto its "tape", in other words, if it has to "call"
some function of the higher level application and pass in the value of
an attribute with all entities expanded as a "type position" and then
the higher level application takes over.

That is a common API, and it seems that this scenario does have the
problem above, the "parser" is in need of more tape than what can be
described as a linear function in terms of the input length, but that's
an implementation choice; it would be fine if the "parser" component
reports only small bits to the higher level application. Generally, it
may be best to study the "complexity" of XML in terms of what is needed
to determine whether an input string is a "well-formed XML document", 
under the assumption that the input string is encoded using a character
encoding supported by the XML processor and the encoding isn't any more
complex than the processor itself.

>My understanding is that XML is at worst LL(2), not LL(n), in parser
>complexity. So it can be parsed by a deterministic finite state
>automaton, as described e.g. in the Dragon Book, the standard text for
>such things (Aho, Hopcraft & Ullman). The 2 is because you may need two
>tokens of look-ahead before you can reduce, in a shift-reduce parser,
>although that's a supposition on my part.

The EBNF grammar in the XML 1.0 Recommendation describes a context-free
language, but the set of all XML 1.0 documents is not context-free, and
since LL parsers recognize only a subset of the context-free languages,
they cannot decide whether any given string is an XML document. Reasons
for XML being not context free include many of the "WFCs" in the speci-
fication (not all of them, e.g. numeric character references have to
refer to a `Char`, and that is a regular problem not encoded in the EBNF
grammar for reasons of readability among others); one example is that
`<x a='' a='' />` is not an XML document, and you cannot express that
attribute names must be unique in a context-free grammar, so XML is not
LL.

A simple way to look at this is thinking about how to make a generator
for random well-formed XML 1.0 documents. If XML was context-free, you
could write a grammar for XML 1.0 documents of the form

  Nx ::= (Nx or Tx)*
  ...

where `Nx` is any non-terminal symbol (like `document` in the EBNF) and
`Tx` is essentially a Unicode character, and `*` means "zero or more".
You would construct a random document from that by picking `document` as
start symbol, and recursively replace the right side by a random choice
among the options given (for `or` you take one or the other, for `*` you
might take zero or one or more). What you have already generated never
factors into that, so you will end up generating duplicate attributes.
-- 
Björn Höhrmann · mailto:bjoern@hoehrmann.de · http://bjoern.hoehrmann.de
Am Badedeich 7 · Telefon: +49(0)160/4415681 · http://www.bjoernsworld.de
25899 Dagebüll · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/ 


[Date Prev] | [Thread Prev] | [Thread Next] | [Date Next] -- [Date Index] | [Thread Index]


PURCHASE STYLUS STUDIO ONLINE TODAY!

Purchasing Stylus Studio from our online shop is Easy, Secure and Value Priced!

Buy Stylus Studio Now

Download The World's Best XML IDE!

Accelerate XML development with our award-winning XML IDE - Download a free trial today!

Don't miss another message! Subscribe to this list today.
Email
First Name
Last Name
Company
Subscribe in XML format
RSS 2.0
Atom 0.3
 

Stylus Studio has published XML-DEV in RSS and ATOM formats, enabling users to easily subcribe to the list from their preferred news reader application.


Stylus Studio Sponsored Links are added links designed to provide related and additional information to the visitors of this website. they were not included by the author in the initial post. To view the content without the Sponsor Links please click here.

Site Map | Privacy Policy | Terms of Use | Trademarks
Free Stylus Studio XML Training:
W3C Member
Stylus Studio® and DataDirect XQuery ™are products from DataDirect Technologies, is a registered trademark of Progress Software Corporation, in the U.S. and other countries. © 2004-2013 All Rights Reserved.