
RE: SGML the next big thing?

  • From: "Liam R. E. Quin" <liamquin@i...>
  • To: "'xml-dev@i...'" <xml-dev@i...>
  • Date: Fri, 3 Dec 1999 23:23:28 -0500 (EST)

On Fri, 3 Dec 1999, Lauren Wood wrote:
> On 3 Dec 99, at 12:14, Arnold, Curt wrote:
>> It looks like the XML Schema group is trying to add back the & construct.
>> If you have a compelling justification for continued suppression, please
>> rant long and loud.
> 
> How about every SGML parser author I've talked to says the & 
> construct was the biggest, hardest part (which means probably the 
> buggiest) of the entire parser? I think the XML WG was right in 
> throwing it out of XML in the first place.

If this means & as used in content models, I think
(1) Lauren is right, because as SGML specified them, they were very
    hard to get right.

    This & thing is so far outside the way most other computer languages
    work that standard off-the-shelf parser generators roll on their
    backs and wave their paws in the air and admit defeat.

(2) The idea of saying, "this element must contain at least one of each of
    the following elements" is a useful one, and is very different from
    the & construct.

    A simplified, regularised form of & might be possible.

(3) The & connector interacts with #PCDATA to form pernicious content
    models (see below).  The XML WG went to great lengths to make sure
    that no valid XML document suffers from this SGML bogosity.  Similar
    lengths are needed for "&".
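To make points (1) and (2) concrete, here is a small Python sketch; the function names are illustrative and do not come from any SGML tool. `expand_and` rewrites an & group into the plain alternation of sequences that an ordinary grammar tool can handle, and the number of branches grows as n! with the number of members. `has_one_of_each` checks the "at least one of each" constraint from point (2) with a single counting pass, assuming (as a labelled assumption) that repeats and other elements are allowed.

```python
from collections import Counter
from itertools import permutations
from math import factorial

def expand_and(members):
    """Rewrite an and group such as a & b & c as an ordinary alternation
    of sequences: one branch per possible ordering of the members."""
    return " | ".join("(" + ", ".join(order) + ")"
                      for order in permutations(members))

def has_one_of_each(children, required):
    """Point (2): every required element appears at least once, in any
    order. Assumption: repeats and other elements are allowed."""
    counts = Counter(children)  # missing names count as 0
    return all(counts[name] >= 1 for name in required)

print(expand_and(["a", "b", "c"]))
# (a, b, c) | (a, c, b) | (b, a, c) | (b, c, a) | (c, a, b) | (c, b, a)
print(factorial(6))  # 720 branches for a six-member and group
print(has_one_of_each(["c", "a", "c", "b"], ["a", "b", "c"]))  # True
print(has_one_of_each(["a", "b"], ["a", "b", "c"]))            # False
```

The expanded form is itself ambiguous by SGML's rules, since several branches begin with the same element, so a real translation would also have to factor out common prefixes.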

Note:
    For those who're not familiar with &: it is the content model connector
    in SGML which says that, in order to match a & b & c ..., every content
    fragment a, b, etc., must be satisfied exactly once, in any order, and
    nothing must be left over.
    Furthermore, there must be exactly one way to satisfy the expression,
    as otherwise it is "ambiguous" and illegal, just as
	(a, b?) | a
    is illegal in SGML, even though it is a perfectly sensible and valid
    regular expression for the rest of the world of computing :-)
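Both rules in the note (every & member used once, in any order, with nothing left over; and no more than one way to match) can be tried out with a brute-force parse counter. This is a Python sketch under an illustrative tuple encoding of content models; `ends`, `count_parses` and the encoding are hypothetical, and the code counts complete parses rather than checking models the way an SGML parser does.

```python
from itertools import permutations

# Content models as nested tuples (an illustrative encoding, not SGML syntax):
#   ("name", "a")  ("opt", m)  ("seq", m1, m2, ...)
#   ("or", m1, m2, ...)  ("and", m1, m2, ...)

def ends(model, toks, i):
    """One end position per distinct way `model` can match from toks[i]."""
    kind = model[0]
    if kind == "name":
        return [i + 1] if i < len(toks) and toks[i] == model[1] else []
    if kind == "opt":
        return [i] + ends(model[1], toks, i)
    if kind == "or":
        return [e for m in model[1:] for e in ends(m, toks, i)]
    if kind == "seq":
        positions = [i]
        for m in model[1:]:
            positions = [e for p in positions for e in ends(m, toks, p)]
        return positions
    if kind == "and":
        # Every member exactly once, in any order: try each ordering as a
        # sequence. (Members that can match nothing would be over-counted;
        # the examples below avoid them.)
        return [e for order in permutations(model[1:])
                for e in ends(("seq",) + order, toks, i)]
    raise ValueError(f"unknown model kind: {kind}")

def count_parses(model, toks):
    """Number of ways `model` matches the whole token list."""
    return ends(model, toks, 0).count(len(toks))

A, B, C = ("name", "a"), ("name", "b"), ("name", "c")

# (a, b?) | a : two ways to match the single element a
amb = ("or", ("seq", A, ("opt", B)), A)
print(count_parses(amb, ["a"]))       # 2
print(count_parses(amb, ["a", "b"]))  # 1

# a & b & c : all three, any order, nothing left over
and3 = ("and", A, B, C)
print(count_parses(and3, ["c", "a", "b"]))       # 1
print(count_parses(and3, ["a", "b"]))            # 0 (c missing)
print(count_parses(and3, ["a", "b", "c", "a"]))  # 0 (a left over)
```

A count above one, as for (a, b?) | a on a lone a, is enough to show ambiguity. SGML's rule is stricter than this test, though: each element must be matchable against a single token of the model without looking ahead, so (a, b) | (a, c) is also illegal even though no document has two parses of it.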



    Consider the following SGML declaration (with OMITTAG NO):
	<!ELEMENT boy
	    (noise & (dirt,mud)+ & (mud,shoes,trouble)* & #PCDATA) +smell
	>
    This is a "pernicious" mixed content model: it can have white space
    between elements only once, since that one occurrence uses up the
    #PCDATA content model fragment.

    The following is (let's say for the sake of argument) a valid boy:
	mud,smell,shoes,trouble,dirt,mud,dirt,mud,noise,smell

    If you try to match this against the content model I gave, you'll
    see that you can't do it with LL(1) or LALR(1) directly unless
    you build a DFA with a rather large number of states.  I added the
    inclusion +smell, but you could change the content model to be
	(boy-model | smell)*
    to have an even more interesting time of it.
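The boy example can be checked with a brute-force search in Python. This is only a sketch to make the example testable; SGML parsers compile content models into automata rather than searching like this. Assumptions, all hypothetical: element names are plain strings, "#text" stands for one run of character data, the #PCDATA member matches at most one such run (or nothing), each & member matches one contiguous stretch, and the inclusion smell is simply filtered out before matching.

```python
from itertools import permutations

# Hypothetical token encoding: element names as strings, and "#text"
# for one run of character data (e.g. white space between elements).

def literal(name):
    """Match exactly one occurrence of element `name`."""
    def match(toks, i):
        return [i + 1] if i < len(toks) and toks[i] == name else []
    return match

def repeat(seq, minimum):
    """Match (seq)+ when minimum == 1 or (seq)* when minimum == 0."""
    def match(toks, i):
        ends = [i] if minimum == 0 else []
        count, pos = 0, i
        while toks[pos:pos + len(seq)] == seq:
            pos += len(seq)
            count += 1
            if count >= minimum:
                ends.append(pos)
        return ends
    return match

def pcdata(toks, i):
    """Assumption: #PCDATA matches at most one run of text (or nothing)."""
    ends = [i]
    if i < len(toks) and toks[i] == "#text":
        ends.append(i + 1)
    return ends

# noise & (dirt,mud)+ & (mud,shoes,trouble)* & #PCDATA, with +smell
BOY = {
    "noise": literal("noise"),
    "(dirt,mud)+": repeat(["dirt", "mud"], 1),
    "(mud,shoes,trouble)*": repeat(["mud", "shoes", "trouble"], 0),
    "#PCDATA": pcdata,
}
INCLUSIONS = {"smell"}

def and_parses(members, toks):
    """Distinct ways to cover toks with one contiguous segment per member."""
    found = set()

    def extend(order, i, segments):
        if not order:
            if i == len(toks):
                # Ignore empty segments so that orderings differing only in
                # where an empty member sits count as the same parse.
                found.add(frozenset(s for s in segments if s[1] < s[2]))
            return
        name = order[0]
        for end in members[name](toks, i):
            extend(order[1:], end, segments + [(name, i, end)])

    for order in permutations(members):
        extend(order, 0, [])
    return found

def parse_boy(children):
    # Simplification: an inclusion may occur anywhere, so drop it first.
    toks = [t for t in children if t not in INCLUSIONS]
    return and_parses(BOY, toks)

valid = "mud,smell,shoes,trouble,dirt,mud,dirt,mud,noise,smell".split(",")
print(len(parse_boy(valid)))                                       # 1
print(len(parse_boy(["#text", "noise", "dirt", "mud"])))           # 1
print(len(parse_boy(["noise", "#text", "dirt", "mud", "#text"])))  # 0
```

The last call fails because the second run of white space has no #PCDATA fragment left to use. The search simply tries all 4! = 24 orderings; a deterministic automaton cannot search, so roughly speaking its state has to record which of the four members are already satisfied (up to 16 combinations) as well as where it is inside the current member, which is where the large state count comes from.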



-- 
Liam Quin, Barefoot Computing, Toronto;  The barefoot agitator
l i a m    at    h o l o w e b    dot    n e t <-- NEW ADDRESS
Ankh on irc.sorcery.net, http://www.valinor.sorcery.net/~liam/
Please remove your shoes and socks before replying in anger.


xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@i...
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ and on CD-ROM/ISBN 981-02-3594-1
To unsubscribe, mailto:majordomo@i... the following message;
unsubscribe xml-dev
To subscribe to the digests, mailto:majordomo@i... the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa@i...)


