RE: SGML the next big thing?
On Fri, 3 Dec 1999, Lauren Wood wrote:
> On 3 Dec 99, at 12:14, Arnold, Curt wrote:
>> It looks like the XML Schema group is trying to add back the & construct.
>> If you have a compelling justification for continued suppression, please
>> rant long and loud.
>
> How about every SGML parser author I've talked to says the &
> construct was the biggest, hardest part (which means probably the
> buggiest) of the entire parser? I think the XML WG was right in
> throwing it out of XML in the first place.

If this is as per content models, I think

(1) Lauren is right, because as SGML specified them, they were very hard to get right. This & thing is so far outside the way most other computer languages work that standard off-the-shelf parser generators roll on their backs and wave their paws in the air and admit defeat.

(2) The idea of saying, "this element must contain at least one of each of the following elements" is a useful one, and is very different from the & construct. A simplified, regularised form of & might be possible.

(3) The & connector interacts with #PCDATA to form pernicious content models (see below). The XML WG went to great lengths to make sure that no valid XML document suffers from this SGML bogosity. Similar lengths are needed for "&".

Note: For those who're not familiar with &, it is the content model connector in SGML that says that in order to match a & b & c ..., every content fragment a, b, etc., must be satisfied, and nothing must be left over. Furthermore, there must be exactly one way to satisfy the expression, as otherwise it is "ambiguous" and illegal, just as (a, b?) | a is illegal in SGML, even though it is a perfectly sensible and valid regular expression for the rest of the world of computing :-)

Consider the following SGML declaration (with OMITTAG NO):

<!ELEMENT boy (noise & (dirt,mud)+ & (mud,shoes,trouble)* & #PCDATA) +smell >

This is a "pernicious" mixed content model, and can only have white space in it between elements once, since that uses up the #PCDATA content model fragment.

The following is (let's say for the sake of argument) a valid boy:

mud,smell,shoes,trouble,dirt,mud,dirt,mud,noise,smell

If you try to match this against the content model I gave, you'll see that you can't do it with LL(1) or LALR(1) directly unless you build a DFA with a rather large number of states.

I added the inclusion +smell, but you could change the content model to be (boy-model | smell)* to have an even more interesting time of it.

--
Liam Quin, Barefoot Computing, Toronto; The barefoot agitator
l i a m at h o l o w e b dot n e t <-- NEW ADDRESS
Ankh on irc.sorcery.net, http://www.valinor.sorcery.net/~liam/
Please remove your shoes and socks before replying in anger.

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@i...
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ and on CD-ROM/ISBN 981-02-3594-1
To unsubscribe, mailto:majordomo@i... the following message; unsubscribe xml-dev
To subscribe to the digests, mailto:majordomo@i... the following message; subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa@i...)
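Editorial illustration (not part of the original message): a minimal Python sketch of why the & connector is awkward for conventional parsing tools. It assumes a heavily simplified form of & in which every operand is a single, distinct element name that must appear exactly once, which is closer to the "regularised" form suggested in point (2) than to full SGML, where operands can be arbitrary fragments such as (dirt,mud)+. The function names expand_and_group and matches_and_group are hypothetical, chosen only for this example.

```python
import re
from itertools import permutations

def expand_and_group(names):
    """Rewrite a & b & c ... using only sequence and alternation:
    one sequence per ordering, so n operands give n! alternatives."""
    return [list(order) for order in permutations(names)]

def matches_and_group(children, names):
    """Simplified '&' check: every (distinct) name occurs exactly once,
    in any order, with nothing left over."""
    return sorted(children) == sorted(names)

# Three operands already need 6 orderings; five need 120.
print(len(expand_and_group(["a", "b", "c"])))
print(len(expand_and_group(list("abcde"))))

# Order does not matter, but missing or repeated elements fail.
print(matches_and_group(["c", "a", "b"], ["a", "b", "c"]))
print(matches_and_group(["a", "a", "b", "c"], ["a", "b", "c"]))

# (a, b?) | a written as an ordinary regular expression: a general-purpose
# regex engine accepts it, although SGML rejects it as ambiguous.
pattern = re.compile(r"(?:ab?)|a")
print(bool(pattern.fullmatch("a")), bool(pattern.fullmatch("ab")))
```

The factorial growth of the expanded form is one way to see where the "rather large number of states" comes from; a direct set-based check like matches_and_group avoids it, but only because of the simplifying assumptions stated above.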