[XML-DEV Mailing List Archive Home] [By Thread] [By Date] [Recent Entries] [Reply To This Message] SGML's deterinism rules and SHORTREF (was: Re: RE: Theremarkab
Marcus Reichardt <u123724@gmail.com> writes: >> Am 26.06.2022 um 15:01 schrieb Roger L Costello <costello@mitre.org>: >> >> There was also a strange >> unnecessary constraint on these expressions called "ambiguity", which >> *everybody* who wrote SGML software needed to understand, and so the idea of >> applying formal language techniques to SGML was inevitable. > Hmm, your anonymous source should've expanded on that; without proof > or anything the claim of "unnecessary-ness" is void. > SGML has tag inference (can infer arbitrarily many start- and > end-element tags) and moreover can expand short reference delimiters > (arbitrary tokens not used a markup delimiters, including newlines and > tabs/spaces) by context-dependendant replacement text (start- and > end-elements typically). Determinism in content models helps a lot to > make this even work, ... That's an interesting account; I think I would not be the only reader of xml-dev who would be interested to see a more concrete argument that the content-model determinism (confusingly called "ambiguity") rules of SGML help at all with tag inference or with the management of SHORTREF mappings. Can you expound? I'm sure a lot of us would be glad to learn more. One reason I ask is that I have been curious for twenty or thirty years about the rationale for those rules, and none of the members of WG 8 whom I have had the opportunity to ask about it has mentioned anything to do with markup inference or short references. Now, it's entirely possible for some members of a working group to be unaware of technical arguments important for other WG members, so perhaps I have merely been unlucky in asking only people who didn't know or care about this particular piece of technical background. Another reason is that the essence of the determinism rules is that each element in the input match against an identifiable identifier in the content model. But since SHORTREF mappings are associated with element types and not with content-model tokens, I don't see a way for it to matter, for SHORTREF recognition, which token in a content model is being matched, or whether there is more than one. > ... and as I hear, the same thing is being > re-introduced as "Invisible XML", giving you a facility that was > already there in 1986 ;) I think it's fair to say that invisible XML and the SHORTREF and DATATAG (and markup inference) constructs of ISO 8879 have a certain rough similarity. But "same thing" sounds to me like an exaggeration. It has been a long time since I read 8879, but I do not remember anything like an explanation of how to take an arbitrary context-free grammar in EBNF form and formulate an equivalent set of declarations in a DTD. I don't remember any discussion of the expressive power of SHORTREF, let alone a claim that it will handle arbitrary context-free grammars. > Of course, with XML-style fully tagged markup, more relaxed models > could be used, hence RelaxNG. > What is the argument here? That a Thompson construction for finite > automata (as opposed to Glushkov/Antimirov derivatives) is more > convenient for programmers since an off-the-shelve regexp lib can be > used? No, I don't think that is the argument Roger Costello was making or reporting, nor is it one that I have heard anyone make. One argument I have heard people make, and have made myself, is that the determinism rules seem ad hoc and unconnected to the basics of automata theory. It was not until well after ISO 8879 became an international standard that there was a coherent statement of what they meant in terms of automata theory. The fact that routine techniques from automata theory could not always be applied (unless they could -- figuring out which is which is not easy, working from the text of 8879) certainly did make it harder for people to build conforming SGML processors, and certainly did affect the number of SGML processors available. The determinism rules reduce the expressive power of content model notation vis-a-vis regular expressions, so that although they look like regular expressions content models are strictly less expressive than regular expressions. Any design involves tradeoffs, but for costs like these I would want to see some substantial advantages. I first started looking for them 25 or 30 years ago, and in that time I have not yet been persuaded that they have any compensating advantages at all except that they simplify life for programmers who can't figure out how to make backtracking work and cannot be bothered to learn how to determinize a finite state automaton. The claim that the determinism rules make SHORTREF work would probably count as a coherent advantage, if it can be substantiated, though it would not at this date be a very compelling one for those who gave up on SHORTREF some time in the early 90s. -- C. M. Sperberg-McQueen Black Mesa Technologies LLC http://blackmesatech.com
[Date Prev] | [Thread Prev] | [Thread Next] | [Date Next] -- [Date Index] | [Thread Index] |
PURCHASE STYLUS STUDIO ONLINE TODAY!Purchasing Stylus Studio from our online shop is Easy, Secure and Value Priced! Download The World's Best XML IDE!Accelerate XML development with our award-winning XML IDE - Download a free trial today! Subscribe in XML format
|