[XML-DEV Mailing List Archive Home] [By Thread] [By Date] [Recent Entries] [Reply To This Message] Report on the elimination of SGML AND groups (fwd)
I'm sure that there are people in this list, who might find this report interesting. However, be prepared for some theoretically oriented computer science. You have been warned. ---------- Forwarded message ---------- Date: Tue, 19 May 1998 12:44:54 +0300 (EET DST) From: Pekka Kilpelainen <kilpelai@c...> To: Anne.Brueggemann-Klein@i..., dwood@c..., cmsmcq@t..., marcy@w..., ak117@f... Cc: Helena.Ahonen@c..., Barbara.Heikkinen@c..., Oskari.Heinonen@c..., Jani.Jaakkola@c..., Pekka.Kilpelainen@c..., Greger.Linden@c..., Jyrki.Niemi@c..., Kimmo.Paasiala@c... Subject: Report on the elimination of SGML AND groups Dear colleagues, FYI: I have written a report, where I analyze the possibilities and the lengtehening effect of replacing AND groups of SGML content models by equivalent XML model groups. The report is available as gnu-zipped Postscript through its abstract page at the address http://www.cs.helsinki.fi/~kilpelai/C-1998-12.html . I include at the bottom the abstract of the report. I would be thankful for any comments. Yours, Pekka Kilpelainen Pekka Kilpelainen, University of Helsinki, Dept. of Computer Science Email: Pekka.Kilpelainen@c... phone: +358 9 7084 4227, fax: +358 9 7084 4441 http://www.cs.helsinki.fi/~kilpelai --------------------------- SGML & XML Content Models Pekka Kilpeläinen University of Helsinki, Department of Computer Science Report C-1998-12, May 1998 16 pages http://www.cs.helsinki.fi/TR/C-1998-12/ The SGML and XML standards use a variation of regular expressions called content models for modeling the markup structures of document elements. SGML content models may include so called AND groups, which are excluded from XML. An AND group, which is a sequence of subexpressions separated by an &-operator, denotes the sequential catenation of its subexpressions in any possible order. If one wants to shift from SGML to XML in document production, one has to translate SGML content models to corresponding XML content models. The allowed content models in both SGML and XML are restricted by a requirement of determinism, which means that a parser recognizing document element contents has to be able to decide without lookahead, which content model token to match with the current input token, while processing the document from left to right. It is known that not all SGML content models can be expressed as an equivalent XML content model. It is also known that transforming an SGML content model into an equivalent XML content model may cause an exponential growth in the length of the content model. We discuss methods of eliminating AND groups and analyze the circumstances where they can be applied. We derive a tight bound of $e n!$ on the number of symbols in the result of eliminating an AND group of $n$ symbols, where $e = 2.71828...$is the base of natural logarithms. We present the analysis in a pedagogical manner, emphasizing mathematical methods which are typical to the analysis of algorithms. We also show that minimal deterministic automata for recognizing an AND group of $n$ distinct element names contain $2^{n}$ states and $n 2^{n-1}$ transitions, excluding the failure state and transitions leading to it. xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@i... Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@i... the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@i... the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@i...)
|
PURCHASE STYLUS STUDIO ONLINE TODAY!Purchasing Stylus Studio from our online shop is Easy, Secure and Value Priced! Download The World's Best XML IDE!Accelerate XML development with our award-winning XML IDE - Download a free trial today! Subscribe in XML format
|