

Subject: Re: Understanding why <tag></tag> is the way it is (was Re: IE Client side transformation issue)
From: Norman Gray <norman@xxxxxxxxxxxxxxx>
Date: Fri, 3 Aug 2007 12:51:01 +0100
Greetings.

On Thu, 02 Aug 2007 19:14:32 -0600, Abel Braaksma <abel.online@xxxxxxxxx> wrote:

About the spec thing, isn't it something from SGML heritage? I mean, didn't XML introduce the shortcut <br /> for <br></br> thus disallowing the SGML <br> on itself (without closing tag)? And wasn't it also SGML heritage that allowed <option selected> and XML forced more strict rules and made it <option selected="selected">?

SGML was designed at a time, and in a context, which assumed that document markup would be entered by hand; it therefore included a large number of short forms, and ways to minimise typing.


These included omitting redundant end-tags (so that in an HTML-like DTD, "<h1>title<p>para1<p>para2" would be OK, since an h1 element can't contain a p, and a p can't contain another one, so that the presence of the implicit end-tags, closing the h1 and p elements, could be inferred). There were various attribute-defaulting and tag-minimising tricks as well, so that <p>text</> was valid, with the </> construct closing the most recently opened tag. And so on and so on.
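To make the end-tag inference concrete, here is a minimal Python sketch. It is a toy, not an SGML parser: the ALLOWED_CHILDREN table, the token tuples and the infer_end_tags function are all invented for illustration, and the content model is a made-up three-element stand-in for a real DTD.

```python
# Toy content model (an assumption for illustration, not the real HTML DTD):
# which child elements each element may contain.
ALLOWED_CHILDREN = {
    "body": {"h1", "p"},
    "h1": set(),
    "p": set(),
}


def infer_end_tags(tokens, root="body"):
    """Return fully normalised markup, inserting the end-tags that were omitted.

    tokens is a list of ("start", name), ("end", name) or ("text", data) tuples.
    The root element is implicit and is never written out.
    """
    stack = [root]
    out = []
    for kind, value in tokens:
        if kind == "start":
            # Close open elements until we reach one that may contain the new one.
            while len(stack) > 1 and value not in ALLOWED_CHILDREN[stack[-1]]:
                out.append(f"</{stack.pop()}>")
            out.append(f"<{value}>")
            stack.append(value)
        elif kind == "end":
            # Ignore stray end-tags; otherwise close down to the named element.
            if value in stack[1:]:
                while True:
                    name = stack.pop()
                    out.append(f"</{name}>")
                    if name == value:
                        break
        else:
            out.append(value)
    # End of input: close whatever is still open, except the implicit root.
    while len(stack) > 1:
        out.append(f"</{stack.pop()}>")
    return "".join(out)


tokens = [
    ("start", "h1"), ("text", "title"),
    ("start", "p"), ("text", "para1"),
    ("start", "p"), ("text", "para2"),
]
print(infer_end_tags(tokens))
# prints: <h1>title</h1><p>para1</p><p>para2</p>
```

The point is only that the DTD carries enough information for the parser to put the missing end-tags back; a real SGML system also has to honour the OMITTAG declarations and full content models, which this sketch ignores.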

The even cleverer thing about SGML (and one of the various things that made it complicated to write an SGML system) was that the syntax of the SGML lexer was specifiable on the fly. Starting tags with the '<' character, starting the end-tag with '</', having quotes marked with '"', using the ASCII character set, using letters as element names, were the default, but were all optional.

That brought about the "NET-hack". You could specify that the null-end-tag (NET) start string was '/' rather than '</', thus bringing about the sequence of transformations

1. <p></p> (fully normalised form)
2. -> <p</p> (you didn't have to close tags if you were starting a new one immediately)
3. -> <p</> (use the null end tag </> to close the most recently started element)
4. -> <p/> (if you had redefined the NET string from '</' to '/').


...and <p/> was deemed to look adequately pretty (I might be misremembering this slightly, but it was something very like that).

Although it didn't end up specified quite like that, XML was initially viewed (by some) as a specific set of settings for the SGML lexer, which turned off all the options and minimisations. Because the end result had no contractions and no options, it was massively easier to write parsers for. That is, XML is SGML-- (ahem!).
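A quick way to see the "no contractions and no options" effect is to feed the forms mentioned in this message to an XML parser. The sketch below uses Python's standard xml.etree.ElementTree purely as a convenient example parser; the is_well_formed helper and the list of test strings are mine, chosen for illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup):
    """Return True if an XML parser accepts the string, False otherwise."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


# The two spellings XML does allow produce the same element.
full = ET.fromstring("<p></p>")
short = ET.fromstring("<p/>")
same = (full.tag, full.text, len(full)) == (short.tag, short.text, len(short))
print("<p></p> and <p/> equivalent:", same)

# SGML-style minimisations are simply not well-formed XML.
sgml_forms = [
    "<p</p>",                       # unclosed start-tag
    "<p</>",                        # unclosed start-tag plus empty end-tag
    "<p>text</>",                   # empty end-tag
    "<h1>title<p>para1<p>para2",    # omitted end-tags
    "<option selected>x</option>",  # attribute without a value
]
for form in sgml_forms:
    print(form, "->", "accepted" if is_well_formed(form) else "rejected")
```

Every entry in sgml_forms is rejected, while <p/> and <p></p> yield the same empty element, which is why the choice between them makes no difference to an XML parser.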



Pace Andrew Welch, HTML usually isn't parsed with an `SGML parser', but with a special-purpose never-fail make-it-up-when-necessary HTML-specific parser. John Cowan's tagsoup parser is one of a couple of SAX parsers which will accept HTML tag soup and always emit a valid SAX stream.
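As a rough illustration of the never-fail style, here is a sketch using Python's standard html.parser module. This is not tagsoup and makes no claim about how tagsoup works: html.parser only tokenises tolerantly and does not balance the tags it reports, and the EventCollector class is a hypothetical helper written for this example.

```python
from html.parser import HTMLParser


class EventCollector(HTMLParser):
    """Record the events a tolerant HTML tokeniser reports for tag soup."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag, attrs))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

    def handle_data(self, data):
        self.events.append(("text", data))


collector = EventCollector()
# Omitted end-tags and a valueless attribute: fatal errors for an XML parser.
collector.feed("<h1>title<p>para1<option selected>")
collector.close()
for event in collector.events:
    print(event)
```

No exception is raised: the parser reports three start-tag events and two text events, with the selected attribute showing up with a value of None. A tagsoup-style parser goes one step further and supplies the missing end events so that the resulting stream is balanced.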

Andrew remarks:

My point was that if it made no difference to the XML parser (but a
big difference in the Real World) then why not?

Ian Hickson makes some relevant remarks at <http://www.hixie.ch/advocacy/xhtml> suggesting, in some detail, that sending out XHTML with a text/html content type can potentially cause you problems.


The other _really_ good thing about SGML was that it had DSSSL as a transformation language. By the look of things, XSL2 is expanding towards being a small subset of DSSSL (but that's another hobbyhorse). DSSSL can of course be used to process XML, but it's a bit of a minority interest, these days.

All the best,

Norman
[drifting down memory lane]


-- ------------------------------------------------------------ Norman Gray : http://nxg.me.uk eurovotech.org : University of Leicester, UK
