
Re: Many different syntaxes in XML - is that good language des

  • From: Norman Gray <norman.gray@glasgow.ac.uk>
  • To: Pete Cordell <pete++xmldev@codalogic.com>
  • Date: Mon, 07 Mar 2022 16:57:16 +0000

Pete, hello.

On 7 Mar 2022, at 15:52, Pete Cordell wrote:

> Viewed like that it seems a fairly minimal and efficient syntax.

Indeed.  SGML is a thing of some beauty, viewed through the right (rather special) spectacles.

>  (It does make me wonder why the CDATA section 'directive' wasn't just <!CDATA[...]>.  Even more curious, given all the SGML things that got dropped, is how it got included in XML.  It creates just as many problems as it solves.)

It's certainly pretty orthogonal, in all sorts of directions.

Regarding <![CDATA[...]]> vs <!CDATA[ ... ]>, the sequence of tokens here *in SGML* is

  <!     : markup declaration open
  [      : declaration subset open
  CDATA  : status-keyword
  [      : dso again
  ...    : data
  ]]     : marked section close
  >      : markup declaration close (which happens to be the same character, by default, as element start-tag-close, and a few others)

I'm not 100% clear why the 'declaration subset open' is so-called.  This token is also used to introduce the DTD declaration, full or partial, at the very top of a document (which is written in the 'other' syntax, in the terms of this thread), and it seems to have been reused here _partly_ as a sort-of gesture towards the declaration language of DTDs -- ie, the first '[' is effectively signalling an escape inside the escape, in a different direction.

The SGML status keywords, alongside CDATA, were/are INCLUDE and IGNORE (which respectively include and ignore the text inside the construct), RCDATA (which is like CDATA except that entities (only) are recognised and expanded (have I got that right?)), and TEMP (which did nothing other than mark the contained text as temporary).

I presume the duplication of the ']' in the marked-section-close is partly to keep the brackets balanced, and partly because it's a string that's unlikely to appear in normal text.  It was possible to have whitespace either side of the status-keyword terms, so that '<![ CDATA   [...]]>' would be a legal SGML declaration.
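
As a rough illustration of the token sequence above, here is a minimal Python sketch that lexes the start of a marked section the SGML way: as separate tokens, with optional whitespace around the status keyword. The function name `lex_marked_section` and the regular expression are illustrative assumptions, not taken from any SGML parser. The sketch also ignores much of real SGML (multiple keywords, parameter entities, nested sections, case folding).

```python
import re

# Default delimiter strings for the roles named in the message.
MDO, DSO, MSC, MDC = "<!", "[", "]]", ">"
STATUS_KEYWORDS = {"CDATA", "RCDATA", "INCLUDE", "IGNORE", "TEMP"}

# mdo, dso, optional whitespace, status keyword, optional whitespace, dso.
# Assumption: keywords are matched in upper case only, for simplicity.
_START = re.compile(
    re.escape(MDO) + re.escape(DSO) + r"\s*([A-Z]+)\s*" + re.escape(DSO)
)

def lex_marked_section(text):
    """Return (status_keyword, content) if text starts with a marked
    section, otherwise None. Hypothetical helper for illustration."""
    m = _START.match(text)
    if m is None or m.group(1) not in STATUS_KEYWORDS:
        return None
    # The section ends at the first msc followed by mdc, i.e. ']]>'.
    end = text.find(MSC + MDC, m.end())
    if end == -1:
        return None
    return m.group(1), text[m.end():end]
```

Under these assumptions, both '<![CDATA[...]]>' and the whitespace variant '<![ CDATA   [...]]>' yield the keyword CDATA and the same content, while '<!CDATA[...]>' is not recognised at all.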

I think that all of these except CDATA were dropped from XML document content (INCLUDE and IGNORE survive only as conditional sections in the external DTD subset), along with the different lexical classes, so that (*checks*...) the start of a CDATA section is just '<![CDATA[' as an otherwise unintelligible magic string.  Why that particular magic string and not a saner one?  Purely, I think, to retain the status of XML documents as being also parseable as SGML.  That is, SGML would lex this string differently, but react in the same way.
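
The XML view can be sketched in a few lines of Python. This is a hedged illustration, not how any particular XML parser is written, and `read_cdata_section` is a hypothetical name. It treats '<![CDATA[' as one fixed string with no internal structure, so nothing may appear between its characters.

```python
CDATA_START = "<![CDATA["   # matched as a single literal in XML
CDATA_END = "]]>"

def read_cdata_section(text):
    """Return the content of a CDATA section at the start of text,
    or None if text does not begin with the exact literal."""
    if not text.startswith(CDATA_START):
        return None
    end = text.find(CDATA_END, len(CDATA_START))
    if end == -1:
        return None
    # Everything up to the first ']]>' is literal character data.
    return text[len(CDATA_START):end]
```

In this sketch '<![CDATA[a < b]]>' gives 'a < b', whereas the SGML-legal '<![ CDATA   [...]]>' is rejected because the opening literal no longer matches.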

The other gasp-worthy thing about SGML was that all of these lexical items, such as '<', '<!', and so on and very much on, were configurable, so you could prefix your document with declarations (in the 'other' syntax) which changed these, and have different character sequences open and close start-tags, processing instructions, and so on.  The angle brackets and ampersands we're familiar with are just the SGML defaults.
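
To make the idea of configurable delimiters concrete, here is a small Python sketch in which the delimiter strings live in a table rather than being hard-wired. The field names follow the SGML delimiter role names (STAGO, ETAGO, TAGC, ERO, REFC). The `Delimiters` class, the `render_element` helper and the curly-brace alternative are invented for illustration, and the sketch does not model how an SGML declaration actually expresses such changes.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Delimiters:
    """Delimiter roles with the familiar default strings."""
    stago: str = "<"    # start-tag open
    etago: str = "</"   # end-tag open
    tagc: str = ">"     # tag close
    ero: str = "&"      # entity reference open
    refc: str = ";"     # reference close

def render_element(name, content, d=Delimiters()):
    """Write an element using whichever delimiter strings are in force."""
    return f"{d.stago}{name}{d.tagc}{content}{d.etago}{name}{d.tagc}"

# A hypothetical alternative concrete syntax, purely for illustration.
CURLY = Delimiters(stago="{", etago="{/", tagc="}")
```

With the defaults, `render_element("p", "hi")` produces '<p>hi</p>'. With the hypothetical `CURLY` table the same call produces '{p}hi{/p}'.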

Enough (slightly deranged) nostalgia!

Best wishes,

Norman


-- 
Norman Gray  :  https://nxg.me.uk
SUPA School of Physics and Astronomy, University of Glasgow, UK

