Re: XML syntax (was Re: external subset syntax)
[James Anderson]
> my problem is, whenever i come to a point in the proposed
> recommendation at which a parser is required to report an error and
> "must not continue normal processing" even though the result which
> the stream would denote would be sufficiently unambiguous if
> allowed, then i feel compelled to ask, "why does one have to exclude
> this"? which does not mean "in which production does the standard
> exclude or prescribe it", but rather why does the standard exclude
> or prescribe it. what is the useful purpose? particularly when
> excluding it makes the parser more complex and the document encoding
> more exacting.

I am not particularly fond of this rule. However, I can explain its
justification. The WG made this decision at the request of both
Microsoft and Netscape. In the HTML arena, both companies spend a
fair amount of their time reverse engineering the other's
error-recovery behavior, since Web page authors "validate" by seeing
if it looks OK in their browser of choice. By requiring parsers to
fail on non-conformant documents, there is no chance that a user can
think erroneous data is acceptable in a conforming browser; if a
browser accepts the data, its opponent can level the charge that it
is non-conforming.

> more than likely, when i've followed discussions of similar
> questions, the design goal #3 gets hoisted like a commandment: "XML
> shall be compatible with SGML". as a npw i tend to adhere more to
> #'s 1,4, 6, and 9: it should be easy to generate, easy to program,
> and easy to read. SGML processors are already pretty complex, so an
> argument to increase the complexity of XML in strictly order to keep
> SGML processors simpler is difficult to accept on logical terms. (i
> know i'm being naive here, and i'm ignoring the past, but i would
> wager that the future is going to bear me out...)

Rule 3 is critical for two reasons: (a) technologically, it allows
easier application of existing SGML technology to the new problem
space, and (b) politically, it encourages XML's adoption in
rigorously standards-based arenas, like the Military-Industrial
Complex.

> the simplest thing would have been a document form which
> distinguished inline definitions, external references (ie XLL
> built-in), content, and (maybe) a declaration (autorecognition of
> encoding being the criteria on the latter). it is true, that that is
> all there, but the standard requires at least twice as many
> syntactic forms as are necessary. so despite having read mr
> murray-rust's note on background to the list itself (re: XML-DEV
> (was Re: YAXPAPI)) which gave me some sense of the effort which has
> gone into the proposed recommendation, the distance between the
> simple form of the denoted data and the complexity of the syntactic
> form often leads me to ask "why?"

Many people have had discussions of the form "a markup language
might ...", in which a clean, new theoretical language is designed.
These discussions are useful and interesting, but completely outside
of the scope of XML, whose charter was to enable the transfer of SGML
over the Web. If you want to design such a language, and are
successful in encouraging its adoption, many current SGMLheads would
be very grateful. We use SGML because it is the best existing tool,
not because it is the best possible.

> (as an aside, i didn't - and still don't - see that as, in itself, a
> sufficient explanation, since the case would comprise two instances
> of a "document type declaration": one in the xml document and the
> other in the prolog of the external portion of the "document type
> definition", which was referred to from the first, but is not
> contained in the first, and which serves to constrain the root
> element <em>if</em> so desired.)

And indeed, some older SGML software produces documents like this.
This is a purely backwards-compatibility issue, from one point of
view; disambiguation rules could easily be developed, but then that
language would not be SGML. See the XML charter.

> another example is the MDC (']]>') exclusion in CharData which means
> that one needs a state machine to scan character data. why?

This is because floating msc/mdc combos can get you later in a big
way. See _The SGML FAQ Book_, and trust us on this. I'd recommend
avoiding marked sections in the document instance altogether, but if
you don't, *ALWAYS* escape any occurrence of ']]>' in data.

> another example is that of , in itself, where the npw believes
> his point (in a previous posting) was misunderstood, and can only
> repeat the question <em>why</em> is a PI-close specified to be '?>'
> and not '>', which would be easier, or ('?>' | '>'), which would be
> robuster and observes (wrt to 'XML' itself) that the standard, cf #6
> with irony, engenders an encoding where of the four obvious humanly
> legible encodings (that is, neglecting 'xMl' et.al.: ('<?XML' |
> '<?xml') ... ('?>' | '>')) only one is legitimized. why? if the
> precision of an encoding depends so much on uniqueness, then why
> does one start out with such a level of lexical complexity in the
> first place, only to then exclude much of it as 'malformed'? all you
> need is <, >, ', & and / (if you allow element recursion) - and even
> the distinction between < and > is more for the eye than anything
> else.

The pic *was* '>' in SGML. It was explicitly changed to '?>' for two
reasons. First, there is no standardized way of escaping characters
in a PI, so with pic='>' there's no way to put a greater-than in a
processing instruction. '<?JScript if (1>2)>' is illegal. Yes, you
can use application conventions, but are authors going to buy
'<?JScript if (1&gt;2)>'? So, since '?>' is much less likely to occur
*within* PIs, it makes a safer delimiter. Second, the symmetry is
appealing, especially for new authors.
Have you never seen <!-- --!> used as a comment on Web pages? The
<? ... ?> syntax is more intuitive.

Take the time to search the SGML WG archives
(<URL:http://lists.w3.org/Archives/Public/w3c-sgml-wg>), which go
through July of this year and are open to the public, and the XML SIG
archives (address unknown). Searching them will lead to answers to
many of these questions. See also the XML FAQ at
<URL:http://www.ucc.ie/xml/>.

-Chris
--
<!NOTATION SGML.Geek PUBLIC "-//Anonymous//NOTATION SGML Geek//EN">
<!ENTITY crism PUBLIC "-//O'Reilly//NONSGML Christopher R. Maden//EN"
"<URL>http://www.oreilly.com/people/staff/crism/ <TEL>+1.617.499.7487
<USMAIL>90 Sherman Street, Cambridge, MA 02140 USA" NDATA SGML.Geek>

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@i...
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To (un)subscribe, mailto:majordomo@i... the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo@i... the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa@i...)
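
Editorial footnote on the advice above to always escape any occurrence of ']]>' in data: the following is a minimal Python sketch of what that escaping might look like on the generating side. It is not taken from the post or from the XML specification; the function name `escape_char_data` is hypothetical, and the choice to write the sequence as ']]&gt;' is one common convention, assumed here for illustration. The round-trip check uses the standard library's `xml.etree.ElementTree` parser.

```python
import xml.etree.ElementTree as ET


def escape_char_data(text: str) -> str:
    """Escape a string for use as XML character data (hypothetical helper).

    '&' and '<' are always escaped. '>' is escaped only where it would
    complete the sequence ']]>', which must not appear literally in
    character data.
    """
    return (
        text.replace("&", "&amp;")   # must come first, before other entities are added
            .replace("<", "&lt;")
            .replace("]]>", "]]&gt;")
    )


# Data that happens to contain the ']]>' sequence.
raw = "if (a[b[0]]> 1 && c < 2)"
doc = "<code>" + escape_char_data(raw) + "</code>"

# A conforming parser should hand back the original string.
assert ET.fromstring(doc).text == raw
```

Doing the substitution at generation time keeps the burden off document authors: the scanner on the reading side still needs its small state machine, but well-behaved writers never trigger it.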