Re: XML 2 so far
On Sun, 2010-12-12 at 18:58 -0800, Henri Sivonen wrote:
> On Dec 12, 2010, at 17:42, Liam R E Quin wrote:
[...]
> I guess I should comment to make it controversial:

Thanks! Actually I happen to agree with you but was trying to be
neutral in that list (I know I didn't entirely manage it)...
An alternative I'll note for the next version of the list would be
to be clear that it's an error to have whitespace before the xml
declaration.

> [...]
> I'd go in the other direction and consider the possibility of having
> an arbitrary number of white space characters after <?xml but before
> the encoding pseudo-attribute a design flaw in XML.

HTML's content-equiv header shares this problem. But maybe it's worth
limiting, I do see your point. Noted, at any rate.

> Moreover, I think it's good to have a reliable magic number within a
> fixed number of bytes from the start of the file, so I think it's a
> flaw that <?xml isn't required and making it potentially appear at a
> later offset wouldn't be an improvement.

For any XML 2.0 there would have to be something at the start to
distinguish it, so I think this will have to happen anyway.

> > (2) character set
> > require the use of utf-8, or of utf-8 and -16, and forbid others.
> > Not complete consensus here.
>
> No one should use anything except UTF-8 over the wire. UTF-16 is a
> legacy encoding.
>
> As for "require", the big question is if you want XML 2 processors to
> be able to consume existing XML 1.0 content. If yes, you can't require
> stuff. If no, failure due to lack of positive network effects is
> likely.

This is the big question for any XML 2.0 work I think. If it's
compatible, you can't change enough to make it worthwhile; if not,
who wants it? There are always specific communities (I'd say XML5 is
a good example) but it's hard to get cross-community agreement.

> > (3) document type declaration - external DTD
> > Remove external DTDs.
> > Not complete consensus on what to do with entities.
>
> I say predefine all the HTML5 named character names that end with a
> semicolon. (Except in XML, you wouldn't consider the trailing
> semicolon part of the name.)

Many XML vocabularies today come with a different set of predefined
entities; the HTML ones were based on a subset of an early version of
a larger SGML set. I do know people who use many more, as well as
things like &publicationDay; or &productName;

Another possible issue is that the names are in English. But, the
XHTML and MathML list is a good starting point for discussion.

> > (4) internal subset
[...]
> For XML5, I'd like to get rid of internal subset processing. The main
> problem is that existing XML content on the Web includes SVG files
> written by Adobe Illustrator, and those files not only have an
> internal subset but define namespace URLs as entities there and later
> use those entities in namespace declarations. (I'd be interested in
> knowing who at Adobe thought this was a good idea.)

I have no clue!

>
> The fear of getting dragged into implementing internal subset
> processing is probably the main reason why I haven't written an XML5
> parser, yet. In SGML and in SGML-inspired languages, the number of
> tokenizer states required for a piece of syntax is inversely
> proportional to the usefulness of the piece of syntax.

:-(

I didn't list some of the less useful XML features that I personally
would get rid of - e.g. NOTATION and NDATA entities.

It's possible that an XML 2 could have a cleaner, simpler syntax for
an internal subset. E.g. an xml-instance syntax,

    <xml version="2.0">
      <head>
        <entity name="product">Product 3.1</entity>
      </head>
      <body>We're ready to ship &product; now!</body>
    </xml>

where xml/head and xml/body are reserved names. I don't know.

> > (5) multiple root elements
> > Allow multiple root elements in a document.
> > Why? Because people want it. There's no technical need.
> > On the other hand, it may break existing APIs and tools.
> > Seems to be weak consensus on doing this one.
>
> Seems like a recipe for severe API incompatibility.

Yes. But it's been mentioned, so I listed it ;-)

> > (6) Lax syntax and error recovery
> > There's strong demand to allow processors to do error recovery,
> > from some user communities. This mostly seems to me to be
> > Web browser programmers who deal with faulty RSS a lot; on the
> > other hand, e.g. SOAP people would fight hard to keep this out
> > (and it's certainly not a feature of JavaScript or JSON either).
> > Not clear consensus here.
>
> Making a new version of XML and making it Draconian *again* would
> truly be tragic.

Or, making a new version of XML and losing its best advantage over
HTML would be truly tragic :-) it depends who you ask.

> > (7) Minimization
> > This overlaps with No. 6, lax syntax. Many people want to use
> > a terser syntax, or have it as an option. There is not (yet)
> > strong consensus on what that should be. Some people want
> > <e>....</> or <e/..../ as per SGML. But there is not strong
> > support for the exact SGML OMITTAG rules I think (which are
> > complex and require a DTD)
> >
> > Neither is there support for DATATAG or the other SGML features
> > exactly, but there do seem to be people who want some sort of
> > terser markup.
> >
> > There has even been a LISP-like syntax suggested.
> > The counter-arguments are usually simplicity and robustness.
> > Not yet consensus.
>
> FWIW, you can't have this *and* also have convergence between XML
> and HTML.

Convergence between XML and HTML hasn't been a strong topic on the
list, though. Should it be?

When I worked at SoftQuad, SGML minimization was a significant part
of our technical support costs - maybe as high as 80% at times -
because people would call up and say, "I've got this 5 megabyte SGML
document and Author/Editor won't open it, it says, invalid content
after the end of the document, or, mismatched tags, and it all looks
fine to me".
But that doesn't really mean terse syntax was at fault so much as the
SGML rule, "if you see a tag you didn't expect, close elements until
you get a match". The HTML 5 rules are laxer, although HTML has the
advantage that you can look and see if you're OK with the result; XML
tends to be used in more program-like ways, where you really need
rather more certainty. So again it's a different-user-community
thing, and the question is how to support both sets of usages, or
whether it's better to have XML do one and not the other.

> > What is the business case here?
>
> That's indeed the big question. At TPAC, TimBL said on stage (roughly,
> not exact quote) that XML is used too much in the enterprise for XML
> to change.

I missed his talk, but I think that's true, we're very limited in
what we can do. On the other hand, all bets are off if it's a 2.0 --
except, as you noted, it's not clear that enough people would want it.

Thanks for your comments!

Liam

--
Liam Quin - XML Activity Lead, W3C, http://www.w3.org/People/Quin/
Pictures from old books: http://fromoldbooks.org/
Ankh: irc.sorcery.net irc.gnome.org www.advogato.org
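
A footnote on the SGML rule quoted above ("if you see a tag you didn't
expect, close elements until you get a match"): the rough Python sketch
below shows how that kind of recovery can silently reshape a document.
It is an illustration under stated assumptions, not SGML's actual
OMITTAG processing. It consults no DTD, only recognises bare
<name> and </name> tags without attributes, and the names TAG and
close_until_match are made up for this example.

```python
import re

# Hypothetical, deliberately tiny tokenizer: bare start/end tags only,
# no attributes, comments, or processing instructions.
TAG = re.compile(r"<(/?)([A-Za-z][\w.-]*)\s*>")


def close_until_match(markup):
    """List the repairs a 'close until you get a match' rule would make.

    Returns a list of (kind, element_name) pairs, where kind is
    "implied-close" for an element closed on the author's behalf and
    "stray-end-tag" for an end tag that matches nothing still open.
    """
    open_elements = []
    repairs = []
    for match in TAG.finditer(markup):
        is_end_tag = match.group(1) == "/"
        name = match.group(2)
        if not is_end_tag:
            open_elements.append(name)
            continue
        if name not in open_elements:
            # Nothing to match against: report it and carry on.
            repairs.append(("stray-end-tag", name))
            continue
        # Unexpected end tag: close elements until we reach the match.
        while open_elements[-1] != name:
            repairs.append(("implied-close", open_elements.pop()))
        open_elements.pop()
    # Whatever is still open at the end of input is closed implicitly.
    repairs.extend(("implied-close", n) for n in reversed(open_elements))
    return repairs


print(close_until_match("<doc><sec><p>text</sec><p>more</p></doc>"))
# prints [('implied-close', 'p')]
```

With input like <doc><a><b>x</a>y</b></doc> the same function closes
b early and then reports the later </b> as stray, which is roughly the
"mismatched tags, and it all looks fine to me" situation: the error
surfaces well after the place where the author's intent and the
parser's repair diverged.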