

  • From: John Cowan <johnwcowan@gmail.com>
  • To: Rick Jelliffe <rjelliffe@allette.com.au>
  • Date: Fri, 20 Apr 2018 00:22:24 -0400


On Thu, Apr 19, 2018 at 11:55 PM, Rick Jelliffe <rjelliffe@allette.com.au> wrote:

* I think Postel's principle speaks about the risk of rarely-implemented things, in particular, optional things. So I don't believe that there is any danger of a lot of XML processors that don't handle > or hex characters or CDATA sections.

I agree with that as far as it goes. The reason for escaping all > characters is that although the rule that specifies when they MUST be escaped is well-embodied in software, it isn't well-known to document authors; it rarely comes up and so is easily missed.  Similarly, it's easy to get the 9-character token "<![CDATA[" subtly wrong, in which case it will not do what the author expects.
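
For a document produced by software, the conservative policy might look like the sketch below. It assumes Python and its standard xml.sax.saxutils.escape, which replaces &, < and every > regardless of context; the helper names conservative_text and wrap are made up for illustration. Because every > is escaped, the author never has to remember that the sequence "]]>" is the one place in character data where it is mandatory.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def conservative_text(text: str) -> str:
    """Escape &, < and every >, so a stray "]]>" can never reach the output."""
    return escape(text)


def wrap(tag: str, text: str) -> str:
    """Hypothetical helper: put escaped character data inside one element."""
    return f"<{tag}>{conservative_text(text)}</{tag}>"


# Character data that happens to contain "]]>" (e.g. nested array indexing).
original = "if a[b[0]]> 1 && x < y"
doc = wrap("note", original)

# The escaped form parses, and the parser hands back the original string.
assert ET.fromstring(doc).text == original
```

The same function serves for ordinary text; the only cost is a few extra &gt; references that a strict reading of the spec would not require.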

* I was assuming that Postel's principle applies to implementation: minimising the use of PIs, namespaces, characters in data, are all authorial decisions. If an implementer refuses to transfer them as part of the data, their code is not "robust"; it is "corrupting".

If you are writing a generic XMLWriter component, that's true.  But it's commonplace for the "author" of a document nowadays to be another piece of software driven by higher-level concerns.  If you are using XML to communicate, the details of how you use it are part of the implementation of that communication.

* I don't really concur with several of your other points: indeed, I think there is a case that to get robustness you really need to only use ASCII repertoire for direct characters, and you should use HNCR for everything else. Or at least, characters that are not in the same Unicode ranges as the names in markup should be HNCR.

That would make a hash of non-English character content, violating Goal 6 ("XML documents should be human-legible and reasonably clear."). Writing French or Greek or Hindi text with HNCRs is a non-starter, and even if you are allowed to write the letters directly, there are also the script-specific punctuation marks that aren't allowed in names. You really don't want them to be HNCRs either.
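
To make the legibility cost concrete, here is a small sketch, assuming Python; the function name to_hncr is invented for illustration and is not any library's API. It rewrites every non-ASCII character as a hexadecimal numeric character reference, which is roughly what an ASCII-only policy would require of a writer.

```python
import xml.etree.ElementTree as ET


def to_hncr(text: str) -> str:
    """Replace each non-ASCII character with a hex numeric character reference.

    Only the repertoire is handled here; &, < and > would still need
    their usual escaping before the result goes into a document.
    """
    return "".join(ch if ord(ch) < 128 else f"&#x{ord(ch):X};" for ch in text)


french = "d\u00e9j\u00e0 vu"   # two accented letters, U+00E9 and U+00E0
encoded = to_hncr(french)
print(encoded)                 # d&#xE9;j&#xE0; vu

# A parser restores the text, so nothing is lost except readability.
assert ET.fromstring(f"<p>{encoded}</p>").text == french
```

Two accented letters are already enough to obscure a short French phrase; text in Greek or Devanagari would consist of references almost entirely.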

In any case, my examples were just that, examples.

* That being said, I think you do get a different set of "conservative" issues as soon as your XML has to be some other format at the same time as being XML: for example, that your XHTML must be text in some encoding, *AND* XML, *AND* HTML. Or that your line-oriented XML for AWK processing must be text in some encoding *AND* simple lines *AND* XML.

Certainly.  Although I note that there is an XML plugin for gawk, which supplements the BEGIN and END patterns with things like XMLSTARTELEMENT.  Pretty neat.  There are similar plugins for JSON and Postgres.

John Cowan          http://vrici.lojban.org/~cowan        cowan@ccil.org
weirdo:    When is R7RS coming out?
Riastradh: As soon as the top is a beautiful golden brown and if you
stick a toothpick in it, the toothpick comes out dry.
