
Re: nextml

  • From: Norman Gray <norman@astro.gla.ac.uk>
  • To: liam@w3.org
  • Date: Thu, 9 Dec 2010 17:19:12 +0000

Liam, hello.

On 2010 Dec 9, at 16:49, Liam R E Quin wrote:

> On Thu, 2010-12-09 at 12:13 +0000, Norman Gray wrote:
> [...]
>> ...and later, Liam Quinn wrote:
> Actually it was me (Liam Quinn is someone else)

Ah, apologies. (several someone elses, by the look of it)

>>> The most frequent change request I hear is to remove the strict syntax
>>> requirements and make every XML implementation include some sort of
>>> HTML-like expert system to do the parsing, automatically "correcting"
>>> errors like missing quotes off attribute values.
> Please note, I'm *not* advocating such a change, but rather saying that
> it's the request I hear most often.

It didn't sound to me like you _were_ advocating it; sorry for not making that clearer.

> [...]
>> If 'XML-bis' were defined using lexer events, with strings defined as
>> sequences of unicode code points, then a JIS-encoded document with
>> missing quotes could be (required to be) handled by the lexer,
>> entirely transparently.  In other words, why is file/wire encoding
>> anything to do with XML?
> Because XML is about file interchange.
> If your XML processor won't read my XML document, we've failed.

I think that separating out the lexing makes this easier, not harder.

I could imagine a standard declaring that an XML parser shall process a stream of Unicode code points.  The standard might note that this implies some sort of shim between the XML and the file I/O, but declare that what's in this shim is none of its concern.

The obvious content of that shim would of course be nothing more than the platform's UTF-8 reading support, but if someone wanted to be funky and support something else, in a context where all the necessary information was available (for example, from an HTTP header), then the XML standard isn't about to stop them.

I'm not _necessarily_ advocating this as a vital ingredient, but it would surely short-circuit a certain amount of agonising about which UTF-* variants to accommodate, and it would separate the parsing layers quite naturally.
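
To make the layering concrete, here is a minimal Python sketch of the idea, not anything a standard specifies.  The names code_points and count_start_tags are hypothetical, and the "parser" is a toy stand-in that only counts start tags.  The point is that the consumer sees code points only; the encoding (known out of band, for example from an HTTP header) is handled entirely in the shim.

```python
import io
from typing import BinaryIO, Iterator


def code_points(raw: BinaryIO, encoding: str = "utf-8") -> Iterator[str]:
    """Hypothetical shim: turn a byte stream into a stream of code points.

    The encoding is assumed to be supplied out of band; UTF-8 is the default.
    """
    # newline="" disables newline translation, so characters pass through as-is.
    text = io.TextIOWrapper(raw, encoding=encoding, newline="")
    while True:
        chunk = text.read(4096)
        if not chunk:
            break
        # Iterating a Python 3 str yields one code point at a time.
        yield from chunk


def count_start_tags(points: Iterator[str]) -> int:
    """Toy stand-in for a parser: it consumes code points and never sees bytes."""
    count = 0
    prev = ""
    for ch in points:
        # A '<' not followed by '/', '!' or '?' is treated as opening a start tag.
        if prev == "<" and ch not in "/!?":
            count += 1
        prev = ch
    return count


doc = "<doc><p>日本語</p></doc>"

# The same document arriving in two different wire encodings.
utf8_count = count_start_tags(code_points(io.BytesIO(doc.encode("utf-8"))))
sjis_count = count_start_tags(
    code_points(io.BytesIO(doc.encode("shift_jis")), encoding="shift_jis")
)
```

Both calls hand the same code point sequence to count_start_tags, so supporting another encoding means changing only the argument passed to the shim.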

[A more out-there position is to define XML in terms of a sequence of SAX events, or equivalent, but that obviously stops being a file-interchange standard]

All the best,


Norman Gray  :  http://nxg.me.uk

