Re: Some comments on the 1.1 draft

To: xml-dev@l...
Subject: Re: Some comments on the 1.1 draft
From: Gavin Thomas Nicol <gtn@r...>
Date: Wed, 19 Dec 2001 11:42:09 -0500
In-reply-to: <20011219182436.L9114@i...>
Organization: Red Bridge Interactive, Inc.
References: <5C39F806F9939046B4B1AFE652500A3A251914@R...> <021401c1885b$2ec26da0$4bc8a8c0@A...> <20011219182436.L9114@i...>

On Wednesday 19 December 2001 02:24 am, Alan Kent wrote:
> To separate the two issues - I have no opinion on name characters.
> PCDATA however is different. I read through your entire post twice
> and must admit I still don't quite understand what your point is
> exactly. I *think* you might be saying "it's good to specify the
> encoding because that way it's possible to make sure characters
> not valid in that encoding are rejected." (My reading of the XML spec
> is that 0x85 is legal in the Unicode character set - that is, it's
> not marked as UNUSED in the good old SGML jargon.)
>
> If this is your point, then would it be possible to define a new
> encoding which permitted the full range of Unicode characters
> (including control characters which are valid in Unicode)?
> Would that address your issues?

The point is that characters != bytes != encoding. If you start allowing
control characters (which are somewhat debatable *as* characters in the first
place), it becomes very easy to abuse that power and to build
application-specific uses of embedded encodings. This is effectively what Mr.
Rys from MS wanted: the ability to store arbitrary binary streams inside
XML-encoded data.

The problem is that XML is *text*. It is made from *characters*, and 
arbitrary binary strings have no place in it. Once you change that, you have 
essentially ruined XML as a textual markup language.
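
As a rough illustration of where XML 1.0 draws that line today (a sketch only: the helper name is_well_formed is made up, and it relies on Python's standard expat-based parser, which implements XML 1.0, not the 1.1 draft), a C0 control character is rejected whether it appears raw or as a character reference:

```python
import xml.etree.ElementTree as ET


def is_well_formed(doc: str) -> bool:
    """Return True if the XML 1.0 parser accepts the document (hypothetical helper)."""
    try:
        ET.fromstring(doc)
        return True
    except ET.ParseError:
        return False


# Ordinary text content is accepted.
print(is_well_formed("<a>plain text</a>"))

# U+0001 is outside the XML 1.0 Char production, so both forms are rejected.
print(is_well_formed("<a>\x01</a>"))
print(is_well_formed("<a>&#1;</a>"))
```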

People could say that NUL et al. are still *characters* and so would be fine, 
even in UTF-8 encoded documents, but I bet they'd be rather unhappy to find 
their binary streams changing if I saved the document as UTF-16.
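
To make that concrete, here is a minimal sketch (the four-byte payload and the variable names are invented for illustration) of what happens if someone treats each byte of a binary stream as the character with the same code point. The characters survive re-encoding; the bytes do not:

```python
# Hypothetical "binary stream" someone wants to embed in a document.
payload = bytes([0x00, 0x01, 0x85, 0xFF])

# Map each byte to the character with the same code point:
# U+0000, U+0001, U+0085, U+00FF.
text = "".join(chr(b) for b in payload)

# The same four characters serialized in two different encodings.
utf8 = text.encode("utf-8")
utf16 = text.encode("utf-16-be")

print(utf8.hex())    # bytes on disk when saved as UTF-8
print(utf16.hex())   # bytes on disk when saved as UTF-16 (big-endian, no BOM)

# Neither serialization reproduces the original byte stream,
# even though both decode to the identical sequence of characters.
print(utf8 == payload, utf16 == payload)
print(utf8.decode("utf-8") == utf16.decode("utf-16-be") == text)
```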

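The point here is that binary data embedded this way is unreliable: the bytes
you get back depend on whichever encoding the document was last saved in.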

Follow-Ups:
- RE: Some comments on the 1.1 draft
  - From: "J C Theriot" <theriot@p...>

References:
- RE: Some comments on the 1.1 draft
  - From: "Michael Rys" <mrys@m...>
- Re: Some comments on the 1.1 draft
  - From: "Rick Jelliffe" <ricko@a...>
- Re: Some comments on the 1.1 draft
  - From: Alan Kent <ajk@m...>
