Re: Postel's law, exceptions

On Tue, 13 Jan 2004 17:57:15 -0800
"Dare Obasanjo" <dareo@m...> wrote:
> >-----Original Message-----
> >From: Michael Champion [mailto:mc@x...] 
> >Subject: Re:  Postel's law, exceptions
> >
> >That poses a bit of a problem for the XML community -- is the 
> >rational response to "fix" the bits of XML that people stumble 
> >over [awaiting shrieks from the people who shot down XML 1.1], 
> I work on RSS in my free time. The most common well-formedness errors
> are documents with incorrect encodings or documents that use HTML
> entities without a reference to the HTML DTD. How exactly do you
> propose XML 1.x fix these problems? 

Interesting.  I have had a long (and ultimately pointless, since neither
of us was at all interested in changing our opinions) argument over
whether the XML declaration is a good idea (my position) or a horrible
kludge.  Someone
taking up the latter position could easily enough argue that the XML
declaration (and thus the place where one can place an encoding
indicator inside the datastream) should be removed (to be replaced by
some other, protocol- or API-specific metadata (presumably a header, in
the case of RSS)).

My argument amounted to saying that the inclusion of the information was
useful and parallel to the information extraction used by utilities such
as unix file(1) (based on magic(5)).

His argument was that, particularly when a document is being generated,
the declaration is supplied too late to be used intelligently, especially
since the encoding specification is itself encoded in the specified
encoding.  He regards the clever tricks for recognizing an encoding
(really, for recognizing a class of encodings) in the 1.0 appendix as a
horrible bit of nasty hackery.  All of this is largely from the
perspective of the Java API, and due to arguments over whether to use
Reader/Writer or InputStreamReader/OutputStreamWriter or
InputStream/OutputStream (the latter two with specified encodings).
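
For readers who have not looked at the appendix in question, the
detection it describes can be sketched roughly as below. This is a
simplified illustration, not the appendix's full table: the class and
method names (EncodingSniffer, guessFamily) are made up for this
example, the UCS-4 cases are left out, and the result is only a class
of encodings, which a real parser would then narrow down by reading
the encoding declaration itself.

```java
// Hypothetical helper, not part of any XML or JDK API.
final class EncodingSniffer {

    // Guess the encoding family of an XML entity from its leading bytes,
    // loosely following the byte patterns listed in XML 1.0 Appendix F.
    static String guessFamily(byte[] head) {
        if (startsWith(head, 0xEF, 0xBB, 0xBF)) return "UTF-8 (BOM)";
        if (startsWith(head, 0xFE, 0xFF)) return "UTF-16BE (BOM)";
        if (startsWith(head, 0xFF, 0xFE)) return "UTF-16LE (BOM)";
        // No BOM: look for "<?" as two-byte code units.
        if (startsWith(head, 0x00, 0x3C, 0x00, 0x3F)) return "UTF-16BE (no BOM)";
        if (startsWith(head, 0x3C, 0x00, 0x3F, 0x00)) return "UTF-16LE (no BOM)";
        // "<?xm" in any encoding that keeps ASCII in its usual positions.
        if (startsWith(head, 0x3C, 0x3F, 0x78, 0x6D)) return "ASCII-compatible";
        // "<?xm" in EBCDIC.
        if (startsWith(head, 0x4C, 0x6F, 0xA7, 0x94)) return "EBCDIC";
        // No declaration and no BOM: fall back to the default.
        return "UTF-8 (default)";
    }

    private static boolean startsWith(byte[] data, int... prefix) {
        if (data.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if ((data[i] & 0xFF) != prefix[i]) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] head = {0x3C, 0x3F, 0x78, 0x6D, 0x6C}; // "<?xml" in ASCII
        System.out.println(guessFamily(head)); // prints "ASCII-compatible"
    }
}
```

Note that the "ASCII-compatible" answer is exactly the "class of
encodings" complaint: the bytes alone cannot distinguish UTF-8 from
ISO-8859-1 until the declaration has been read.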

It was an interesting argument.  I found myself in the position of
arguing that *all* character streams in Java have encodings (including
java.lang.String).  The counter to this is a filter stream, such as a
TeeWriter (I was arguing that the problem was that Java did not provide
a getEncoding() method on all streams).
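
To make the getEncoding() point concrete, here is a small sketch.
InputStreamReader does have a getEncoding() method, but the base Reader
class does not, so the information is lost as soon as the reader is
wrapped in a filter. The helper name encodingOf and the class
ReaderEncodings are invented for this illustration.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.util.Optional;

// Hypothetical helper, not a JDK class.
final class ReaderEncodings {

    // Report the encoding only when the concrete type exposes one.
    static Optional<String> encodingOf(Reader reader) {
        if (reader instanceof InputStreamReader) {
            // getEncoding() may return null once the stream is closed.
            return Optional.ofNullable(((InputStreamReader) reader).getEncoding());
        }
        // StringReader, BufferedReader, and other filters offer no such method.
        return Optional.empty();
    }

    public static void main(String[] args) {
        Reader direct = new InputStreamReader(
                new ByteArrayInputStream(new byte[0]), StandardCharsets.UTF_8);
        Reader wrapped = new BufferedReader(direct);
        Reader fromString = new StringReader("<doc/>");

        System.out.println(encodingOf(direct).isPresent());     // true
        System.out.println(encodingOf(wrapped).isPresent());    // false
        System.out.println(encodingOf(fromString).isPresent()); // false
    }
}
```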

The issue has an interesting parallel with the WXS and RNG deprecation
of strong association of an instance with a schema.  DTD has strong
ties; you can say "this is a document conforming to that DTD".  WXS
says, a little less emphatically "you might find useful information
about this namespace at this location over here" and RNG simply refuses
to offer a standard mechanism to specify, inside an arbitrary document,
a pointer to an RNG schema that it supposedly conforms to.  The
situations are not exactly parallel, of course, because the absence of
an XML declaration implies a particular XML declaration (version=1.0,
encoding=utf-8).  But ... there are certainly documents out there that
can be read with equal facility using a large number of encodings (any
document that contains only the ASCII subset could reasonably be tagged
as ASCII, ISO-8859-whatever-you'd-like, Windows CP-most-anything, or
UTF-8 (and maybe even Shift-JIS?  dunno that one for certain), since all
of those encodings define the lower 128 characters to be identical to
those defined in ASCII).
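
The ASCII-subset point is easy to check in Java. The sketch below
decodes the same bytes as US-ASCII, ISO-8859-1, and UTF-8 and compares
the results; the class and method names are invented for this example.
It sticks to the charsets every JVM must provide and deliberately
leaves Shift-JIS out, since that case is the uncertain one.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of any library.
final class AsciiSubsetDemo {

    private static final Charset[] CHARSETS = {
        StandardCharsets.US_ASCII,
        StandardCharsets.ISO_8859_1,
        StandardCharsets.UTF_8
    };

    // True when every charset above decodes the bytes to the same string.
    static boolean decodesIdentically(byte[] bytes) {
        String reference = new String(bytes, CHARSETS[0]);
        for (Charset charset : CHARSETS) {
            if (!new String(bytes, charset).equals(reference)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] asciiOnly = "<?xml version=\"1.0\"?><doc/>"
                .getBytes(StandardCharsets.US_ASCII);
        byte[] withHighByte = {0x3C, 0x61, 0x3E, (byte) 0xE9}; // "<a>" plus 0xE9

        System.out.println(decodesIdentically(asciiOnly));    // true
        System.out.println(decodesIdentically(withHighByte)); // false
    }
}
```

The second case fails because 0xE9 is a letter in ISO-8859-1 but is
not valid on its own in either US-ASCII or UTF-8.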

Should the XML declaration be deprecated?  Should the metadata that it
provides be supplied outside the datastream instead?

Amelia A. Lewis                    amyzing {at} talsever.com
A joke, that.  Love was the problem, not the solution.  Being hit by a
car was better than love.
            -- Steven Brust, PJF, "Cowboy Feng's Space Bar and Grille"

