Re: Postel's law, exceptions
On Tue, 13 Jan 2004 17:57:15 -0800
"Dare Obasanjo" <dareo@m...> wrote:

> >-----Original Message-----
> >From: Michael Champion [mailto:mc@x...]
> >Subject: Re: Postel's law, exceptions
> >
> >That poses a bit of a problem for the XML community -- is the
> >rational response to "fix" the bits of XML that people stumble
> >over [awaiting shrieks from the people who shot down XML 1.1],
>
> I work on RSS in my free time. The most common well-formedness errors
> are documents with incorrect encodings or documents that use HTML
> entities without a reference to the HTML DTD. How exactly do you
> propose XML 1.x fix these problems?

Interesting. I have had a long (and ultimately pointless, since neither of us was at all interested in changing our opinions) argument over whether the XML declaration is a good idea (my position) or a horrible kludge. Someone taking up the latter position could easily enough argue that the XML declaration (and thus the place where one can put an encoding indicator inside the datastream) should be removed, to be replaced by some other protocol- or API-specific metadata (presumably a header, in the case of RSS).

My argument amounted to saying that the inclusion of the information was useful, and parallel to the information extraction used by utilities such as unix file(1) (based on magic(5)). His argument was that, particularly when it was being generated, the declaration was supplied too late to be used intelligently, particularly since the encoding specification is itself encoded in the specified encoding. He regards the clever tricks for recognizing an encoding (really, for recognizing a class of encodings) in the 1.0 appendix as a horrible bit of nasty hackery.

All of this is largely from the perspective of the Java API, and due to arguments over whether to use Reader/Writer or InputStreamReader/OutputStreamWriter or InputStream/OutputStream (the latter two with specified encodings). It was an interesting argument.
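
For readers who have not looked at the "clever tricks" in the XML 1.0 appendix on encoding autodetection, here is a rough Java sketch of the idea: inspect the first four bytes and guess a class of encodings before any declaration can be read. The class name EncodingSniffer and the method sniffFamily are made up for illustration, the returned labels are informal, and the byte patterns are a simplified subset of the ones the appendix lists, so treat it as an approximation and not as what any particular parser does.

```java
import java.nio.charset.StandardCharsets;

public class EncodingSniffer {

    // Hypothetical helper: guess a coarse encoding family from the first
    // four bytes, loosely following the patterns in the XML 1.0 appendix.
    public static String sniffFamily(byte[] head) {
        if (head.length < 4) {
            return "UTF-8 (assumed)"; // too short to sniff; fall back to the default
        }
        int b0 = head[0] & 0xFF, b1 = head[1] & 0xFF;
        int b2 = head[2] & 0xFF, b3 = head[3] & 0xFF;

        // Byte order marks. The four-byte marks must be tested before the
        // two-byte ones, because FF FE 00 00 also starts with FF FE.
        if (b0 == 0x00 && b1 == 0x00 && b2 == 0xFE && b3 == 0xFF) return "UCS-4 big-endian";
        if (b0 == 0xFF && b1 == 0xFE && b2 == 0x00 && b3 == 0x00) return "UCS-4 little-endian";
        if (b0 == 0xFE && b1 == 0xFF) return "UTF-16 big-endian";
        if (b0 == 0xFF && b1 == 0xFE) return "UTF-16 little-endian";
        if (b0 == 0xEF && b1 == 0xBB && b2 == 0xBF) return "UTF-8";

        // No byte order mark: look for "<" or "<?xm" at various code unit widths.
        if (b0 == 0x00 && b1 == 0x00 && b2 == 0x00 && b3 == 0x3C) return "UCS-4 big-endian";
        if (b0 == 0x3C && b1 == 0x00 && b2 == 0x00 && b3 == 0x00) return "UCS-4 little-endian";
        if (b0 == 0x00 && b1 == 0x3C && b2 == 0x00 && b3 == 0x3F) return "UTF-16 big-endian";
        if (b0 == 0x3C && b1 == 0x00 && b2 == 0x3F && b3 == 0x00) return "UTF-16 little-endian";
        if (b0 == 0x3C && b1 == 0x3F && b2 == 0x78 && b3 == 0x6D) {
            // "<?xm" in some ASCII-compatible encoding; only the declaration
            // itself can say which one.
            return "ASCII-compatible (read the declaration)";
        }
        if (b0 == 0x4C && b1 == 0x6F && b2 == 0xA7 && b3 == 0x94) return "EBCDIC";

        // No declaration and no recognizable pattern.
        return "UTF-8 (assumed)";
    }

    public static void main(String[] args) {
        byte[] latin1 = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><a/>"
                .getBytes(StandardCharsets.ISO_8859_1);
        byte[] utf16 = "<?xml version=\"1.0\"?><a/>"
                .getBytes(StandardCharsets.UTF_16BE);
        System.out.println(sniffFamily(latin1));
        System.out.println(sniffFamily(utf16));
    }
}
```

Note that the ASCII-compatible case only narrows things down to a class of encodings. The parser still has to read the declaration, using that provisional guess, to learn the actual encoding name, which is exactly the circularity the "horrible kludge" side objects to.
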
I found myself in the position of arguing that *all* character streams in Java have encodings (including java.lang.String). The counter to this is a filter stream, such as a TeeWriter (I was arguing that the problem was that Java did not provide a getEncoding() method on all streams).

The issue has an interesting parallel with the WXS and RNG deprecation of strong association of an instance with a schema. DTD has strong ties; you can say "this is a document conforming to that DTD". WXS says, a little less emphatically, "you might find useful information about this namespace at this location over here", and RNG simply refuses to offer a standard mechanism to specify, inside an arbitrary document, a pointer to an RNG schema that it supposedly conforms to.

The situations are not exactly parallel, of course, because the absence of an XML declaration implies a particular XML declaration (version=1.0, encoding=utf-8). But ... there are certainly documents out there that can be read with equal facility using a large number of encodings (any document that contains only the ASCII subset could reasonably be tagged as ASCII, ISO-8859-whatever-you'd-like, Windows CP-most-anything, or UTF-8 (and maybe even Shift-JIS? dunno that one for certain), since all of those encodings define the lower 128 characters to be identical to those defined in ASCII).

Should the XML declaration be deprecated? Should the metadata that it provides be supplied outside the datastream instead?

Amy!
--
Amelia A. Lewis    amyzing {at} talsever.com
Love? A joke, that. Love was the problem, not the solution. Being hit by a car was better than love.
    -- Steven Brust, PJF, "Cowboy Feng's Space Bar and Grille"
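
The point above about ASCII-only documents can be checked with a short Java sketch: bytes that stay within the lower 128 values decode to the same string under several encodings, while a single byte above that range makes the choice matter. The class and method names (AsciiSubsetDemo, isAsciiOnly, decodesIdentically) are invented for illustration, and only the three charsets every Java platform is required to support are tried here; Windows code pages and Shift-JIS are deliberately left out.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class AsciiSubsetDemo {

    // True if every byte is in the 7-bit ASCII range.
    public static boolean isAsciiOnly(byte[] data) {
        for (byte b : data) {
            if (b < 0) {
                return false; // values 0x80 and up are negative as a Java byte
            }
        }
        return true;
    }

    // True if every given charset decodes the bytes to the same string.
    public static boolean decodesIdentically(byte[] data, Charset... charsets) {
        String first = new String(data, charsets[0]);
        for (Charset cs : charsets) {
            if (!first.equals(new String(data, cs))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Charset[] candidates = {
            StandardCharsets.US_ASCII,
            StandardCharsets.ISO_8859_1,
            StandardCharsets.UTF_8
        };

        // A document with no declaration and only ASCII characters.
        byte[] plain = "<doc>hello</doc>".getBytes(StandardCharsets.US_ASCII);
        System.out.println(isAsciiOnly(plain));
        System.out.println(decodesIdentically(plain, candidates));

        // The same idea with one Latin-1 character (e with acute accent, byte 0xE9).
        byte[] accented = "<doc>caf\u00E9</doc>".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(isAsciiOnly(accented));
        System.out.println(decodesIdentically(accented, candidates));
    }
}
```

For the first document any of the three labels would be a truthful encoding declaration; for the second, only the ISO-8859-1 label recovers the original text.
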








