Re: text/xml vs. application/xml

  • From: David Megginson <ak117@f...>
  • To: xml-dev@i...
  • Date: Mon, 22 Dec 1997 10:42:48 -0500

Gavin Nicol writes:

 > >   It is an error for an entity including an encoding declaration to
 > >   be presented to the XML processor in an encoding other than that
 > >   named in the declaration, or for an encoding declaration to occur
 > >   other than at the beginning of an external entity.
 > 
 > Note that this is "an error" not a fatal, or even necessarily
 > reportable error.  

Absolutely correct -- Tim Bray made the same point on this list a
couple of weeks ago.  The parser is not _required_ to report an error,
but it is allowed to; in either case the document is still not
well-formed.

 > >1) java EventDemo http://www.myhost.org/texts/sample.xml
 > >   ==> receives charset="ISO-8859-1" as the default, ignores the 
 > >       encoding declaration, produces correct output (accidentally),
 > >       and reports no error.
 > 
 > It could report a mismatch.

In this case yes, because it was possible to parse the encoding
declaration.  If the document had been encoded in UCS-2, it is
unlikely that the parser would even have recognised an encoding
declaration if it were trying to parse with the default
charset="ISO-8859-1" (the parser would have to have some very
sophisticated error-recovery techniques).
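
A minimal Java sketch of why this is so, assuming UCS-2 can be
approximated by big-endian UTF-16 without a byte-order mark. The class
and method names are hypothetical, and the substring check stands in
for whatever a real parser does to find the declaration:

```java
import java.nio.charset.StandardCharsets;

class EncodingDeclDemo {
    // Sample XML declaration (illustrative only).
    static final String DECL = "<?xml version=\"1.0\" encoding=\"UCS-2\"?>";

    // Decode the raw bytes as ISO-8859-1, as a parser trusting the HTTP
    // default would, and report whether the text "encoding=" is visible.
    static boolean declarationVisible(byte[] raw) {
        String asLatin1 = new String(raw, StandardCharsets.ISO_8859_1);
        return asLatin1.contains("encoding=");
    }

    public static void main(String[] args) {
        byte[] latin1Bytes = DECL.getBytes(StandardCharsets.ISO_8859_1);
        // Two bytes per character: a zero byte precedes every ASCII byte.
        byte[] utf16Bytes = DECL.getBytes(StandardCharsets.UTF_16BE);

        System.out.println(declarationVisible(latin1Bytes)); // true
        System.out.println(declarationVisible(utf16Bytes));  // false
    }
}
```

In the two-byte case every ASCII character is separated from the next
by a zero byte, so the declaration never appears as contiguous text
and there is nothing for the parser to compare against.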

 > >2) java EventDemo ftp://ftp.myhost.org/pub/texts/sample.xml
 > >   ==> reads the encoding declaration, realises that the document is
 > >       _not_ in UCS-2, and reports an error (or worse, puts out
 > >       garbage without reporting an error).
 > >
 > >3) java EventDemo sample.xml
 > >   ==> same as (2).
 > >
 > >It is counter-intuitive that well-formedness depends on the
 > >transmission protocol.
 >        
 > I would argue that all 3 could, and perhaps should produce similar
 > results. 

In that case, however, it will be necessary to amend the PR, so that
parsers will not have the option of reporting an error, and so that
the documents will qualify as well-formed.

 > This has nothing to do with MIME types. The main reason for problems
 > is that people (often unknowingly) violate the standards. HTTP is
 > pretty clear that for anything other than ISO 8859-1, the content must
 > be labelled correctly (i.e. it must have the correct charset).

Unfortunately the only people who have control over that labelling are
the system administrators -- if Sprynet decides to return the MIME
type text/xml for all *.xml files, then I probably will not have the
option of posting XML documents on my personal web site in anything
but ISO-8859-1.
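
A rough Java sketch of the labelling rule under discussion, assuming
(as this thread does) that a text/* type served without a charset
parameter is treated as ISO-8859-1. The class and method names are
hypothetical, and the parsing is deliberately simplified:

```java
import java.util.Locale;

class TransportCharset {
    // Returns the charset implied by a Content-Type header value, or null
    // when the transport supplies no label (e.g. FTP or a local file).
    static String fromContentType(String contentType) {
        if (contentType == null) {
            return null;
        }
        String[] parts = contentType.split(";");
        String mediaType = parts[0].trim().toLowerCase(Locale.ROOT);
        for (int i = 1; i < parts.length; i++) {
            String param = parts[i].trim();
            int eq = param.indexOf('=');
            if (eq > 0 && param.substring(0, eq).trim().equalsIgnoreCase("charset")) {
                String value = param.substring(eq + 1).trim();
                // Strip optional surrounding double quotes.
                if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                    value = value.substring(1, value.length() - 1);
                }
                return value;
            }
        }
        // Assumption from the thread: unlabelled text/* means ISO-8859-1.
        return mediaType.startsWith("text/") ? "ISO-8859-1" : null;
    }

    public static void main(String[] args) {
        System.out.println(fromContentType("text/xml"));                 // ISO-8859-1
        System.out.println(fromContentType("text/xml; charset=UTF-8"));  // UTF-8
        System.out.println(fromContentType("application/xml"));         // null
    }
}
```

With only the bare text/xml label available from the server, the first
case applies to every document, whatever its encoding declaration
says.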

Furthermore, the other problem remains: if text/xml uses ISO-8859-1 as
the default, then the PR _must_ be amended to require XML processors to
support ISO-8859-1 encoding -- after all, XML is a profile of SGML
designed specifically for the Internet, and we will have a lot of
explaining to do if it cannot play nicely.

 > The only time application/xml really makes sense is when UCS-2 or
 > UTF-16 data is being sent via email. 

In theory, yes; in practice, no.  Private users built HTML into
something big enough to attract the interest of the corporate and
government sectors -- using text/xml will mean that for the next
several years, at least, many private users will be unable to post
anything but ISO-8859-1-encoded documents in their personal web space
easily (and no XML parsers are required to support that encoding).  

This type of consideration does not matter so much for SGML, which is
an International Standard defined independent of its media; XML,
however, is a consortium standard created for a specific medium, so it
cannot afford to ignore the more pragmatic concerns.


All the best,


David

-- 
David Megginson                 ak117@f...
Microstar Software Ltd.         dmeggins@m...
      http://home.sprynet.com/sprynet/dmeggins/

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@i...
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To (un)subscribe, mailto:majordomo@i... the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo@i... the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa@i...)