
Re: xml over http - RFC 3023

  • From: "Andrew Welch" <andrew.j.welch@g...>
  • To: "Rick Jelliffe" <rjelliffe@a...>
  • Date: Mon, 1 Dec 2008 10:16:43 +0000

Hi Rick,

> The out-of-band signalling of character encoding is a fundamentally broken
> idea, because there are no mechanisms for programs which generate data to
> memoize the character encoding used that can then feed the rest of the
> food-chain.

How about the BOM - that's one way isn't it?  I wonder if a similar
ignorable byte sequence could be added to the start of all byte
sequences to indicate the encoding of what's coming.
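
To make the BOM idea concrete, here is a minimal sketch of how a reader might check the start of a byte stream for a BOM. The function name sniff_bom is hypothetical; the BOM constants come from Python's standard codecs module. It only covers the Unicode encodings, which is presumably why a more general marker would be needed for anything else.

```python
import codecs

# Order matters: the UTF-32 LE BOM starts with the same two bytes
# as the UTF-16 LE BOM, so the longer marks are tested first.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def sniff_bom(data: bytes):
    """Return (encoding, bom_length), or (None, 0) if no BOM is present."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return None, 0
```

A caller would skip bom_length bytes and decode the rest with the returned encoding, falling back to other signals when the result is None.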

>> At the moment it all seems pretty complicated...

> It is not complicated. Use application/xml
> If you do find intermediate web systems that implement the ASCII default or
> the ISO 8859-1 default as anything other than 8-bit clean for text/xml, submit
> a bug report.

I'm dealing with RSS feeds from all over the world, so it's:

- 3 different types of feeds
- multiple languages, multiple encodings
- embedded, inconsistently escaped HTML, or CDATA sections, or both
- and even, use of entities without even including the doctype, so it
doesn't even parse without help
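
As an illustration of the kind of "help" the last item needs, here is a rough sketch that rewrites HTML named entities as numeric character references before parsing. The function name replace_html_entities is hypothetical; the entity table is the standard library's html.entities.name2codepoint. It is deliberately naive: it would also rewrite entity-like text inside CDATA sections, where such text is literal, so a real reader would need to treat those separately.

```python
import re
from html.entities import name2codepoint

# These five are predefined by XML itself and must be left alone.
_XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}
_ENTITY = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def replace_html_entities(text: str) -> str:
    """Turn HTML named entities (e.g. &nbsp;) into numeric references."""
    def repl(match):
        name = match.group(1)
        if name in _XML_PREDEFINED or name not in name2codepoint:
            return match.group(0)  # leave XML built-ins and unknown names as-is
        return "&#%d;" % name2codepoint[name]

    return _ENTITY.sub(repl, text)
```

After this pass a feed that uses &nbsp; or &eacute; without a doctype can be handed to an ordinary XML parser.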

It is possible to reject some of the feeds, but other readers accept
them so this one needs to at least match them before taking the moral
high ground (and it's not too hard to code around the problems).

So this is a real test of XML on the web.  The complicated part I was
referring to is reading the bytes from the http input stream in the
right encoding:

- extract the encoding from the Content-Type header
- if it's not there, read the first few bytes of the stream as us-ascii and
then extract the encoding from the prolog
- if it's not there either, use utf-8
- hope that the actual encoding of the file and the encoding you've discovered match

...and that's not even completely correct as far as I understand.
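
The steps above can be sketched roughly as follows. This is only an illustration of the procedure as listed, not a complete implementation of RFC 3023: the names pick_encoding and decode_body are hypothetical, the 200-byte window is an arbitrary choice, and the prolog check only works for ASCII-compatible encodings (a UTF-16 feed would need BOM detection first). The Content-Type value is parsed with the standard library's email.message.Message.

```python
import codecs
import re
from email.message import Message

# Matches the encoding pseudo-attribute of an XML declaration at the very start.
_XML_DECL = re.compile(
    rb'^<\?xml\s[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)


def pick_encoding(content_type, body: bytes) -> str:
    """Content-Type charset first, then the XML prolog, then utf-8."""
    if content_type:
        msg = Message()
        msg["Content-Type"] = content_type
        charset = msg.get_content_charset()  # lower-cased, or None if absent
        if charset:
            return charset
    head = body[:200]
    if head.startswith(codecs.BOM_UTF8):
        head = head[len(codecs.BOM_UTF8):]
    match = _XML_DECL.match(head)
    if match:
        return match.group(1).decode("ascii").lower()
    return "utf-8"


def decode_body(content_type, body: bytes) -> str:
    """Decode with the discovered encoding; the 'hope' step is errors='replace'."""
    encoding = pick_encoding(content_type, body)
    try:
        return body.decode(encoding, errors="replace")
    except LookupError:  # encoding name not known to Python
        return body.decode("utf-8", errors="replace")
```

Using errors="replace" means a mismatch between the declared and actual encoding shows up as replacement characters rather than an exception, which may or may not be what a feed reader wants.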

So when you say:

"It is not complicated. Use application/xml"

I don't get it, what am I missing?

I would've thought the webserver would be aware that it was serving
XML and take care of it - it could extract the encoding from the XML prolog
and ensure the file was served with that (maintaining it however it
liked)... it seems odd that the client should go through this process
every time.

Andrew Welch
Kernow: http://kernowforsaxon.sf.net/
