Re: xml over http - RFC 3023
> For application/xml you ignore the first step and go straight to the
> document. If your data is usually in UTF-8 or ASCII, you could perhaps read
> in the first block from bytes to characters and (if the transcoder has not
> generated an exception) confirm that there is no XML encoding declaration or
> BOM or that the string "utf-8" does not appear in the XML encoding
> declaration, in which case you don't need to do anything more complicated.
> If your data is text/xml, you are indeed in a sea of complication, which is
> why text/xml has been discouraged for so long.

OK, that makes sense, thanks.

> Maybe, but the mechanism for this to occur, for Apache at least, is for
> someone to write it, contribute it, champion it and maintain it.

Champion a mechanism for a web server to serve XML? Really?

> But the basic XML contract is that the encoding must be explicitly labelled
> by the sender (creator of the document) and the recipient should not guess
> but use the label. If this is too much for naive users, then XML is simply
> not the technology for them, and XML should not be blamed for not working in
> a situation it explicitly was designed to avoid. It is just like if someone
> does not know what + means they cannot use a calculator. It is not an
> indictment of mathematics if someone who does not know + cannot use a
> calculator. Character encoding is just as fundamental to computer
> programming as knowledge of the difference between floats and ints, for
> example: that Western computer science and IT courses have guaranteed the
> ignorance of their students in this is sad.

Er, OK. You do realise there is a different expert somewhere else in the world saying exactly the same thing about their specialist area. (I'm not sure I agree with that analogy either.)

> In any case, I thought most people had written off RSS as unprocessable by
> generic XML tools, because so much RSS was not well-formed? I thought one
> reason for Atom was that the early RSS systems creators messed up their XML
> and RSS never recovered. With RSS, you are not experiencing the failure of
> XML on the web; you may be experiencing the failure of non-WF XML
> (and the potential complexity of figuring out text/xml).

The vast majority of the feeds are RSS and very few are Atom, so from here it looks like Atom has had little impact so far. Processing the RSS feeds is a pain but manageable using XSLT 2.0 calling out to TagSoup, JTidy etc., and using a LexicalHandler to intercept the entities.

From my naive perspective, I would've thought the web server would serve the XML with the correct encoding in the Content-Type so I don't have to ignore it, and/or I could give the XML parser a URL and it would take care of it. I'm not sure why I should be reading appendices of the spec and writing low-level code for something that should be an everyday task. In that respect, I think, you could argue it hasn't succeeded yet.

-- 
Andrew Welch
http://andrewjwelch.com
Kernow: http://kernowforsaxon.sf.net/
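
For what it's worth, the application/xml check described in the first quoted paragraph (look at the first block of bytes for a BOM or an encoding declaration, otherwise treat it as UTF-8) could be sketched roughly as below. This is only an illustration in Python, not anything from the thread: the function name `sniff_xml_encoding` is hypothetical, and the sketch assumes an ASCII-compatible encoding whenever there is no BOM, so it does not cover UTF-16 without a BOM or EBCDIC, which the XML spec's appendix on autodetection also deals with.

```python
import codecs
import re

# Byte-order marks, longest first so UTF-32 LE is not mistaken for UTF-16 LE.
_BOMS = [
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
]

# Matches the encoding pseudo-attribute inside a leading XML declaration.
# Assumption: the declaration itself is readable as ASCII bytes.
_DECL = re.compile(
    rb'^<\?xml[^>]*?\sencoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)


def sniff_xml_encoding(first_block: bytes) -> str:
    """Guess the encoding of an application/xml entity from its first bytes.

    Hypothetical helper: BOM first, then the XML declaration, then the
    UTF-8 default that applies when neither is present.
    """
    for bom, name in _BOMS:
        if first_block.startswith(bom):
            return name
    match = _DECL.match(first_block)
    if match:
        return match.group(1).decode("ascii").lower()
    return "utf-8"
```

If this returns "utf-8" for your usual data, nothing more complicated is needed; anything else means handing the bytes to a transcoder (or simply to the XML parser) with that label. None of this helps with text/xml, where the charset parameter of the Content-Type header comes into play.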