
Re: [Summary] Media type (MIME) of XML in MS Word? in Notepad?


On 2006-06-12 18:47:33 -0400 "Costello, Roger L." <costello@m...> wrote:
> The Editor used to Create the XML Determines its MIME Type

Gah!  No!  No, no, no!

> Interestingly, you may have a document which contains XML and yet its
> MIME type may not be application/xml.
> 
> For example, take this simple XML:
> 
> <?xml version="1.0"?>
> 
> <root>
>       Blah
> </root>
> 
> and put it into Word (save it as a .doc file).  The MIME type is:
> 
>       application/msword

True as far as it goes, but that's because it's *not XML!*

Try this experiment.

Type the above in Word.

Save as .doc (default).

Open a DOS box (or whatever they call it these days) and say "type 
NameOfDocument.doc".

Does it *look* like XML?  No.  It violates the rules for XML, namely that 
the XML declaration *must* be the first thing encountered in the data.
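For anyone who would rather not squint at "type" output, the same experiment can be sketched in a few lines of Python. This is only a rough illustration: the function names are made up for this example, the check handles only ASCII-compatible encodings (with an optional UTF-8 byte order mark), and a leading "<?xml" is a crude test, not a well-formedness check. The byte signatures for legacy Office files and zip archives are the standard ones.

```python
# Signature of the OLE compound file container used by legacy .doc/.xls files.
OLE_MAGIC = bytes.fromhex("d0cf11e0a1b11ae1")
# Signature of a zip local file header.
ZIP_MAGIC = b"PK\x03\x04"
UTF8_BOM = b"\xef\xbb\xbf"


def starts_like_xml(data):
    """Crude check: do the bytes begin with '<?xml' (after an optional BOM)?"""
    if data.startswith(UTF8_BOM):
        data = data[len(UTF8_BOM):]
    return data.startswith(b"<?xml")


def describe(data):
    """Label a byte string by its leading bytes only (hypothetical helper)."""
    if starts_like_xml(data):
        return "starts with an XML declaration"
    if data.startswith(OLE_MAGIC):
        return "OLE compound file"
    if data.startswith(ZIP_MAGIC):
        return "zip archive"
    return "unknown"


typed = b'<?xml version="1.0"?>\n<root>\n      Blah\n</root>\n'
# Stand-in for a .doc: binary header first, the typed text buried somewhere inside.
fake_doc = OLE_MAGIC + b"binary cruft" + typed

print(describe(typed))
print(describe(fake_doc))
```

The second call reports the container format, because the typed text is no longer the first thing in the data.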

Do this with any random proprietary-format tool you care to; same result.  
The fact that you are "quoting" an XML document inside some other document 
format does *not* make that format somehow magically become XML.  It's still 
what it is.

Put that XML document into a cell in an Excel spreadsheet.  Do you *really* 
expect the .xls that you saved to be "XML"?

Here's something fun.  Type that stuff into OpenOffice.org's word processor 
and save in default format.  Is the result XML?  Well, no.  It's a 
compressed (.zip) directory.  Unzip it, and what's inside?  Hey, there's 
XML!  Only ... no, it isn't the XML you *typed*.  All that has been quoted 
(escaped) into CDATA.  But it *is* XML, only it's a different document type.
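A small Python sketch of that situation, under loud assumptions: the package built here is a simplified stand-in with a made-up "document"/"p" vocabulary, not the real OpenOffice.org schema, and the typed text is escaped with entity references (one common way of quoting markup as character data). It only illustrates that the outer file is a zip, the inner file is XML, and the XML you typed survives as plain text inside a different document type.

```python
import io
import zipfile
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

TYPED = '<?xml version="1.0"?>\n<root>\n      Blah\n</root>'


def build_fake_package(typed_text):
    """Zip up a toy content.xml that quotes the typed text as character data."""
    inner = "<document><p>" + escape(typed_text) + "</p></document>"
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("content.xml", inner)
    return buf.getvalue()


def read_typed_text(package_bytes):
    """Unzip, parse the inner XML, and return (root tag, paragraph text)."""
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as zf:
        root = ET.fromstring(zf.read("content.xml"))
    return root.tag, root.find("p").text


package = build_fake_package(TYPED)
root_tag, text = read_typed_text(package)
print(package[:2])   # the saved file itself is a zip, not XML
print(root_tag)      # the inner document's root is not the "root" that was typed
print(text == TYPED) # the typed markup comes back as ordinary text
```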

> Conversely, if you put the same XML into Notepad, the MIME type is:
> 
>       application/xml

Bloody not if you accept the "suggestion" of notepad that it ought to have a 
".txt" extension.  Then it's text/plain.

> Why is that?  Why is it that if you put XML into one editor (Word) you
> get a MIME type that is specific to the editor, whereas if you put XML
> into another editor (Notepad) you get a MIME type that is independent
> of the editor?

Well, because you don't?

The "MIME type" of a document is not stored in a document.  A variety of 
heuristics may be applied to dynamically determine the MIME type; this was 
true of document formats even before MIME (see file(1)).  The commonest 
heuristic is the "extension", the bit that comes after the last dot in a 
filename, typically 1-4 characters (in the DOS world, always three 
characters).  In that heuristic, .doc maps to application/msword (even if 
it's *actually* an Excel spreadsheet), and .txt maps to text/plain (even if 
it's *really* a pkzip-compressed encrypted security analysis in a 
proprietary format) and .xml maps (as a rule) to application/xml.
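The extension heuristic is simple enough to sketch in Python. The lookup table below is hypothetical and contains only the three mappings named above; real systems consult much larger tables whose entries vary by platform. The function name and the fallback type are likewise choices made for this example.

```python
from pathlib import PurePath

# Hypothetical table holding only the mappings mentioned in the text.
EXTENSION_TO_MIME = {
    ".doc": "application/msword",
    ".txt": "text/plain",
    ".xml": "application/xml",
}


def guess_mime_from_name(filename, default="application/octet-stream"):
    """Guess a MIME type from the bit after the last dot; never reads the file."""
    suffix = PurePath(filename).suffix.lower()
    return EXTENSION_TO_MIME.get(suffix, default)


for name in ("budget.doc", "secret-analysis.txt", "root.xml", "README"):
    print(name, "->", guess_mime_from_name(name))
```

Note that the function never opens the file: rename a spreadsheet to budget.doc and it is still reported as application/msword, which is exactly the weakness of the heuristic.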

What happens to a Windows application if the MIME type doesn't match the 
extension?

Damn all.  Windows doesn't care about MIME types.

What happens to a Windows application if the data format doesn't match the 
extension?

Crash.  Hopefully, the application just refuses to read it, but it could 
crash, and given the general level of protection in the system, it could 
bring the system down.

What happens to a BeOS application if the MIME type doesn't match the 
extension or data format?

BeOS used MIME types in the file system, and preferred to trust them rather 
than extensions.  A decent application should have degraded gracefully.  In 
worst case, see above ("bring the system down").

Critically: a MIME type is *metadata*, it is a label placed on the data, it 
is not inherent in the data.  Data does not "have" a MIME type, it is 
*assigned* a MIME type (or not, if it isn't relevant, as for most 
applications running on Windows).  Windows cares about "file types" 
(extensions), not MIME types.  Web servers typically care about MIME types 
(although HTTP isn't a MIME-compliant protocol, but that's a different 
rant).  Browsers, consequently, usually care about MIME types.

I can write the above document in Word (ewwww, and wash my hands after), and 
save it as a .doc, and then instruct my webserver to deliver it as 
application/xml regardless of the extension, and a browser that receives it 
... will choke, because it *isn't XML*.  The webserver, not being Word, 
can't strip the cruft; the web browser, not being Word, gets confused when 
handed application/xml that doesn't start with an XML declaration.
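The label-versus-content distinction can be sketched in Python. Nothing here is a real web server or browser: "serve" and "consume" are hypothetical stand-ins, the fake .doc is just the legacy Office container signature followed by filler and the typed text, and the "browser" simply hands anything labeled application/xml to the standard library's XML parser.

```python
import xml.etree.ElementTree as ET

# Signature of the OLE compound file container used by legacy .doc files.
OLE_MAGIC = bytes.fromhex("d0cf11e0a1b11ae1")
TYPED = b'<?xml version="1.0"?>\n<root>\n      Blah\n</root>\n'
FAKE_DOC = OLE_MAGIC + b"binary cruft" + TYPED


def serve(body, mime_type):
    """Toy server: attaches whatever label it is told to, without looking at the body."""
    return {"Content-Type": mime_type}, body


def consume(headers, body):
    """Toy browser: trusts the label, then finds out whether the data agrees."""
    if headers["Content-Type"] != "application/xml":
        return "not treated as XML"
    try:
        ET.fromstring(body)
    except ET.ParseError:
        return "choke"
    return "parsed"


print(consume(*serve(FAKE_DOC, "application/xml")))  # label says XML, data does not
print(consume(*serve(TYPED, "application/xml")))     # label and data agree
print(consume(*serve(TYPED, "text/plain")))          # same bytes, different label
```

The same bytes get different treatment depending only on the label, and a wrong label does not change what the bytes are.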

> The answer is this: when the XML is put into Word, the Word application
> wraps the XML with a bunch of Word-specific stuff (the wrapper stuff is
> not visible).

Oh, *yes* it is!  Unless, of course, you happen to be using one application, 
namely MS Word.

> Conversely, Notepad does not wrap the XML with anything.  The document
> is pure XML, it can be fed directly into an XML parser, and thus it has
> a MIME type of application/xml.

No it isn't.  It's whatever MIME type you assign to it.  If you call it 
text/plain, it's text/plain.

Amy!
(in a ranting mood ... but the summary was misleading, I'm sorry)
-- 
Amelia A. Lewis                    amyzing {at} talsever.com
There's someone in my head, but it's not me.
                 -- Pink Floyd

