
RE: Data streams


In consideration of Elliotte's reply, I went back and looked at the XML file
Excel generated. Here's what I found ...

Every one of the XML data elements had this tagging structure:
<Row>
    <Cell><Data ss:Type="Number">1</Data></Cell>
</Row>

In contrast, the CSV had this structure: 1,

That's roughly a 50-to-1 difference in characters for each data element.
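
To check the per-element figure, here is a minimal Python sketch that counts the characters in the two fragments quoted above. It assumes the whitespace between tags is ignored and that each value sits in its own Row, as in the excerpt; the variable names are only illustrative.

```python
# Per-value markup as quoted above, with indentation and newlines removed.
xml_fragment = '<Row><Cell><Data ss:Type="Number">1</Data></Cell></Row>'
csv_fragment = "1,"  # the value plus its comma delimiter

xml_chars = len(xml_fragment)
csv_chars = len(csv_fragment)
markup_chars = xml_chars - 1  # everything except the single data character

print(f"XML: {xml_chars} chars, CSV: {csv_chars} chars")
print(f"markup per data character: {markup_chars}")
```

Counted this way the XML fragment is 55 characters against 2 for the CSV (value plus comma), so "50 to 1" is a fair round figure per data character.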

I doubt that all those XML tags are necessary if you're rendering the data
in something other than a spreadsheet. But if you are planning to use a
spreadsheet, then the 50 to 1 ratio is valid, it seems to me. 

Does anyone know what a reasonable tagging equivalent might be if you're,
say, distributing a data array in XML for SVG rendering? It might be fewer
than 50, but it will still be a lot more than 1, especially if you have data
type attributes.

In addition, the XML doc had about 50 lines of additional tags at the
beginning and end of the file, consisting of Microsoft Office metadata that
is not in the CSV. While some are certainly necessary for a valid XML doc,
I'm sure some are superfluous. But even if you subtracted all those lines
from the total characters, it would have almost no effect on the size
comparison when you're dealing with a large data array.
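
The reason the header and footer barely matter can be sketched with a little arithmetic. The numbers below are assumptions for illustration, not measurements from the file: about 50 metadata lines at a guessed 40 characters each, and 55 characters of XML per value (the fragment above with whitespace stripped).

```python
def overhead_fraction(n_values: int,
                      fixed_chars: int = 50 * 40,   # assumed: 50 lines x 40 chars
                      chars_per_value: int = 55) -> float:
    """Share of the total file taken up by the fixed header/footer."""
    total = fixed_chars + n_values * chars_per_value
    return fixed_chars / total

for n in (100, 10_000, 1_000_000):
    print(f"{n:>9} values: {overhead_fraction(n):.4%} of the file is fixed metadata")
```

Because the metadata is a fixed cost and the per-value tagging grows with the array, the metadata's share shrinks toward zero as the array gets larger.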

So, this benchmark test still points to a huge difference in file size, and
in unzipping and parsing time, between a large data array in CSV and the
same array in XML.
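
For anyone who wants to repeat the size comparison without Excel or WinZip, here is a rough Python sketch. It is not the original test: it builds both forms in memory, leaves out the Office metadata, and uses zlib's DEFLATE as a stand-in for WinZip. The function name, the grid size, and the layout (one Row per spreadsheet row, one Cell per value) are assumptions, and it measures size only, not parsing time.

```python
import zlib

def build_samples(rows: int, cols: int) -> tuple[bytes, bytes]:
    """Return (xml_bytes, csv_bytes) for a rows x cols grid of the digit 1."""
    cell = '<Cell><Data ss:Type="Number">1</Data></Cell>'
    xml_row = "<Row>" + cell * cols + "</Row>\n"
    csv_row = ",".join(["1"] * cols) + "\n"
    return (xml_row * rows).encode("ascii"), (csv_row * rows).encode("ascii")

xml_bytes, csv_bytes = build_samples(rows=1000, cols=100)  # hypothetical grid size

for label, data in (("XML", xml_bytes), ("CSV", csv_bytes)):
    packed = zlib.compress(data, 9)  # level 9 = maximum compression
    print(f"{label}: {len(data)} bytes raw, {len(packed)} bytes compressed")
```

Identical values compress extremely well in both formats, so a grid of one repeated digit is close to a best case for compression; real data would be a fairer test.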

Steve


-----Original Message-----
From: Elliotte Harold [mailto:elharo@m...] 
Sent: Monday, December 06, 2004 2:43 PM
To: Stephen E. Beller
Cc: xml-dev@l...
Subject: Re:  Data streams

Stephen E. Beller wrote:

> I tried Steven's experiment from a different angle. I filled an Excel XP
> spreadsheet with a single-digit number, saved it in both XML and in a
> comma-delimited text file (CSV). I then compressed both with WinZip and then
> opened both with Excel. Here's what I found:

That sounds like a bad test. The XML file contains a lot more 
information than the CSV file. Specifically it contains a lot of 
Microsoft Office metadata about things like the name of the person who 
created the file that are not in the CSV file. There is information in 
the XML file that is not present in the CSV file.

-- 
Elliotte Rusty Harold  elharo@m...
XML in a Nutshell 3rd Edition Just Published!
http://www.cafeconleche.org/books/xian3/
http://www.amazon.com/exec/obidos/ISBN=0596007647/cafeaulaitA/ref=nosim


