RE: Data streams
Even though CSV is much more efficient for distributing large data arrays, you're certainly correct about the perils of CSV sans any metadata. While the row and cell tags generated by Excel are a form of metadata -- in that they ensure the data are parsed into the proper cells in a spreadsheet -- they say nothing of what those data mean. One remedy is to add a "header row" to the CSV that defines the data elements parsed into each spreadsheet column (i.e., a "field" definition a la relational databases). But even that would not give you XML's hierarchical/associative capabilities; doing so would require additional data.

I know of some innovative/proprietary ways to use CSVs and spreadsheets to replicate the full array data capabilities of XML, including a way to manage a wealth of meaningful metadata and formatting instructions, while keeping the CSV data trimmed down to its streamlined essence (i.e., the ability to send 17 million data elements in a 150KB file that is rapidly uncompressed and parsed). But this is not a discussion for this forum. Anyone interested can contact me off-list.

Steve

-----Original Message-----
From: Bill Kearney [mailto:wkearney@s...]
Sent: Monday, December 06, 2004 3:56 PM
To: Stephen E. Beller; xml-dev@l...
Subject: Re: Data streams

This also speaks to the somewhat verbose form of XML that Office might be producing. It's certainly no surprise to anyone that the data was larger and compressed differently in XML than CSV. Especially not with the example you proposed.

I think your conclusion about CSV effectiveness is short-sighted. While CSV can certainly be "bit stingy", that often comes at the considerable cost of being brittle. Without effective metadata those numbers just become gibberish. While it's fair to say an XML file may be larger, it is larger in a remarkably self-documenting way.

Where's the balance to be struck? In lightweight CSV that's fraught with processing perils?
Or in methodically documented XML that simply takes a few cycles longer? CPU and disk are cheap; programming time and budget to work around crappy, brittle data aren't.

It might be a more interesting experiment to discuss using more purpose-built XML schemas, doing a better job of describing the data with XML without being so verbose. While Office may not offer it at this point, that doesn't preclude others from doing a better job of it.

-Bill Kearney
Syndic8.com

----- Original Message -----
From: "Stephen E. Beller" <sbeller@n...>

> I tried Steven's experiment from a different angle. I filled an Excel XP
> spreadsheet with a single-digit number, saved it in both XML and in a
> comma-delimited text file (CSV). I then compressed both with WinZip and then
> opened both with Excel. Here's what I found:
>
> The XML file was 840MB, the CSV 34MB -- a 2,500% difference.
> Compressed, the XML file was 2.5MB, the CSV 0.15MB (150KB) -- a 1,670%
> difference.
>
> Equally dramatic is the time it took to uncompress and render the files as
> an Excel spreadsheet: It took about 20 minutes with the XML file; the CSV
> took 1 minute -- a 2,000% difference.
>
> My conclusion is that delimited text files handle large arrays of data more
> efficiently. This stems, in part, from the fact that a comma delimiter (or
> some other single character) carries much less overhead than tags; CSV
> requires only a comma, while XML requires a minimum of 5 characters (<></>)
> -- that makes CSV a minimum of 500% more efficient ... and when you add
> the semantic labels and attributes to the tags, the size of XML
> increases dramatically.
>
> Note, however, that when dealing with large blocks of text instead of
> numbers (or small text strings), the difference between XML and delimited
> text files is considerably less.
>
> Of course, XML offers benefits that a plain data array in a CSV file does
> not, such as attribute definitions and hierarchical associations between the
> data (if that's necessary) ... even though there are ways comma-delimited
> data can be used to perform the same functions as XML when rendering
> serialized data arrays as charts.
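
The size comparison described in the quoted message can be approximated in miniature with a short Python sketch. This is not the original experiment: it uses zlib rather than WinZip, a much smaller grid than a full Excel XP sheet, and hypothetical `<Table>`, `<Row>`, and `<Cell>` tags that are far terser than the XML Excel actually writes. The function names (`build_csv`, `build_xml`, `sizes`, `compare`) are illustrative assumptions, and the absolute numbers will differ from those reported in the thread.

```python
import zlib


def build_csv(rows, cols, value="1"):
    """Return a CSV document: `rows` lines of `cols` copies of `value`."""
    line = ",".join([value] * cols)
    return "\n".join([line] * rows) + "\n"


def build_xml(rows, cols, value="1"):
    """Return the same grid wrapped in hypothetical, minimal XML tags.

    These tag names are an assumption for illustration only; Excel's
    real XML output carries namespaces and attributes and is larger.
    """
    cells = "".join(f"<Cell>{value}</Cell>" for _ in range(cols))
    row = "<Row>" + cells + "</Row>"
    return "<Table>\n" + "\n".join([row] * rows) + "\n</Table>\n"


def sizes(text):
    """Return (raw byte count, zlib-compressed byte count) for `text`."""
    raw = text.encode("ascii")
    return len(raw), len(zlib.compress(raw, 9))


def compare(rows=1000, cols=256):
    """Build both documents and collect raw and compressed sizes."""
    csv_raw, csv_zip = sizes(build_csv(rows, cols))
    xml_raw, xml_zip = sizes(build_xml(rows, cols))
    return {
        "csv_raw": csv_raw,
        "csv_zip": csv_zip,
        "xml_raw": xml_raw,
        "xml_zip": xml_zip,
    }


if __name__ == "__main__":
    result = compare()
    for name, size in result.items():
        print(f"{name}: {size} bytes")
    print(f"raw XML/CSV ratio: {result['xml_raw'] / result['csv_raw']:.1f}x")
    print(f"compressed XML/CSV ratio: {result['xml_zip'] / result['csv_zip']:.1f}x")
```

Each single-digit CSV cell costs two bytes (the digit plus a comma or newline), which is roughly consistent with the 34MB reported for a sheet of about 17 million cells. Swapping in longer tag names or attributes in `build_xml` shows how quickly the per-cell overhead of the XML form grows.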