
RE: SV: SV: XML=WAP? And DOA?



>> >1) Writing CSV code is easier than XML code (no DOM or anything, just
>> >something like SAX; I write the CSV parser myself in less code than it
>> >takes to interface with an XML parser)
>>
>> If DOM is too hard (and mostly I agree with you) use SAX or use JDOM.
>> JDOM certainly is much simpler for the sorts of things you're doing.

Ok, if the data model is flat, then SAX gets easier than DOM (and is
probably a lot more appropriate) - but I'd bet that posts from people
finding SAX difficult outnumber posts from people finding DOM difficult by
about 2:1 (the handler callbacks appear back to front, whereas a tree is
just a tree...).
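To put the flat-data comparison in concrete terms, here is a rough Python sketch that reads the same small table once with SAX callbacks and once from a tree. The element names (`rows`, `row`, `name`, `qty`) and the helper names are made up for illustration; nothing here comes from the code being discussed.

```python
import xml.sax
import xml.etree.ElementTree as ET

# Hypothetical flat table: one <row> per record, one child element per field.
XML = ("<rows><row><name>a</name><qty>1</qty></row>"
       "<row><name>b</name><qty>2</qty></row></rows>")

class RowHandler(xml.sax.ContentHandler):
    """Collects each <row> as a dict; the parser calls us, not the reverse."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self.current = None   # dict for the row being read, or None
        self.field = None     # name of the field being read, or None
        self.text = []

    def startElement(self, name, attrs):
        if name == "row":
            self.current = {}
        elif self.current is not None:
            self.field = name
            self.text = []

    def characters(self, content):
        # Text may arrive in several chunks, so accumulate it.
        if self.field is not None:
            self.text.append(content)

    def endElement(self, name):
        if name == "row":
            self.rows.append(self.current)
            self.current = None
        elif name == self.field:
            self.current[name] = "".join(self.text)
            self.field = None

def rows_via_sax(text):
    handler = RowHandler()
    xml.sax.parseString(text.encode("utf-8"), handler)
    return handler.rows

def rows_via_tree(text):
    # With a tree, the structure is simply walked after parsing.
    root = ET.fromstring(text)
    return [{child.tag: child.text or "" for child in row}
            for row in root.findall("row")]
```

Both functions return the same list of dicts for this input. The tree version is shorter, but it holds the whole document in memory, whereas the SAX version only ever holds one row.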

It depends somewhat on your data source - can you specify the format in
which the data is generated?

>> DOM != XML

Well yes, but if you just say XML = http://www.w3.org/TR/REC-xml then you
lose a lot. (BTW, does XML = Infoset yet?)

>The code I have that does XML data import (to the same engine as my CSV
>data import) uses SAX, yes, but it's still bigger since it then needs to
>implement a state machine to pull apart the table structure from a tree.

Surely the CSV parser needs to know when it gets to the end of a row?
Doesn't that have to deal with exactly the same kind of states?
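As a sketch of the kind of state I mean, here is a minimal hand-rolled CSV reader in Python. The function name `parse_csv` is hypothetical and this is not anyone's actual import code; it just shows that even CSV needs an "inside quotes or not" state before it can decide whether a comma or newline ends a field or a row.

```python
def parse_csv(text):
    """Split CSV text into rows of fields, honouring double-quoted fields."""
    rows, row, field = [], [], []
    in_quotes = False          # the one piece of state the reader needs
    i, n = 0, len(text)
    while i < n:
        ch = text[i]
        if in_quotes:
            if ch == '"':
                if i + 1 < n and text[i + 1] == '"':
                    field.append('"')   # doubled quote is a literal quote
                    i += 1
                else:
                    in_quotes = False   # closing quote
            else:
                field.append(ch)        # commas and newlines are data here
        elif ch == '"':
            in_quotes = True
        elif ch == ",":
            row.append("".join(field))  # end of field
            field = []
        elif ch == "\n":
            row.append("".join(field))  # end of field and end of row
            field = []
            rows.append(row)
            row = []
        elif ch != "\r":
            field.append(ch)
        i += 1
    if field or row:
        # Final row without a trailing newline.
        row.append("".join(field))
        rows.append(row)
    return rows
```

For example, `parse_csv('name,qty\n"Smith, J",2\n')` yields two rows, with the quoted comma kept inside the first field of the second row. In practice Python's standard `csv` module already does this job; the point is only that the end-of-row decision depends on state, much as it does in a SAX handler.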

>> >2) Data corruption? XML parsers are *fragile*, CSV parsers can often cope
>> >with erroneous data in ways that XML parsers mustn't if they are to be
>> >standards compliant!
>>
>> That's a feature, not a bug. If the data is bad, I want to know about
>> it ASAP and get it fixed at the source. Draconian error handling is a
>> very good thing.
>
>Depends if you're working in a world of potentially dodgy data sources...

Well, that's air traffic control for you...

>I'd rather not *know* if data is bad, I'd rather the system transparently
>fixed it, and only told me if it's too bad to properly process.

XML parsers are not usually fragile - faced with bad data, they let you
know.

Horses for courses, of course, and it depends where you draw the line on
'too bad'. All data sources are potentially dodgy, and it's easy enough to
express junk in well-formed, valid XML. For most practical purposes, though,
more draconian measures make life easier, because you get a clearer signal
(which you can always handle in a pragmatic fashion).
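A small Python sketch of what "a clearer signal, handled pragmatically" might look like. The helper name `check_well_formed` and the sample documents are assumptions for illustration; the parser calls are from the standard `xml.etree.ElementTree` module.

```python
import xml.etree.ElementTree as ET

def check_well_formed(text):
    """Return None if text parses, else a message saying where it failed."""
    try:
        ET.fromstring(text)
    except ET.ParseError as err:
        line, column = err.position   # location reported by the parser
        return f"rejected at line {line}, column {column}: {err}"
    return None

# Mismatched tags: the parser refuses the document and says where.
broken = "<row><qty>2</row>"

# Well-formed, but the content is still junk as far as the application goes.
junk = "<row><qty>banana</qty></row>"
```

Here `check_well_formed(broken)` returns a message with the position of the error, which the caller is free to log, queue for repair or pass back to the data source. `check_well_formed(junk)` returns `None`, because well-formedness says nothing about whether `banana` is a sensible quantity; that check still belongs to the application.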

>With my CSVs, if one row is missing a field or has an extra field (so the
>CSV is not well formed, e.g. not all the rows are the same length) or if
>there's a field name that I do not recognise, then I signal that as an
>error and stop.
>
>But if they've just used a strange date format, as long as it's parsable,
>I'd rather be able to study it and then add support for that date format
>so it's not an error in future than have it be forced as an error by some
>spec.

Surely the same applies to SAX?

This situation is ok as long as you are in a position to offer human
surveillance, and don't have to justify (or even estimate) the accuracy of
your data. These are exceptional circumstances!
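On the date-format point, the tolerance lives in the application layer, so it works the same whether the string arrived from a CSV field or from a SAX `characters` callback. A rough Python sketch, assuming a hypothetical `parse_date` helper and an illustrative list of formats:

```python
from datetime import date, datetime

# Formats accepted so far (illustrative); extend the list once a new
# format from a data source has been studied and judged safe.
KNOWN_FORMATS = ["%Y-%m-%d", "%d/%m/%Y", "%Y%m%d"]

def parse_date(raw):
    """Try each known format in turn; fail loudly if none matches."""
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).date()
        except ValueError:
            continue
    raise ValueError(f"unrecognised date format: {raw!r}")
```

With this, `parse_date("2001-03-04")` and `parse_date("04/03/2001")` both give `date(2001, 3, 4)`, while something like `"next tuesday"` raises an error for a human to look at. Note the order of the list matters for ambiguous inputs, which is exactly where the accuracy question comes in.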

Personally I'd be tempted to opt for an XML solution on the grounds of
interoperability (and I'm sure it wouldn't involve much more work than CSV
building/parsing), and adopting a standard would also help in the tribunals.

Cheers,
Danny.



---

Danny Ayers
<stuff> http://www.isacat.net </stuff>


