RE: SV: SV: XML=WAP? And DOA?
>> >1) Writing CSV code is easier than XML code (no DOM or anything, just
>> >something like SAX; I write the CSV parser myself in less code than it
>> >takes to interface with an XML parser)
>>
>> If DOM is too hard (and mostly I agree with you) use SAX or use JDOM.
>> JDOM certainly is much simpler for the sorts of things you're doing.

Ok, if the data model is flat, then SAX gets easier than DOM (and is
probably a lot more appropriate) - but I bet I've seen a 2:1 ratio of posts
from people finding SAX difficult to people finding DOM difficult (the
handler calling appears back to front, whereas a tree is just a tree...).
It depends somewhat on your data source - can you specify the format in
which the data is generated?

>> DOM != XML

Well yes, but if you just say XML = http://www.w3.org/TR/REC-xml then you
lose a lot. (BTW, does XML = Infoset yet?)

>The code I have that does XML data import (to the same engine as my CSV
>data import) uses SAX, yes, but it's still bigger, since it then needs to
>implement a state machine to pull apart the table structure from a tree.

Surely the CSV parser needs to know when it gets to the end of a row?
Doesn't that have to deal with exactly the same kind of states?

>> >2) Data corruption? XML parsers are *fragile*, CSV parsers can often cope
>> >with erroneous data in ways that XML parsers mustn't if they are to be
>> >standards compliant!
>>
>> That's a feature, not a bug. If the data is bad, I want to know about
>> it ASAP and get it fixed at the source. Draconian error handling is a
>> very good thing.
>
>Depends if you're working in a world of potentially dodgy data sources...

Well, that's air traffic control for you...

>I'd rather not *know* if data is bad, I'd rather the system transparently
>fixed it, and only told me if it's too bad to properly process.

XML parsers are not usually fragile - faced with bad data, they let you
know.
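[The end-of-row point above can be sketched in a few lines of Python. A hypothetical `<table><row><field>...</field></row></table>` layout is assumed here (the thread doesn't give the actual schema): a SAX table importer and a CSV reader both reduce to "accumulate fields, emit on end-of-row", so the state machines are the same shape.]

```python
import csv
import io
import xml.sax


class TableHandler(xml.sax.ContentHandler):
    """SAX handler that rebuilds rows from a flat, table-like document."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # current row, or None when outside a <row>
        self._text = []    # character data for the current <field>

    def startElement(self, name, attrs):
        if name == "row":
            self._row = []
        elif name == "field":
            self._text = []

    def characters(self, content):
        self._text.append(content)

    def endElement(self, name):
        if name == "field":
            self._row.append("".join(self._text))
        elif name == "row":
            # End of row: the same event a CSV reader sees at a newline.
            self.rows.append(self._row)
            self._row = None


xml_doc = "<table><row><field>a</field><field>b</field></row></table>"
handler = TableHandler()
xml.sax.parseString(xml_doc.encode(), handler)

# The CSV equivalent: the newline plays the role of </row>.
csv_rows = list(csv.reader(io.StringIO("a,b\n")))

print(handler.rows == csv_rows)  # both yield [['a', 'b']]
```

[Note that feeding `xml.sax.parseString` malformed input raises a `SAXParseException` immediately, which is the "draconian" behaviour discussed below, whereas the CSV side will happily accept a short or long row unless the application checks for it.]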
Horses for courses, of course - it's a question of where you draw the line
on 'too bad'. All data sources are potentially dodgy, and it's easy enough
to express junk in well-formed, valid XML. For most practical purposes more
draconian measures definitely make life easier, because you get a clearer
signal (which you can always handle in a pragmatic fashion).

>With my CSVs, if one row is missing a field or has an extra field (so the
>CSV is not well formed, e.g. not all the rows are the same length), or if
>there's a field name that I do not recognise, then I signal that as an
>error and stop.
>
>But if they've just used a strange date format, as long as it's parsable,
>I'd rather be able to study it and then add support for that date format
>so it's not an error in future, than have it be forced as an error by some
>spec.

Surely the same applies to SAX? This situation is ok as long as you are in
a position to offer human surveillance, and don't have to justify (or even
estimate) the accuracy of your data. These are exceptional circumstances!
Personally I'd be tempted to opt for an XML solution on the grounds of
interoperability (and I'm sure it wouldn't involve much more work than CSV
building/parsing), and adopting a standard would also help in the
tribunals.

Cheers,
Danny.

---
Danny Ayers
<stuff> http://www.isacat.net </stuff>
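[The "tolerant on dates, strict on structure" policy quoted above might be sketched like this in Python - the format list is purely illustrative, not from the thread: try each known date format in turn, and only treat the value as bad once every format has failed, so a human can study the odd value and extend the list rather than have the spec force a hard error.]

```python
from datetime import datetime

# Formats seen so far; extend this list as new (parsable) oddities turn up.
KNOWN_FORMATS = ["%Y-%m-%d", "%d/%m/%Y", "%d.%m.%Y"]


def parse_date(value):
    """Return a date for any recognised format, or None if it's 'too bad'."""
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(value, fmt).date()
        except ValueError:
            continue
    return None  # flag upstream for human inspection, don't abort the import


print(parse_date("2001-03-15"))   # recognised
print(parse_date("15/03/2001"))   # also recognised
print(parse_date("gibberish"))    # None: flagged, not fatal
```

[Structural problems - wrong row length, unknown field name - would still stop the import under this policy; only value-level quirks get the lenient path.]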