RE: Using Tidy for XML correction
Linda asked what to do to get documents like the following example to vaguely resemble XML:

> <p>
> <list>
> <listitem>
> <courier>
> Some text
> </courier>
> </p>

If the problems in the "XML" files are really like this one, I'd write a small program (or a few) to fix things and move on. For this, I would consider taking an HTML parser, since these usually accept somewhat broken input (I'd guess at least Perl already has something like that), then read in the text, process it, and write it back out. The large number of files suggests that each one is quite small, so you can load each file into memory in one piece, which makes processing even easier. When you output what was parsed, just delete or add tags, or do whatever else is needed.

Or, for very simple cases, you might go an even more brutal and effective way and apply some regular expressions or other neat small hacks.

The nice thing is that the broken files are produced by a program, so they are probably broken systematically, not insanely broken the way human-written files tend to be.

- Aleksi
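To make the lenient-parser idea concrete, here is a minimal sketch in Python rather than Perl. It assumes the only problem is missing or stray end tags, as in the quoted example. The names `TagBalancer` and `repair` are made up for illustration; `html.parser.HTMLParser` is the standard-library parser, which tolerates unbalanced markup. Note that it lowercases tag names, so this would not be safe for case-sensitive XML vocabularies.

```python
from html.parser import HTMLParser
from xml.sax.saxutils import escape, quoteattr


class TagBalancer(HTMLParser):
    """Hypothetical helper: re-emits the input with every element closed."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []    # output fragments, joined at the end
        self.stack = []  # names of elements that are currently open

    def handle_starttag(self, tag, attrs):
        # Attribute values may be None (valueless attributes); emit "" then.
        attr_text = "".join(f" {k}={quoteattr(v or '')}" for k, v in attrs)
        self.out.append(f"<{tag}{attr_text}>")
        self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag not in self.stack:
            return  # stray end tag with no matching start: drop it
        # Close everything opened since the matching start tag, then the tag.
        while self.stack:
            open_tag = self.stack.pop()
            self.out.append(f"</{open_tag}>")
            if open_tag == tag:
                break

    def handle_data(self, data):
        # Character references were decoded by the parser; re-escape for XML.
        self.out.append(escape(data))

    def close(self):
        super().close()  # flush any buffered text first
        while self.stack:  # close whatever is still open at end of input
            self.out.append(f"</{self.stack.pop()}>")


def repair(text):
    """Return text with missing end tags added (assumption: that is the only defect)."""
    parser = TagBalancer()
    parser.feed(text)
    parser.close()
    return "".join(parser.out)
```

For the example from the question, written on one line, `repair("<p><list><listitem><courier>Some text</courier></p>")` gives `<p><list><listitem><courier>Some text</courier></listitem></list></p>`. Whether closing `listitem` and `list` just before `</p>` is the right repair depends on how the generating program actually breaks its output, so treat the rule as a starting point to adjust.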