Re: Whitespace
>From: Sean Mc Grath <digitome@i...>
>
>>Just as it's not useful in processing HTML. Regexps that don't match across
>>line boundaries are the most common problem I've seen in HTML-processing
>>Perl scripts. Looks like that will continue until people figure out that
>>Perl's line "Feature" is just a bug when used with XML/HTML.
>>
>
>Bang goes the notion of a lightweight XML app. then! Thou shalt always
>parse!
>
>XML as a friendly format to, say, DPH needs some explaining. To use Perl to
>read/write XML you *must* use an XML parser. Indeed any tool intending to
>read/write XML needs to use a *fully blown parser* to get at the document.
>Bye bye the entire Unix family of line oriented text processing apps :-(

Come on, this is a crock. I've set that cryptic little variable (funny that
everything in Perl deserves that description) so that line ends won't block
regexp matches. Once that was done, I wrote a few regexps and parsed HTML
just fine. (It takes 1 line for a simple tag pattern match, and 10 for a
loop to create a reasonably full parse into elements, content, and
attribute values.) I'm sure a "real" Perl programmer (unlike me) can shrink
that down to 2-3 lines of twisty little characters, all of them different.
XML should be no harder.

My understanding of the goal for the DPH was always that XML would be no
worse than HTML -- i.e. for quick and dirty transformations or operations,
quick and dirty parsers would work. As far as I can tell, "dirty" means
that you know (or are pretty sure) they will work with one document or
corpus of documents, not necessarily that they will work with any arbitrary
document. If you never break tags across lines in your documents, your Perl
desperation may work without worrying about this case; if you do, you have
to have smarter desperation.

For _reliable_ parsing of _arbitrary_ documents, you probably do need a
full parser of the instance language (10 productions in the standard, or
so, wasn't it?).
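As an illustration of the quick-and-dirty approach described above, here is
a minimal sketch in Python rather than Perl. It is not the code from the
message: the names `quick_parse`, `TAG`, and `ATTR` are hypothetical, and
the patterns assume that attribute values are quoted and contain no `<` or
`>`, and that the input has no comments, CDATA sections, or processing
instructions containing tag-like text. Because the character class `[^<>]`
also matches newlines, a tag broken across lines still matches, which is
the case the quoted complaint is about.

```python
import re

# One pattern for a simple tag: optional "/", a name, raw attribute text,
# and an optional trailing "/" for empty elements. [^<>] matches newlines
# too, so a tag split across lines is still found.
TAG = re.compile(r"<(/?)([A-Za-z_][\w.:-]*)([^<>]*?)(/?)>")

# name="value" or name='value' (quoted values only; an assumption)
ATTR = re.compile(r"""([A-Za-z_][\w.:-]*)\s*=\s*(?:"([^"]*)"|'([^']*)')""")


def quick_parse(text):
    """Return a flat list of events:
    ("start", name, attrs), ("end", name), or ("text", content)."""
    events = []
    pos = 0
    for m in TAG.finditer(text):
        # Anything between the previous tag and this one is content.
        if m.start() > pos:
            events.append(("text", text[pos:m.start()]))
        closing, name, raw_attrs, empty = m.groups()
        if closing:
            events.append(("end", name))
        else:
            # findall yields "" for the quote style that did not match.
            attrs = {k: dq or sq for k, dq, sq in ATTR.findall(raw_attrs)}
            events.append(("start", name, attrs))
            if empty:
                # Treat <br/> as a start immediately followed by an end.
                events.append(("end", name))
        pos = m.end()
    if pos < len(text):
        events.append(("text", text[pos:]))
    return events


if __name__ == "__main__":
    sample = '<p class="note"\n   id="p1">Hello <b>world</b><br/></p>'
    for event in quick_parse(sample):
        print(event)
```

This is "dirty" in exactly the sense used above: it is fine for a corpus
whose conventions you know, and it makes no attempt at well-formedness
checking, entity expansion, or the other work of a full parser.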
There's no reason that level of parsing can't be implemented in no more
than 20 lines of Perl. I can't remember (or abide) the syntax of Perl well
enough to write it, but I'm sure there's a DPH on the list who would love
to volunteer.

>>It sounds to me like what we really need is a small paper (about 5
>>paragraphs) explaining whitespace for developers:
>>
>I think this is an excellent idea!

Well, I gave the three sentence version. Feel free to expand it...
Actually I think the three sentences sum it up pretty well.

  -- David
------------------------------------------+----------------------------
David Durand              dgd@c...        | david@d...
Boston University Computer Science        | Dynamic Diagrams
http://www.cs.bu.edu/students/grads/dgd/  | http://dynamicDiagrams.com/

xml-dev: A list for W3C XML Developers
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To unsubscribe, send to majordomo@i... the following message;
unsubscribe xml-dev
List coordinator, Henry Rzepa (rzepa@i...)