Frontline report from the Desperate Fgrep Hacker
I was talking to my boss today about processing a bunch of XML documents to act on the value of a certain element whose content model is #PCDATA. I had a long list of 'good' values, perhaps 5000 out of 100,000 possible values, and the question was "Which documents have good values?"

I happened to know that the element always appeared on a single line of the file: the start tag, the character data, and the end tag. Furthermore, the content was syntactically distinct from the rest of the file: it had the form X(X)-NNNNN(X), which did not appear anywhere else. I therefore proposed preprocessing the 5000 good values into complete elements (start tag, value, end tag) and using the GNU "fgrep" program to search the documents for matches.

My boss goggled. "No XML parser? Won't they throw you out of the XML Union for that?"

"Not at all!" said I. "XML is a data (or document) representation standard. It does *not* dictate a particular processing model! If it's both efficient and (sufficiently) reliable to use a fast, stupid processing model in a particular case, nothing in the XML environment prohibits it."

The following shell script (with some decorations) did the trick:

    cp `fgrep -l -f goodvalues.xml *` winners

which copies all files containing any of the values in "goodvalues.xml" to the subdirectory "winners". Lightning fast, totally accurate.

Just another Fgrep hacker,

--
John Cowan   cowan@c...
I am a member of a civilization.   --David Brin
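The two steps described above (turn the good values into complete elements, then search with fixed strings) can be sketched as two POSIX shell functions. This is a minimal sketch, not the original script: the function names `make_patterns` and `copy_winners`, the plain-text values file, and the separate document directory are hypothetical, and it assumes each value contains only letters, digits and hyphens (as the X(X)-NNNNN(X) form suggests), so no XML escaping or sed quoting is needed. It uses `grep -F`, the POSIX spelling of `fgrep`; recent GNU grep releases flag `fgrep` itself as obsolescent.

```shell
#!/bin/sh
# Hypothetical sketch. Assumptions:
#   - good values are listed one per line in a plain-text file;
#   - the element name is passed in (the post does not name it);
#   - values contain only letters, digits and hyphens, so they need
#     no XML escaping and contain no sed-special characters.

# make_patterns ELEMENT VALUES_FILE PATTERNS_FILE
# Wrap each value in start and end tags, one pattern per line, so a match
# must cover the element's whole content, not a substring of a longer value.
make_patterns() {
    sed "s|.*|<$1>&</$1>|" "$2" > "$3"
}

# copy_winners PATTERNS_FILE DEST_DIR FILE...
# Copy every FILE containing at least one pattern as a fixed string.
# -F: fixed strings (fgrep), -f: read patterns from a file,
# -l: print only the names of matching files.
copy_winners() {
    patterns=$1
    dest=$2
    shift 2
    mkdir -p "$dest"
    # Reading names line by line (instead of backquote expansion) keeps
    # file names containing spaces intact.
    grep -F -l -f "$patterns" -- "$@" | while IFS= read -r f; do
        cp -- "$f" "$dest"/
    done
}
```

Usage, with `code` standing in for the real element name:

    make_patterns code goodvalues.txt goodvalues.xml
    copy_winners goodvalues.xml winners docs/*.xml

Wrapping each value in its tags is what makes plain substring matching safe, given the single-line layout described above: a bare value such as AB-12345 would also match inside AB-123456, while <code>AB-12345</code> will not. It also turns any blank line in the values file into a non-empty pattern, which matters because an empty line in a grep pattern file matches every line. Passing the document files explicitly, rather than `*`, keeps the pattern file, which trivially contains all of its own patterns, out of the search.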