Frontline report from the Desperate Fgrep Hacker

  • From: John Cowan <cowan@l...>
  • To: xml-dev@x...
  • Date: Thu, 10 Feb 2000 22:45:04 -0500 (EST)

I was talking to my boss today about processing a bunch of
XML documents to act on the value of a certain element whose
model is #PCDATA.  I had a long list of 'good' values, perhaps
5000 out of 100,000 possible values, and the question is
"Which documents have good values?"

I happened to know that the element always appeared on a single
line of the file: the start tag, the character data, the end tag.
Furthermore, the content was syntactically distinct from the
rest of the file: it had the form X(X)-NNNNN(X), which did not appear
elsewhere.

I therefore proposed preprocessing the 5000 good values into
elements, and using the GNU "fgrep" program to search the
documents for matches.
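
The preprocessing step might look something like the sketch below.  The
element name <code>, the input file "goodvalues.txt" (one bare value per
line) and the sample values are all assumptions made up for illustration;
the post does not name the real element or show the real list.

```shell
# Sketch only: <code>, goodvalues.txt and the sample values are hypothetical.
workdir=$(mktemp -d)

# Stand-in for the real list of about 5000 good values, one per line.
printf '%s\n' 'AB-12345' 'C-00042X' > "$workdir/goodvalues.txt"

# Wrap every line in start and end tags so that each pattern is a whole
# element, exactly as it would appear on a single line of a document.
sed 's|.*|<code>&</code>|' "$workdir/goodvalues.txt" > "$workdir/goodvalues.xml"

cat "$workdir/goodvalues.xml"
```

Matching on the whole element rather than the bare value means a good
value cannot match as a prefix or suffix of some longer value.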

My boss goggled.  "No XML parser?  Won't they throw you out of the
XML Union for that?"

"Not at all!" said I.  "XML is a data (or document) representation
standard.  It does *not* dictate a particular processing model!
If it's both efficient and (sufficiently) reliable to use
a fast, stupid processing model in a particular case, nothing
in the XML environment prohibits it."

The following shell script (with some decorations) did the trick:

	cp `fgrep -l -f goodvalues.xml *` winners

which copies all files containing any of the values in "goodvalues.xml" to the
subdirectory "winners".  Lightning fast, totally accurate.
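
For anyone who wants to try the idea on toy data, here is a self-contained
sketch.  It is not the decorated script itself: the element name <code>,
the directory layout and the three sample documents are invented, and it
uses "grep -F", the POSIX spelling of fgrep.  The pattern file is kept
outside the searched directory so that it cannot match itself, and the
loop avoids calling cp with no source files when nothing matches.

```shell
# Sketch with made-up sample data; <code> and all file names are assumptions.
work=$(mktemp -d)
mkdir "$work/docs" "$work/winners"

# Pattern file: one whole element per line.
printf '%s\n' '<code>AB-12345</code>' '<code>C-00042X</code>' > "$work/goodvalues.xml"

# Three tiny documents; only a.xml and c.xml carry a good value.
printf '%s\n' '<doc>' '<code>AB-12345</code>' '</doc>' > "$work/docs/a.xml"
printf '%s\n' '<doc>' '<code>ZZ-99999</code>' '</doc>' > "$work/docs/b.xml"
printf '%s\n' '<doc>' '<code>C-00042X</code>' '</doc>' > "$work/docs/c.xml"

# -F: patterns are fixed strings; -l: print only the names of matching files;
# -f: read the patterns from a file.
( cd "$work/docs" && grep -F -l -f ../goodvalues.xml -- *.xml ) |
while IFS= read -r name; do
    cp "$work/docs/$name" "$work/winners/"
done

ls "$work/winners"
```

The approach still depends on the assumptions stated earlier: the element
always sits on one line, and nothing else in the file looks like it.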

Just another Fgrep hacker,

-- 
John Cowan                                   cowan@c...
       I am a member of a civilization. --David Brin
