
Re: stylesheet vs egrep

Subject: Re: stylesheet vs egrep
From: Trevor Nash <tcn@xxxxxxxxxxxxx>
Date: Fri, 25 Jan 2002 18:37:17 +0000
On Fri, 25 Jan 2002 14:07:05 +0000, you wrote:

>Hi Trevor,
>
>First many thanks for your reply. The files I am processing
>are 20megs each by the way.
>
>I tried the stylesheet and it gave me 28,792 unsorted and
>163 sorted, which was the same as my last stylesheet and
>still not the 254 given to me by egrep. My egrep command
>
>egrep "<CHARACTER_ID> [0-9]{3,6} </CHARACTER_ID>" 1.xml |sort -u | wc -l
>
>is maybe doing something strange? Heres the first 20..
>
Obvious question: does the input contain the 'missing' numbers or not
- i.e. can you find 10010 etc?
I bet you will find that there is some white space or something which
is confusing the egrep ... though I cannot explain why the unsorted
totals should be the same.  Hang on though: if your file had
          <CHARACTER_ID> 10946 </CHARACTER_ID>
and
   <CHARACTER_ID> 10946 </CHARACTER_ID>
wouldn't the egrep version count that as 2 but the XSLT version as 1?
(In the XSLT version you get only the numbers, not the other junk on
the same line.)
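
A minimal sketch of that effect, assuming a made-up two-line sample file
rather than your real data.  It uses grep -E (the same thing as egrep)
and the -o option, which GNU and BSD grep provide, to print only the
matched text instead of the whole line.  The function names and the
temporary file are only for illustration; LC_ALL=C is set so that sort
compares plain bytes.

```shell
# Hypothetical sample: the same ID on two lines with different indentation.
sample=$(mktemp)
printf '%s\n' \
  '          <CHARACTER_ID> 10946 </CHARACTER_ID>' \
  '   <CHARACTER_ID> 10946 </CHARACTER_ID>' > "$sample"

pattern='<CHARACTER_ID> [0-9]{3,6} </CHARACTER_ID>'

# Whole matching lines, as in the original pipeline: indentation is part
# of each line, so the two lines stay distinct after sort -u.
count_lines() {
  grep -E "$pattern" "$1" | LC_ALL=C sort -u | wc -l | tr -d ' '
}

# Only the matched text (-o), so the leading white space is dropped
# before sort -u removes duplicates.
count_matches() {
  grep -Eo "$pattern" "$1" | LC_ALL=C sort -u | wc -l | tr -d ' '
}

echo "whole lines:  $(count_lines "$sample")"    # prints 2
echo "matches only: $(count_matches "$sample")"  # prints 1
```

If the -o count on the real file comes out at 163, the difference from
254 would be down to text surrounding the element on the same line.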

So the 10010 isn't missing from the grep version, it's getting sorted
much later.  What does 'sort' use for a key?  Isn't it the full text of
the line?
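
To illustrate the ordering, here is a small sketch with invented sample
lines (not taken from your file).  With no key options, sort compares
whole lines, so in the C locale a heavily indented line sorts before a
lightly indented one whatever number it contains.  Pulling the number
out with sed first and sorting numerically gives the order you probably
expected.  The function names are hypothetical.

```shell
# Hypothetical sample: 10010 is indented less than the other two lines.
ids=$(mktemp)
printf '%s\n' \
  '   <CHARACTER_ID> 10010 </CHARACTER_ID>' \
  '          <CHARACTER_ID> 10946 </CHARACTER_ID>' \
  '          <CHARACTER_ID> 20001 </CHARACTER_ID>' > "$ids"

# Print just the number from each CHARACTER_ID line read on stdin.
extract_id() {
  sed -n 's/.*<CHARACTER_ID> \([0-9]\{3,6\}\) <\/CHARACTER_ID>.*/\1/p'
}

# Sort on the full line text (byte order), then show the numbers.
ids_by_line() { LC_ALL=C sort "$1" | extract_id; }

# Extract the numbers first, then sort them numerically and uniquely.
ids_by_number() { extract_id < "$1" | sort -n -u; }

ids_by_line "$ids"     # 10946, 20001, then 10010 last
ids_by_number "$ids"   # 10010, 10946, 20001
```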

As to the size of file: if you need to tune for performance, you will
need to do it by experiment.  Adding templates to skip nodes sounds
like an obvious improvement, but the trouble is the more templates you
have the higher the cost of processing each node - which one wins
depends on the structure of the file and what processor you use.  If
you are not doing anything else in the transform you might find 
<xsl:for-each select="//CHARACTER_ID" > works best.

Regards,
Trevor Nash
--
Traditional training & distance learning,
Consultancy by email

Melvaig Software Engineering Limited
voice:     +44 (0) 1445 771 271 
email:     tcn@xxxxxxxxxxxxx

 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list

