Re: stylesheet vs egrep
Hi Trevor,

First, many thanks for your reply. The files I am processing are 20 megs each, by the way.

I tried the stylesheet and it gave me 28,792 unsorted and 163 sorted, which was the same as my last stylesheet and still not the 254 given to me by egrep. Maybe my egrep command

egrep "<CHARACTER_ID> [0-9]{3,6} </CHARACTER_ID>" 1.xml |sort -u | wc -l

is doing something strange? Here are the first 20:

<CHARACTER_ID> 10946 </CHARACTER_ID>
<CHARACTER_ID> 11084 </CHARACTER_ID>
<CHARACTER_ID> 11116 </CHARACTER_ID>
<CHARACTER_ID> 11311 </CHARACTER_ID>
<CHARACTER_ID> 11457 </CHARACTER_ID>
<CHARACTER_ID> 12284 </CHARACTER_ID>
<CHARACTER_ID> 12426 </CHARACTER_ID>
<CHARACTER_ID> 12597 </CHARACTER_ID>
<CHARACTER_ID> 12969 </CHARACTER_ID>
<CHARACTER_ID> 13172 </CHARACTER_ID>
<CHARACTER_ID> 13680 </CHARACTER_ID>
<CHARACTER_ID> 13685 </CHARACTER_ID>
<CHARACTER_ID> 14371 </CHARACTER_ID>
<CHARACTER_ID> 16142 </CHARACTER_ID>
<CHARACTER_ID> 16783 </CHARACTER_ID>
<CHARACTER_ID> 16851 </CHARACTER_ID>
<CHARACTER_ID> 17443 </CHARACTER_ID>
<CHARACTER_ID> 17583 </CHARACTER_ID>
<CHARACTER_ID> 17933 </CHARACTER_ID>
<CHARACTER_ID> 17958 </CHARACTER_ID>

And the first 20 from your stylesheet:

10010
10347
10904
10946
11084
11116
11237
11311
11457
12284
12426
12597
12599
12969
13172
13680
13685
14211
14371
14791

So there are numbers in the stylesheet output that egrep is missing, e.g. the top 3, yet the stylesheet still produces fewer!? Mystery. Anyone?

Ahmad

Trevor Nash wrote:
>
> On Fri, 25 Jan 2002 11:35:49 +0000, Ahmad J Reeves wrote:
>
> >Hi there,
> >
> >I have xml files that contain 4 types of tags,
> >direct, local, global and admin in varying numbers
> >
> >I need to get a list of all the character_id's, and then
> >remove the duplicates and count them. With the following
> >stylesheet,
> >[snip]
> >Is it my stylesheet that's lying, or my egrep?
> >
> The stylesheet, because you are forgetting the built-in templates.
> This means two things:
> 1. the default is to copy text nodes to the output: some of these are
>    numbers, hence the strange results.
> 2. you are doing much more work than is necessary, since most of your
>    templates are just visiting children, which is what the default does
>    anyway.
>
> Try this:
> <xsl:stylesheet
>    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
>    version="1.0">
> <xsl:output method="text"/>
>
> <xsl:variable name="NL" select="'&#xA;'"/>
>
> <xsl:template match="CHARACTER_ID">
>
>    <xsl:value-of select="."/>
>    <xsl:value-of select="$NL"/>
>
> </xsl:template>
>
> <!-- throw away all text nodes -->
> <xsl:template match="text()" />
>
> </xsl:stylesheet>
>
> The only reason for putting other templates in would be to avoid
> traversing bits of the document where you know there are no
> CHARACTER_ID nodes, which might make the transform a bit faster.
> Unless the input document is huge this isn't likely to make much
> difference, and of course it makes it more prone to bugs.
>
> Regards
> Trevor Nash
> --
> Traditional training & distance learning,
> Consultancy by email
>
> Melvaig Software Engineering Limited
> voice: +44 (0) 1445 771 271
> email: tcn@xxxxxxxxxxxxx
>
> XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list
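One possible explanation for the mismatch, not confirmed anywhere in this thread: egrep prints whole matching lines, so the same ID at two different indentation levels survives sort -u as two distinct lines (inflating the count), while any ID not wrapped in exactly one space on each side, or with fewer than 3 digits, is skipped altogether. A minimal sketch of a stricter count follows. It assumes a grep that supports -o (GNU and BSD grep do), that each CHARACTER_ID element opens and closes on a single line, and that the element carries no attributes; the function name count_unique_ids and the sample data are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: count distinct CHARACTER_ID values in one XML file.
# Assumption: every CHARACTER_ID element sits on a single line, because grep
# works line by line and cannot see elements that span several lines.
count_unique_ids() {
    # -o prints only the matched element, dropping indentation and any
    # other text that happens to share the line.
    grep -oE '<CHARACTER_ID>[[:space:]]*[0-9]+[[:space:]]*</CHARACTER_ID>' "$1" |
        tr -cd '0-9\n' |   # keep only the digits and the line breaks
        sort -u |          # one line per distinct ID
        wc -l |
        tr -d ' '          # some wc implementations pad the count
}

# Made-up sample: one ID repeated at two indentation levels, one ID without
# surrounding spaces, and one two-digit ID.
sample=$(mktemp)
cat > "$sample" <<'EOF'
<root>
  <CHARACTER_ID> 10946 </CHARACTER_ID>
      <CHARACTER_ID> 10946 </CHARACTER_ID>
  <CHARACTER_ID>10010</CHARACTER_ID>
  <CHARACTER_ID> 42 </CHARACTER_ID>
</root>
EOF

count_unique_ids "$sample"   # prints 3
rm -f "$sample"
```

If this count agrees with the stylesheet's sorted count, the difference most likely lies in whitespace and line layout rather than in the transform. Being regex-based, the sketch is only a cross-check; it does not replace parsing the XML.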