Re: deduplicating information in XML files
At 2012-10-12 14:02 +0200, Robby Pelssers wrote:
Hi all,

I hope the complete solution below in XSLT is helpful. I see that Wendell posted while I was working on this, and I like his idea of using the collection() function rather than my hardwired map of maps. I'll leave that with you as an exercise. You can also tweak the file name generation as you need. Oh, and I also added some additional data.

I was really curious about this solution. In the classroom I teach the three methods of grouping in XSLT 1: by axes, by keys and by variables. When I talk about XSLT 2 I claim (or used to claim!) that these methods were no longer needed. But ... I had to use the variable method in XSLT 2 in order to solve your requirement! So I'll have to change my classroom materials to reflect this.

The reason I had to use the variable-based grouping method is that the XSLT 2 <xsl:for-each-group>'s group-by= attribute is based on the value calculated, not on the structure. I had to use deep-equal() in order to determine if the structure was the same. So that ruled out <xsl:for-each-group>. So I instantly turned to the XSLT 1 variable-based method in order to work across documents with an arbitrary calculation of equality, knowing that the shape of the solution would give me what I wanted.

I think this is directly translatable to XQuery, and so I will post such a solution to that list.

Good luck! . . . . . . . .
Ken

t:\ftemp\robby>type robby.xml
<?xml version="1.0" encoding="UTF-8"?>
<maps>
  <map href="Product1_map.xml"/>
  <map href="Product2_map.xml"/>
  <map href="Product3_map.xml"/>
  <map href="Product4_map.xml"/>
  <map href="Product5_map.xml"/>
</maps>

t:\ftemp\robby>type Product1_map.xml
<map>
  <features-benefits-ref href="features-benefits/Product1_FandB.xml"/>
</map>

t:\ftemp\robby>type Product2_map.xml
<map>
  <features-benefits-ref href="features-benefits/Product2_FandB.xml"/>
</map>

t:\ftemp\robby>type Product3_map.xml
<map>
  <features-benefits-ref href="features-benefits/Product3_FandB.xml"/>
</map>

t:\ftemp\robby>type Product4_map.xml
<map>
  <features-benefits-ref href="features-benefits/Product4_FandB.xml"/>
</map>

t:\ftemp\robby>type Product5_map.xml
<map>
  <features-benefits-ref href="features-benefits/Product5_FandB.xml"/>
</map>

t:\ftemp\robby>dir /s features-benefits
 Volume in drive T is VBOX_t
 Volume Serial Number is 0E00-0002

 Directory of t:\ftemp\robby\features-benefits

2012-10-12  08:37               235 Product1_FandB.xml
2012-10-12  08:37               235 Product2_FandB.xml
2012-10-12  08:38               286 Product3_FandB.xml
2012-10-12  08:38               285 Product4_FandB.xml
2012-10-12  08:38               285 Product5_FandB.xml
               5 File(s)          1,326 bytes

     Total Files Listed:
               5 File(s)          1,326 bytes
               0 Dir(s)  16,795,488,256 bytes free

t:\ftemp\robby>type features-benefits\Product1_FandB.xml
<content>
  <meta>
    <id>product1</id>
  </meta>
  <body>
    <p>Suitable for high frequency applications due to fast switching characteristics</p>
    <p>Suitable for logic level gate drive sources</p>
  </body>
</content>

t:\ftemp\robby>type features-benefits\Product2_FandB.xml
<content>
  <meta>
    <id>product2</id>
  </meta>
  <body>
    <p>Suitable for high frequency applications due to fast switching characteristics</p>
    <p>Suitable for logic level gate drive sources</p>
  </body>
</content>

t:\ftemp\robby>type features-benefits\Product3_FandB.xml
<content>
  <meta>
    <id>product3</id>
  </meta>
  <body>
    <p>Suitable for high frequency applications due to fast switching characteristics</p>
    <p>Suitable for logic level gate drive sources</p>
    <p>With additional text that is different</p>
  </body>
</content>

t:\ftemp\robby>type features-benefits\Product4_FandB.xml
<content>
  <meta>
    <id>product4</id>
  </meta>
  <body>
    <p>Suitable for high frequency applications due to fast switching characteristics</p>
    <p>Suitable for logic level gate drive sources</p>
    <p>With additional text that is the same</p>
  </body>
</content>

t:\ftemp\robby>type features-benefits\Product5_FandB.xml
<content>
  <meta>
    <id>product5</id>
  </meta>
  <body>
    <p>Suitable for high frequency applications due to fast switching characteristics</p>
    <p>Suitable for logic level gate drive sources</p>
    <p>With additional text that is the same</p>
  </body>
</content>

t:\ftemp\robby>call xslt2 robby.xml robby.xsl out\robbyout.xml

t:\ftemp\robby>dir \s out
 Volume in drive T is VBOX_t
 Volume Serial Number is 0E00-0002

 Directory of t:\

 Directory of t:\ftemp\robby\out

2012-10-12  10:02    <DIR>          features-benefits
2012-10-12  10:14                94 Product1_map.xml
2012-10-12  10:14                94 Product2_map.xml
2012-10-12  10:14                84 Product3_map.xml
2012-10-12  10:14                94 Product4_map.xml
2012-10-12  10:14                94 Product5_map.xml
2012-10-12  10:14               371 robbyout.xml
               6 File(s)          1,001 bytes
               1 Dir(s)  16,795,488,256 bytes free

t:\ftemp\robby>type out\robbyout.xml
<?xml version="1.0" encoding="UTF-8"?>
<maps><!--features-benefits/Product1_FandB.xml.group.xml-->
   <map href="Product1_map.xml"/>
   <map href="Product2_map.xml"/>
   <!--features-benefits/Product3_FandB.xml-->
   <map href="Product3_map.xml"/>
   <!--features-benefits/Product4_FandB.xml.group.xml-->
   <map href="Product4_map.xml"/>
   <map href="Product5_map.xml"/>
</maps>

t:\ftemp\robby>type out\Product1_map.xml
<map>
   <features-benefits-ref href="features-benefits/Product1_FandB.xml.group.xml"/>
</map>

t:\ftemp\robby>type out\Product2_map.xml
<map>
   <features-benefits-ref href="features-benefits/Product1_FandB.xml.group.xml"/>
</map>

t:\ftemp\robby>type out\Product3_map.xml
<map>
   <features-benefits-ref href="features-benefits/Product3_FandB.xml"/>
</map>

t:\ftemp\robby>type out\Product4_map.xml
<map>
   <features-benefits-ref href="features-benefits/Product4_FandB.xml.group.xml"/>
</map>

t:\ftemp\robby>type out\Product5_map.xml
<map>
   <features-benefits-ref href="features-benefits/Product4_FandB.xml.group.xml"/>
</map>

t:\ftemp\robby>type out\features-benefits\Product1_FandB.xml.group.xml
<content>
   <meta>
      <id>
<!-- - features-benefits/Product1_FandB.xml-->
<!-- - features-benefits/Product2_FandB.xml-->
</id>
   </meta>
   <body>
      <p>Suitable for high frequency applications due to fast switching characteristics</p>
      <p>Suitable for logic level gate drive sources</p>
   </body>
</content>

t:\ftemp\robby>type out\features-benefits\Product3_FandB.xml
<content>
   <meta>
      <id/>
   </meta>
   <body>
      <p>Suitable for high frequency applications due to fast switching characteristics</p>
      <p>Suitable for logic level gate drive sources</p>
      <p>With additional text that is different</p>
   </body>
</content>

t:\ftemp\robby>type out\features-benefits\Product4_FandB.xml.group.xml
<content>
   <meta>
      <id>
<!-- - features-benefits/Product4_FandB.xml-->
<!-- - features-benefits/Product5_FandB.xml-->
</id>
   </meta>
   <body>
      <p>Suitable for high frequency applications due to fast switching characteristics</p>
      <p>Suitable for logic level gate drive sources</p>
      <p>With additional text that is the same</p>
   </body>
</content>

t:\ftemp\robby>type robby.xsl
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="2.0">

<xsl:output indent="yes"/>

<xsl:template match="maps">
  <xsl:variable name="maps" select="map"/>
  <!--walk across all maps, acting on the first one that has unique content-->
  <maps>
    <xsl:for-each select="$maps">
      <xsl:variable name="map-href" select="@href"/>
      <!--
      <xsl:message select="$map-href"/>
      <xsl:message select="generate-id(doc(doc(@href)/*/features-benefits-ref/@href))"/>
      <xsl:message select="count(
         $maps[deep-equal(doc(doc(@href)/*/features-benefits-ref/@href)/*/body,
               doc(doc(current()/@href)/*/features-benefits-ref/@href)/*/body)])"/>
      <xsl:message select="
         $maps[deep-equal(doc(doc(@href)/*/features-benefits-ref/@href)/*/body,
               doc(doc(current()/@href)/*/features-benefits-ref/@href)/*/body)]/generate-id(.)"/>
      -->
      <xsl:if test="generate-id(.)=generate-id
         ($maps[deep-equal(doc(doc(@href)/*/features-benefits-ref/@href)/*/body,
               doc(doc(current()/@href)/*/features-benefits-ref/@href)/*/body)][1])">
        <!--found the first one of the group with this body content-->
        <xsl:variable name="current-group" select="$maps[
           deep-equal(doc(doc(@href)/*/features-benefits-ref/@href)/*/body,
               doc(doc(current()/@href)/*/features-benefits-ref/@href)/*/body)]"/>
        <xsl:variable name="count-current-group"
                      select="count($current-group)"/>
        <xsl:variable name="new-file-href"
                      select="concat(doc($map-href)/*/features-benefits-ref/@href,
                                     if( $count-current-group=1 ) then ''
                                     else '.group.xml' )"/>
        <!--just for information, note this in the result map of maps-->
        <xsl:comment select="$new-file-href"/><xsl:text>
</xsl:text>
        <xsl:for-each select="$current-group">
          <!--reference the map file-->
          <map href="{@href}"/>
          <!--recreate the map file-->
          <xsl:result-document href="{@href}" omit-xml-declaration="yes">
            <map>
              <features-benefits-ref href="{$new-file-href}"/>
            </map>
          </xsl:result-document>
        </xsl:for-each>
        <!--recreate the content file-->
        <xsl:result-document href="{$new-file-href}" omit-xml-declaration="yes">
          <content>
            <meta>
              <id>
                <xsl:choose>
                  <xsl:when test="$count-current-group=1">
                    <xsl:copy-of select="node()"/>
                  </xsl:when>
                  <xsl:otherwise>
                    <xsl:for-each select="$current-group">
                      <xsl:text>
</xsl:text>
                      <xsl:comment select="string(.), '-',doc(@href)/*/features-benefits-ref/@href"/>
                    </xsl:for-each>
                    <xsl:text>
</xsl:text>
                  </xsl:otherwise>
                </xsl:choose>
              </id>
            </meta>
            <xsl:copy-of select="doc(doc(@href)/*/features-benefits-ref/@href)/*/body"/>
          </content>
        </xsl:result-document>
      </xsl:if>
    </xsl:for-each>
  </maps>
</xsl:template>

</xsl:stylesheet>

--
Contact us for world-wide XML consulting and instructor-led training
Free 5-hour lecture: http://www.CraneSoftwrights.com/links/udemy.htm
Crane Softwrights Ltd. http://www.CraneSoftwrights.com/s/
G. Ken Holman mailto:gkholman@xxxxxxxxxxxxxxxxxxxx
Google+ profile: https://plus.google.com/116832879756988317389/about
Legal business disclaimers: http://www.CraneSoftwrights.com/legal
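
The distinction drawn in the message between grouping by value and grouping by structure can be seen in isolation with a much smaller stylesheet. What follows is only a sketch under stated assumptions, not part of the posted solution: the <items>, <item>, <by-value>, <by-structure> and <group> element names and the inline sample data are all hypothetical, and the data lives in a variable so that the stylesheet needs no particular input document. It runs the same three items through <xsl:for-each-group> with group-by= and through the variable-based method with deep-equal().

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="2.0">

<xsl:output indent="yes"/>

<!--hypothetical sample data: items a and b have identical body structure;
    item c has the same string value ("onetwo") but a different structure-->
<xsl:variable name="data">
  <items>
    <item id="a"><body><p>one</p><p>two</p></body></item>
    <item id="b"><body><p>one</p><p>two</p></body></item>
    <item id="c"><body><p>onetwo</p></body></item>
  </items>
</xsl:variable>

<xsl:variable name="items" select="$data/items/item"/>

<!--can be run against any source document, or started at the named template-->
<xsl:template name="main" match="/">
  <result>
    <!--group-by= atomizes each body, so only the string value is compared-->
    <by-value>
      <xsl:for-each-group select="$items" group-by="body">
        <group members="{current-group()/@id}"/>
      </xsl:for-each-group>
    </by-value>
    <!--variable-based method: act only on the first item of each set of
        items whose body elements are deep-equal() to the current one-->
    <by-structure>
      <xsl:for-each select="$items">
        <xsl:variable name="same"
                      select="$items[deep-equal(body, current()/body)]"/>
        <xsl:if test=". is $same[1]">
          <group members="{$same/@id}"/>
        </xsl:if>
      </xsl:for-each>
    </by-structure>
  </result>
</xsl:template>

</xsl:stylesheet>
```

Under these assumptions <by-value> reports a single group with members "a b c", because all three body elements atomize to the same string, while <by-structure> reports two groups, "a b" and "c". The sketch uses the XPath 2.0 "is" operator for the first-of-group test where the posted stylesheet compares generate-id() values; for this purpose the two tests are interchangeable.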