Collating riffled lists
This is my first XSLT project. I have a recursive solution to a problem which I hope one of you can improve on.

This is an abstraction of a problem that arose in the context of scraping PDF docs:

PDF->Adobe->HTML->tidy->XML->scrape with XSLT->...

The PDF->HTML conversion (or, for that matter, lassoing the text in Acrobat Reader and cutting and pasting it) yields a different order than what Acrobat Reader displays on screen. It's not so badly mangled that it can't be recovered. However, related items are no longer near one another, and I need to recover the original relationship between them.

I'm hoping someone can come up with a better solution than the one I present below, which I believe is O(n squared), where n is large (the original document is 170+ pages).

I've considered writing the a's and b's into two result files with two XSLT programs and processing those. I think XSLT 1.1 would allow this to be done within a single XSLT program by building two node sets, but I'd like to stick to 1.0, if possible.

XML source:

<?xml version="1.0"?>
<list>
  <a>a1</a>
  <a>a2</a>
  <a>a3</a>
  <b>b1</b>
  <b>b2</b>
  <a>a4</a>
  <b>b3</b>
  <a>a5</a>
  <b>b4</b>
  <a>a6</a>
  <b>b5</b>
  <b>b6</b>
</list>

The XSLT:

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

  <!-- match first a and make top level call to recursive template -->
  <xsl:template match="a[1]">
    <xsl:call-template name="do_a">
      <xsl:with-param name="ix" select="1" />
    </xsl:call-template>
  </xsl:template>

  <!-- recursive template counts a's -->
  <xsl:template name="do_a">
    <xsl:param name="ix" />
    <!-- output this a -->
    <xsl:text> </xsl:text>
    <xsl:value-of select="$ix" /><xsl:text>: </xsl:text><xsl:value-of select="."/>
    <!-- output corresponding b -->
    <xsl:text> </xsl:text><xsl:copy-of select="/list/b[$ix]/text()" />
    <!-- This for-each moves to the next a; doesn't loop. -->
    <xsl:for-each select="following-sibling::a[1]">
      <!-- increment counter and output rest of a's -->
      <xsl:call-template name="do_a">
        <xsl:with-param name="ix" select="$ix+1" />
      </xsl:call-template>
    </xsl:for-each>
  </xsl:template>

  <!-- suppress other output -->
  <xsl:template match="text()" />

</xsl:stylesheet>

And this is the output (Saxon 6.5.1 with XFactor GUI):

<?xml version="1.0" encoding="utf-8"?> 1: a1 b1 2: a2 b2 3: a3 b3 4: a4 b4 5: a5 b5 6: a6 b6

Thanks,
Mat M.

XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list
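For comparison, the same pairing (the ix-th a with the ix-th b) can be written in XSLT 1.0 without recursion. This is only a sketch, not something from the original post: the variable name `bs`, the `text` output method, and the one-pair-per-line layout are assumptions made for illustration. Whether it actually runs faster than the recursive version depends on how a given processor evaluates `$bs[$ix]`, and that has not been measured here.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

  <!-- Assumption: plain text output, one "ix: a b" pair per line -->
  <xsl:output method="text"/>

  <xsl:template match="/list">
    <!-- Select all b children once, in document order (hypothetical name) -->
    <xsl:variable name="bs" select="b"/>

    <xsl:for-each select="a">
      <!-- position() here counts within the selected a's; it is saved in a
           variable because inside the predicate below position() would
           refer to the b being tested instead -->
      <xsl:variable name="ix" select="position()"/>

      <xsl:value-of select="$ix"/>
      <xsl:text>: </xsl:text>
      <xsl:value-of select="."/>
      <xsl:text> </xsl:text>
      <!-- the ix-th b; yields an empty string if there are fewer b's than a's -->
      <xsl:value-of select="$bs[$ix]"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```

Run against the sample source above, each a is paired with the b at the same ordinal position, as in the recursive version. Only the lookup of the next a (the `following-sibling::a[1]` step) is replaced by a single `for-each` over the a's.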






