RE: Dividing documents based on size of contents

Subject: RE: Dividing documents based on size of contents
From: "Michael Kay" <mike@xxxxxxxxxxxx>
Date: Wed, 27 May 2009 09:12:51 +0100
I think this is a case for "sibling recursion" - in fact, it's the example I
use on training courses, if I think the group is capable of tackling the
problem (it tends to cause significant headache, and people are typically
amazed how after 3 hours head-scratching, the answer turns out to be about
ten lines of code).

It's probably easiest to do this in two phases: the first phase copies the
documentDivision elements, inserting a <documentBreak/> element where
appropriate, and the second phase uses xsl:for-each-group with
group-starting-with="documentBreak" to create the document elements.

The sibling recursion works like this:

 <xsl:template match="documentDivision">
   <!-- Pages accumulated so far in the current output document;
        defaults to 0 when the first division is processed -->
   <xsl:param name="size-so-far" as="xs:integer" select="0"/>
   <xsl:variable name="new-size-so-far" as="xs:integer"
                 select="$size-so-far + count(pagebreak)"/>
   <!-- Break once the running total reaches 100 pages -->
   <xsl:variable name="start-new-document" as="xs:boolean"
                 select="$new-size-so-far ge 100"/>
   <xsl:copy-of select="."/>
   <xsl:if test="$start-new-document">
     <documentBreak/>
   </xsl:if>
   <!-- Recurse to the next sibling, resetting the count after a break -->
   <xsl:apply-templates select="following-sibling::documentDivision[1]">
     <xsl:with-param name="size-so-far"
          select="if ($start-new-document) then 0 else $new-size-so-far"/>
   </xsl:apply-templates>
 </xsl:template>


and then you start the process off with

 <xsl:template match="document">
   <xsl:apply-templates select="documentDivision[1]"/>
 </xsl:template>
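
For anyone who wants to see the two phases wired together, here is a rough sketch of a complete XSLT 2.0 stylesheet. It is only one way to assemble the pieces, under some assumptions of mine that are not in the original question: the "mark" mode, the $threshold parameter, the $marked variable and the <documents> wrapper element are all made-up names, and the test uses "ge" so that a running total of exactly 100 pages closes a document, as in the sample from the question. Because each <documentBreak/> is emitted after the division that fills a document, a marker can end up as the last item in the sequence, so the grouping step skips any group that contains no divisions.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">

  <xsl:output indent="yes"/>

  <!-- Hypothetical knob: page count at which an output document is full -->
  <xsl:param name="threshold" as="xs:integer" select="100"/>

  <xsl:template match="document">
    <!-- Phase 1: flat sequence of documentDivision copies and
         documentBreak markers, built by sibling recursion -->
    <xsl:variable name="marked" as="element()*">
      <xsl:apply-templates select="documentDivision[1]" mode="mark"/>
    </xsl:variable>
    <!-- Phase 2: one output document per run of divisions.
         The <documents> wrapper is an assumption, used only to keep
         the result a single well-formed tree -->
    <documents>
      <xsl:for-each-group select="$marked"
                          group-starting-with="documentBreak">
        <!-- Drop the marker itself; keep only the divisions -->
        <xsl:variable name="divisions"
                      select="current-group()[self::documentDivision]"/>
        <!-- A trailing marker yields a group with no divisions -->
        <xsl:if test="exists($divisions)">
          <document>
            <xsl:copy-of select="$divisions"/>
          </document>
        </xsl:if>
      </xsl:for-each-group>
    </documents>
  </xsl:template>

  <xsl:template match="documentDivision" mode="mark">
    <xsl:param name="size-so-far" as="xs:integer" select="0"/>
    <xsl:variable name="new-size-so-far" as="xs:integer"
                  select="$size-so-far + count(pagebreak)"/>
    <xsl:variable name="start-new-document" as="xs:boolean"
                  select="$new-size-so-far ge $threshold"/>
    <xsl:copy-of select="."/>
    <xsl:if test="$start-new-document">
      <documentBreak/>
    </xsl:if>
    <!-- Move on to the next sibling, carrying the running total -->
    <xsl:apply-templates select="following-sibling::documentDivision[1]"
                         mode="mark">
      <xsl:with-param name="size-so-far"
          select="if ($start-new-document) then 0 else $new-size-so-far"/>
    </xsl:apply-templates>
  </xsl:template>

</xsl:stylesheet>
```

On the sample input from the question (divisions of 50, 50, 127, 5, 23 and 78 pages) the running total goes 50, 100 (break), 127 (break), 5, 28, 106 (break), which gives three documents of 100, 127 and 106 pages. With a strict "gt" test the first break would not happen until the 127-page division had been added. Note that count(pagebreak) only counts pagebreak elements that are direct children of the division; if they can be nested deeper, count(.//pagebreak) would be the thing to try.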

Regards,

Michael Kay
http://www.saxonica.com/
http://twitter.com/michaelhkay 


> -----Original Message-----
> From: Chris von See [mailto:chris@xxxxxxxxxxxxx] 
> Sent: 27 May 2009 02:54
> To: xsl-list
> Subject:  Dividing documents based on size of contents
> 
> Hi all -
> 
> I have what I think is a fairly simple problem, but I'm 
> having trouble with the implementation in XSLT.  Any help you 
> could give would be greatly appreciated.
> 
> I have a document which is subdivided into multiple sections, 
> with each section, in turn, divided into pages as shown below:
> 
> <document>
> 	<documentDivision>
> 		... arbitrary content ...
> 		<pagebreak />
> 		... arbitrary content ...
> 		<pagebreak />
> 	</documentDivision>
> 
> 	... arbitrary number of <documentDivision> elements ...
> 
> </document>
> 
> Each <documentDivision> section of the document can have an 
> arbitrary number of <pagebreak> elements, and an arbitrary 
> amount of content between <pagebreak>s.
> 
> I'd like to be able to break the input <document> into 
> multiple <document>s, each of which has the minimum number of 
> <documentDivision> sections that give it a <pagebreak> count 
> ~100 pages.  I'd like to break the input at 
> <documentDivision> boundaries, but I don't need the output 
> documents to be equally sized or to be exactly 100 pages long 
> - just as close to that size as I can reasonably get while 
> maintaining the <documentDivision> boundaries.
> 
> So for example if I have an input document that looks like this:
> 
> <document>
> 	<documentDivision>
> 		... content containing 50 <pagebreak /> elements ...
> 	</documentDivision>
> 	<documentDivision>
> 		... content containing 50 <pagebreak /> elements ...
> 	</documentDivision>
> 	<documentDivision>
> 		... content containing 127 <pagebreak /> elements ...
> 	</documentDivision>
> 	<documentDivision>
> 		... content containing 5 <pagebreak /> elements ...
> 	</documentDivision>
> 	<documentDivision>
> 		... content containing 23 <pagebreak /> elements ...
> 	</documentDivision>
> 	<documentDivision>
> 		... content containing 78 <pagebreak /> elements ...
> 	</documentDivision>
> </document>
> 
> the output documents should look like this, with each output 
> document being "close" to 100 pages in length:
> 
> <!-- This doc has enough <documentDivision> elements to give
> exactly 100 pages. -->
> <document>
> 	<documentDivision>
> 		... content containing 50 <pagebreak /> elements ...
> 	</documentDivision>
> 	<documentDivision>
> 		... content containing 50 <pagebreak /> elements ...
> 	</documentDivision>
> </document>
> 
> <!-- This doc has a single <documentDivision> element with
> 127 pages - close enough! -->
> <document>
> 	<documentDivision>
> 		... content containing 127 <pagebreak /> elements ...
> 	</documentDivision>
> </document>
> 
> <!-- This doc has three <documentDivision> elements of 5,
> 23 and 78 pages each - close enough! -->
> <document>
> 	<documentDivision>
> 		... content containing 5 <pagebreak /> elements ...
> 	</documentDivision>
> 	<documentDivision>
> 		... content containing 23 <pagebreak /> elements ...
> 	</documentDivision>
> 	<documentDivision>
> 		... content containing 78 <pagebreak /> elements ...
> 	</documentDivision>
> </document>
> 
> I've been able to figure out how to get the number of 
> <pagebreak>s per <documentDivision> and how to calculate the 
> number of <pagebreak>s in any given group of 
> <documentDivision> sections, but what I'm not sure of is how 
> to maintain information about the point at which I last 
> created a new output document so that I can determine what 
> group of <documentDivision> elements has a page count around 
> 100 and should therefore be used to create a new output 
> document.  It seems that the best way to carry this context 
> would be via params to xsl:apply-templates, but I'm not 
> clear on how to set up the XSLT code so that the state gets 
> maintained as I iterate through <documentDivision> elements.  
> It also seems like there should be some XPath expression that 
> I can use with xsl:for-each-group, but I can't quite figure 
> out how to write that such that each group has only the 
> minimum number of <documentDivision> elements needed to 
> accumulate 100-ish pages.
> 
> Do you have any guidance on ways to do this?  I think I'm 
> just having a mental block, and a swift kick in the right 
> direction should do the trick.
> 
> 
> Thanks
> Chris
