
Dividing documents based on size of contents

Subject: Dividing documents based on size of contents
From: Chris von See <chris@xxxxxxxxxxxxx>
Date: Tue, 26 May 2009 18:54:15 -0700
Hi all -

I have what I think is a fairly simple problem, but I'm having trouble with the implementation in XSLT. Any help you could give would be greatly appreciated.

I have a document which is subdivided into multiple sections, with each section, in turn, divided into pages as shown below:

<document>
	<documentDivision>
		... arbitrary content ...
		<pagebreak />
		... arbitrary content ...
		<pagebreak />
	</documentDivision>

... arbitrary number of <documentDivision> elements ...

</document>

Each <documentDivision> section of the document can have an arbitrary number of <pagebreak> elements, and an arbitrary amount of content between <pagebreak>s.

I'd like to break the input <document> into multiple <document>s, each containing the minimum number of <documentDivision> sections needed to bring its <pagebreak> count to roughly 100. The input should be split only at <documentDivision> boundaries. The output documents don't need to be equally sized or exactly 100 pages long - just as close to that size as I can reasonably get while keeping each <documentDivision> intact.

So for example if I have an input document that looks like this:

<document>
	<documentDivision>
		... content containing 50 <pagebreak /> elements ...
	</documentDivision>
	<documentDivision>
		... content containing 50 <pagebreak /> elements ...
	</documentDivision>
	<documentDivision>
		... content containing 127 <pagebreak /> elements ...
	</documentDivision>
	<documentDivision>
		... content containing 5 <pagebreak /> elements ...
	</documentDivision>
	<documentDivision>
		... content containing 23 <pagebreak /> elements ...
	</documentDivision>
	<documentDivision>
		... content containing 78 <pagebreak /> elements ...
	</documentDivision>
</document>

the output documents should look like this, with each output document being "close" to 100 pages in length:

<!-- This doc has enough <documentDivision> elements to give exactly 100 pages. -->
<document>
<documentDivision>
... content containing 50 <pagebreak /> elements ...
</documentDivision>
<documentDivision>
... content containing 50 <pagebreak /> elements ...
</documentDivision>
</document>


<!-- This doc has a single <documentDivision> element with 127 pages - close enough! -->
<document>
<documentDivision>
... content containing 127 <pagebreak /> elements ...
</documentDivision>
</document>


<!-- This doc has three <documentDivision> elements of 5, 23 and 78 pages each - close enough! -->
<document>
<documentDivision>
... content containing 5 <pagebreak /> elements ...
</documentDivision>
<documentDivision>
... content containing 23 <pagebreak /> elements ...
</documentDivision>
<documentDivision>
... content containing 78 <pagebreak /> elements ...
</documentDivision>
</document>
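
For reference, the grouping rule implied by this example can be sketched outside XSLT. The Python below is only an illustration under one assumed reading of "close to 100": a group is closed as soon as its running page count reaches the target, and any leftover divisions form a final, shorter group. The names `group_divisions` and `target` are hypothetical and not part of any stylesheet.

```python
def group_divisions(page_counts, target=100):
    """Greedily group per-division page counts, in document order.

    Assumption: a group is closed as soon as its running total
    reaches `target`; divisions are never split or reordered.
    """
    groups = []
    current = []
    total = 0
    for count in page_counts:
        current.append(count)
        total += count
        if total >= target:
            # Threshold reached: close this group and start a new one.
            groups.append(current)
            current = []
            total = 0
    if current:
        # Leftover divisions that never reached the target.
        groups.append(current)
    return groups


print(group_divisions([50, 50, 127, 5, 23, 78]))
# prints [[50, 50], [127], [5, 23, 78]]
```

With the page counts from the sample input, this reproduces the three output documents shown above.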


I've been able to figure out how to get the number of <pagebreak>s per <documentDivision> and how to calculate the number of <pagebreak>s in any given group of <documentDivision> sections. What I'm not sure of is how to keep track of the point at which I last created a new output document, so that I can determine which group of <documentDivision> elements has a page count around 100 and should therefore be used to create the next output document. It seems that the best way to carry this context would be via params to xsl:apply-templates, but I'm not clear on how to set up the XSLT code so that the state is maintained as I iterate through the <documentDivision> elements. It also seems like there should be some XPath expression that I can use with xsl:for-each-group, but I can't quite figure out how to write it so that each group has only the minimum number of <documentDivision> elements needed to accumulate 100-ish pages.
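
To make the state-carrying idea concrete, here is a minimal Python sketch of the same shape of recursion: each call handles one <documentDivision> and passes the still-open group and its page count on to the next call, much as params would be passed along with xsl:apply-templates. It is not XSLT and not a tested stylesheet design. The function name `split_document`, its parameters, and the rule of closing a group once it reaches the target are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET


def split_document(divisions, target=100, pending=(), pending_pages=0):
    """Walk the divisions recursively, carrying the open group as parameters.

    `pending` holds the divisions collected since the last output document
    was closed; `pending_pages` is their combined <pagebreak> count.
    """
    if not divisions:
        # End of input: flush any open group as a final, shorter document.
        return [list(pending)] if pending else []
    head, rest = divisions[0], divisions[1:]
    # Count <pagebreak> descendants at any depth inside this division.
    pages = pending_pages + len(head.findall(".//pagebreak"))
    group = pending + (head,)
    if pages >= target:
        # Close the group and recurse with fresh (empty) state.
        return [list(group)] + split_document(rest, target)
    # Otherwise keep accumulating: pass the updated state to the next call.
    return split_document(rest, target, group, pages)


# Usage: a tiny input with a target of 3 pages per output document.
doc = ET.fromstring(
    "<document>"
    "<documentDivision><pagebreak/><pagebreak/></documentDivision>"
    "<documentDivision><pagebreak/></documentDivision>"
    "<documentDivision><pagebreak/></documentDivision>"
    "</document>"
)
groups = split_document(doc.findall("documentDivision"), target=3)
print([len(g) for g in groups])
# prints [2, 1]
```

The point of the sketch is only that no mutable variable is needed: the "where did the last document end" information lives entirely in the arguments of the next call.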

Do you have any guidance on ways to do this? I think I'm just having a mental block, and a swift kick in the right direction should do the trick.


Thanks,
Chris
