-----Original Message-----
From: Chris von See [mailto:chris@xxxxxxxxxxxxx]
Sent: 27 May 2009 02:54
To: xsl-list
Subject: Dividing documents based on size of contents
Hi all -
I have what I think is a fairly simple problem, but I'm
having trouble with the implementation in XSLT. Any help you
could give would be greatly appreciated.
I have a document which is subdivided into multiple sections,
with each section, in turn, divided into pages as shown below:
<document>
  <documentDivision>
    ... arbitrary content ...
    <pagebreak />
    ... arbitrary content ...
    <pagebreak />
  </documentDivision>
  ... arbitrary number of <documentDivision> elements ...
</document>
Each <documentDivision> section of the document can have an
arbitrary number of <pagebreak> elements, and an arbitrary
amount of content between <pagebreak>s.
I'd like to be able to break the input <document> into
multiple <document>s, each of which has the minimum number of
<documentDivision> sections that give it a <pagebreak> count
of roughly 100. I'd like to break the input at
<documentDivision> boundaries, but I don't need the output
documents to be equally sized or to be exactly 100 pages long
- just as close to that size as I can reasonably get while
maintaining the <documentDivision> boundaries.
So for example if I have an input document that looks like this:
<document>
  <documentDivision>
    ... content containing 50 <pagebreak /> elements ...
  </documentDivision>
  <documentDivision>
    ... content containing 50 <pagebreak /> elements ...
  </documentDivision>
  <documentDivision>
    ... content containing 127 <pagebreak /> elements ...
  </documentDivision>
  <documentDivision>
    ... content containing 5 <pagebreak /> elements ...
  </documentDivision>
  <documentDivision>
    ... content containing 23 <pagebreak /> elements ...
  </documentDivision>
  <documentDivision>
    ... content containing 78 <pagebreak /> elements ...
  </documentDivision>
</document>
the output documents should look like this, with each output
document being "close" to 100 pages in length:
<!-- This doc has enough <documentDivision> elements to give
     exactly 100 pages. -->
<document>
  <documentDivision>
    ... content containing 50 <pagebreak /> elements ...
  </documentDivision>
  <documentDivision>
    ... content containing 50 <pagebreak /> elements ...
  </documentDivision>
</document>
<!-- This doc has a single <documentDivision> element with
     127 pages - close enough! -->
<document>
  <documentDivision>
    ... content containing 127 <pagebreak /> elements ...
  </documentDivision>
</document>
<!-- This doc has three <documentDivision> elements of 5,
     23 and 78 pages each - close enough! -->
<document>
  <documentDivision>
    ... content containing 5 <pagebreak /> elements ...
  </documentDivision>
  <documentDivision>
    ... content containing 23 <pagebreak /> elements ...
  </documentDivision>
  <documentDivision>
    ... content containing 78 <pagebreak /> elements ...
  </documentDivision>
</document>
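To pin down the rule I have in mind, here is a rough sketch in Python
(not XSLT) of the grouping applied to the per-division page counts from
the example. The function name and the "target" parameter are invented
for illustration, and I'm assuming a group is closed as soon as its
running count reaches the target; any leftover divisions at the end
form a final, shorter group.

```python
def group_divisions(page_counts, target=100):
    """Greedily group consecutive page counts.

    A group is closed as soon as its running total reaches `target`
    (assumed threshold: 100 pages). Divisions are never split.
    """
    groups = []    # finished groups, each a list of page counts
    current = []   # the group being built
    running = 0    # page count of `current`
    for count in page_counts:
        current.append(count)
        running += count
        if running >= target:
            groups.append(current)
            current, running = [], 0
    if current:
        # Leftover divisions that never reached the target.
        groups.append(current)
    return groups


print(group_divisions([50, 50, 127, 5, 23, 78]))
# → [[50, 50], [127], [5, 23, 78]]
```

Each inner list corresponds to one output <document>.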
I've been able to figure out how to get the number of
<pagebreak>s per <documentDivision> and how to calculate the
number of <pagebreak>s in any given group of
<documentDivision> sections, but what I'm not sure of is how
to maintain information about the point at which I last
created a new output document so that I can determine what
group of <documentDivision> elements has a page count around
100 and should therefore be used to create a new output
document. It seems that the best way to carry this context
would be via params to xsl:apply-templates, but I'm not
clear on how to set up the XSLT code so that the state gets
maintained as I iterate through <documentDivision> elements.
It also seems like there should be some XPath expression that
I can use with xsl:for-each-group, but I can't quite figure
out how to write that such that each group has only the
minimum number of <documentDivision> elements needed to
accumulate 100-ish pages.
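To make the "state via params" idea concrete: what I think has to
travel from one <documentDivision> to the next is the group collected
so far and its running page count. Here is a rough Python sketch of
that recursion (again not XSLT; all names are invented for
illustration), where each call stands in for one template invocation
receiving params:

```python
def split_recursive(remaining, current=(), running=0, target=100):
    """Recursive grouping; state is carried only through parameters.

    remaining: page counts of the divisions not yet processed
    current:   page counts collected for the group being built
    running:   sum of `current`
    target:    assumed threshold of 100 pages
    """
    if not remaining:
        # End of input: emit any partial group that is left over.
        return [list(current)] if current else []
    head, rest = remaining[0], remaining[1:]
    current = current + (head,)
    running += head
    if running >= target:
        # Close this group and start the next one with fresh state.
        return [list(current)] + split_recursive(rest, (), 0, target)
    # Keep accumulating: pass the updated state to the next division.
    return split_recursive(rest, current, running, target)


print(split_recursive([50, 50, 127, 5, 23, 78]))
# → [[50, 50], [127], [5, 23, 78]]
```

Nothing is mutated between calls, which is the property I would need
from an XSLT version; how to express this with templates is the part
I'm stuck on.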
Do you have any guidance on ways to do this? I think I'm
just having a mental block, and a swift kick in the right
direction should do the trick.
Thanks
Chris