Re: Dividing documents based on size of contents

Subject: Re: Dividing documents based on size of contents
From: Chris von See <chris@xxxxxxxxxxxxx>
Date: Wed, 27 May 2009 12:09:49 -0700
Thanks to Emmanuel and Michael for the great answers. I modified my code to use Michael's solution (using one pass through the source instead of two) and it seems to be working.

Cheers
Chris

On May 27, 2009, at 1:12 AM, Michael Kay wrote:


I think this is a case for "sibling recursion" - in fact, it's the example I
use on training courses, if I think the group is capable of tackling the
problem (it tends to cause significant headache, and people are typically
amazed how after 3 hours head-scratching, the answer turns out to be about
ten lines of code).


It's probably easiest to do this in two phases: the first phase copies the
documentDivision elements, inserting a <documentBreak/> element where
appropriate, and the second phase uses for-each-group
starting-with="documentBreak" to create the document elements.


The sibling recursion works like this:

<xsl:template match="documentDivision">
  <!-- Running pagebreak count for the current output document;
       defaults to 0 for the first division. -->
  <xsl:param name="size-so-far" as="xs:integer" select="0"/>
  <xsl:variable name="new-size-so-far" as="xs:integer"
      select="$size-so-far + count(pagebreak)"/>
  <xsl:variable name="start-new-document" as="xs:boolean"
      select="$new-size-so-far gt 100"/>
  <xsl:copy-of select="."/>
  <!-- Mark the end of an output document once the total passes 100. -->
  <xsl:if test="$start-new-document">
    <documentBreak/>
  </xsl:if>
  <!-- Recurse to the next sibling only, carrying the running total. -->
  <xsl:apply-templates select="following-sibling::documentDivision[1]">
    <xsl:with-param name="size-so-far"
        select="if ($start-new-document) then 0 else $new-size-so-far"/>
  </xsl:apply-templates>
</xsl:template>



and then you start the process off with


<xsl:template match="document">
  <xsl:apply-templates select="documentDivision[1]"/>
</xsl:template>
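
For reference, the two phases described above could be combined into a single stylesheet along the following lines. This is only a sketch under some assumptions, not the code from the training course: the `mark` mode name and the `<documents>` wrapper element are invented here for illustration, the first phase is held in a temporary tree rather than written out, and `count(pagebreak)` assumes the pagebreaks are direct children of each documentDivision (use `count(.//pagebreak)` if they can be nested deeper). Because the break marker is emitted after the division that crosses the threshold, the grouping step skips any group that contains no documentDivision, which would otherwise produce an empty trailing document.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">

  <xsl:output method="xml" indent="yes"/>

  <xsl:template match="document">
    <!-- Phase 1: build a temporary tree of copied divisions,
         interleaved with <documentBreak/> markers. -->
    <xsl:variable name="marked">
      <xsl:apply-templates select="documentDivision[1]" mode="mark">
        <xsl:with-param name="size-so-far" select="0"/>
      </xsl:apply-templates>
    </xsl:variable>

    <!-- Phase 2: start a new group at each marker.
         <documents> is a hypothetical wrapper so the result has one root. -->
    <documents>
      <xsl:for-each-group select="$marked/*"
          group-starting-with="documentBreak">
        <!-- Skip a group holding only a trailing marker. -->
        <xsl:if test="current-group()[self::documentDivision]">
          <document>
            <xsl:copy-of select="current-group()[self::documentDivision]"/>
          </document>
        </xsl:if>
      </xsl:for-each-group>
    </documents>
  </xsl:template>

  <!-- Sibling recursion: each division hands the running total
       to the next sibling only. -->
  <xsl:template match="documentDivision" mode="mark">
    <xsl:param name="size-so-far" as="xs:integer" select="0"/>
    <xsl:variable name="new-size-so-far" as="xs:integer"
        select="$size-so-far + count(pagebreak)"/>
    <xsl:variable name="start-new-document" as="xs:boolean"
        select="$new-size-so-far gt 100"/>
    <xsl:copy-of select="."/>
    <xsl:if test="$start-new-document">
      <documentBreak/>
    </xsl:if>
    <xsl:apply-templates
        select="following-sibling::documentDivision[1]" mode="mark">
      <xsl:with-param name="size-so-far"
          select="if ($start-new-document) then 0 else $new-size-so-far"/>
    </xsl:apply-templates>
  </xsl:template>

</xsl:stylesheet>
```

Note that with the `gt 100` test, the sample input from the original question (50, 50, 127, 5, 23, 78 pagebreaks) would come out as two documents of 227 and 106 pages, since a break is only emitted once the running total has passed 100. Changing the test to `ge 100` would give the 100 / 127 / 106 split shown in the question.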

Regards,

Michael Kay
http://www.saxonica.com/
http://twitter.com/michaelhkay


-----Original Message-----
From: Chris von See [mailto:chris@xxxxxxxxxxxxx]
Sent: 27 May 2009 02:54
To: xsl-list
Subject:  Dividing documents based on size of contents

Hi all -

I have what I think is a fairly simple problem, but I'm
having trouble with the implementation in XSLT.  Any help you
could give would be greatly appreciated.

I have a document which is subdivided into multiple sections,
with each section, in turn, divided into pages as shown below:

<document>
	<documentDivision>
		... arbitrary content ...
		<pagebreak />
		... arbitrary content ...
		<pagebreak />
	</documentDivision>

... arbitrary number of <documentDivision> elements ...

</document>

Each <documentDivision> section of the document can have an
arbitrary number of <pagebreak> elements, and an arbitrary
amount of content between <pagebreak>s.

I'd like to be able to break the input <document> into
multiple <document>s, each of which has the minimum number of
<documentDivision> sections that give it a <pagebreak> count
~100 pages.  I'd like to break the input at
<documentDivision> boundaries, but I don't need the output
documents to be equally sized or to be exactly 100 pages long
- just as close to that size as I can reasonably get while
maintaining the <documentDivision> boundaries.

So for example if I have an input document that looks like this:

<document>
	<documentDivision>
		... content containing 50 <pagebreak /> elements ...
	</documentDivision>
	<documentDivision>
		... content containing 50 <pagebreak /> elements ...
	</documentDivision>
	<documentDivision>
		... content containing 127 <pagebreak /> elements ...
	</documentDivision>
	<documentDivision>
		... content containing 5 <pagebreak /> elements ...
	</documentDivision>
	<documentDivision>
		... content containing 23 <pagebreak /> elements ...
	</documentDivision>
	<documentDivision>
		... content containing 78 <pagebreak /> elements ...
	</documentDivision>
</document>

the output documents should look like this, with each output
document being "close" to 100 pages in length:

<!-- This doc has enough <documentDivision> elements to give
exactly 100 pages. -->
<document>
	<documentDivision>
		... content containing 50 <pagebreak /> elements ...
	</documentDivision>
	<documentDivision>
		... content containing 50 <pagebreak /> elements ...
	</documentDivision>
</document>

<!-- This doc has a single <documentDivision> element with
127 pages - close enough! -->
<document>
	<documentDivision>
		... content containing 127 <pagebreak /> elements ...
	</documentDivision>
</document>

<!-- This doc has three <documentDivision> elements of 5,
23 and 78 pages each - close enough! -->
<document>
	<documentDivision>
		... content containing 5 <pagebreak /> elements ...
	</documentDivision>
	<documentDivision>
		... content containing 23 <pagebreak /> elements ...
	</documentDivision>
	<documentDivision>
		... content containing 78 <pagebreak /> elements ...
	</documentDivision>
</document>

I've been able to figure out how to get the number of
<pagebreak>s per <documentDivision> and how to calculate the
number of <pagebreak>s in any given group of
<documentDivision> sections, but what I'm not sure of is how
to maintain information about the point at which I last
created a new output document so that I can determine what
group of <documentDivision> elements has a page count around
100 and should therefore be used to create a new output
document.  It seems that the best way to carry this context
would be via params to xsl:apply-templates, but I'm not
clear on how to set up the XSLT code so that the state gets
maintained as I iterate through <documentDivision> elements.
It also seems like there should be some XPath expression that
I can use with xsl:for-each-group, but I can't quite figure
out how to write that such that each group has only the
minimum number of <documentDivision> elements needed to
accumulate 100-ish pages.

Do you have any guidance on ways to do this?  I think I'm
just having a mental block, and a swift kick in the right
direction should do the trick.


Thanks Chris


Chris von See Senior Geek TechAdapt, Inc. 2910 Heights Dr. Bellingham, WA 98226

E: chris@xxxxxxxxxxxxx
P: +1 360 223 1514
F: +1 360 544 0112

Save trees. Print only when necessary.
