RE: Converting non-pure trees to pure trees
> I have an XML file which I have automatically converted from
> msword, the basic structure is:
>
> <worddocument>
>   <p>paragraph <b>hello</b> <i>world</i></p>
>   <p>paragraph <b>hello</b> <i>world</i></p>
>   <p>paragraph <b>hello</b> <i>world</i></p>
>   <pagebreak/>
>   <p>2/1</p>
>   <p>paragraph <b>hello</b> <i>world</i></p>
>   <p>paragraph <b>hello</b> <i>world</i></p>
>   <p>paragraph <b>hello</b> <i>world</i></p>
>   <pagebreak/>
>   <p>2/2</p>
>   <p>paragraph <b>hello</b> <i>world</i></p>
>   <p>paragraph <b>hello</b> <i>world</i></p>
>   <p>paragraph <b>hello</b> <i>world</i></p>
> </worddocument>

This is a grouping problem, of the kind I call "grouping by position". Grouping problems in XSLT are not easy: for background, see www.jenitennison.com.

All grouping problems require two nested loops. The outer loop selects a representative element for each group, which in this case seems to be a <p> element that is immediately preceded by a <pagebreak> element:

  <xsl:for-each select="p[preceding-sibling::*[1][self::pagebreak]]">
    <mongraph id="{.}">
      ...
    </mongraph>
  </xsl:for-each>

Inside this you need an inner loop that processes all the elements within one group. In this case these are all the <p> elements that follow the "representative" element, up to the next "representative" element. Or to put it another way, all following <p> elements whose first preceding <pagebreak> is the same as the first preceding <pagebreak> of the current element. So the inner loop can be:

  <xsl:for-each select="following-sibling::p[
      generate-id(preceding-sibling::pagebreak[1]) =
      generate-id(current()/preceding-sibling::pagebreak[1])]">
    <xsl:copy-of select="."/>
  </xsl:for-each>

In Saxon there is a simpler solution using the saxon:leading() extension function.
Mike Kay

>
> I wish to transform this tree using some knowledge I have
> about the document:
> The first page is always the "introduction", whilst all
> subsequent pages are "monographs"
>
> <semanticdocument>
>   <introduction>
>     <p>paragraph <b>hello</b> <i>world</i></p>
>     <p>paragraph <b>hello</b> <i>world</i></p>
>     <p>paragraph <b>hello</b> <i>world</i></p>
>   </introduction>
>   <mongraphs>
>     <mongraph id="2/1">
>       <p>paragraph <b>hello</b> <i>world</i></p>
>       <p>paragraph <b>hello</b> <i>world</i></p>
>       <p>paragraph <b>hello</b> <i>world</i></p>
>     </mongraph>
>     <mongraph id="2/2">
>       <p>paragraph <b>hello</b> <i>world</i></p>
>       <p>paragraph <b>hello</b> <i>world</i></p>
>       <p>paragraph <b>hello</b> <i>world</i></p>
>     </mongraph>
>   </mongraphs>
> </semanticdocument>
>
> XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list
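
For reference, the outer and inner loops from the reply can be combined into one XSLT 1.0 stylesheet. This is only a sketch and was not posted in the thread. It assumes the input is a well-formed <worddocument> like the sample quoted above, with elements named <pagebreak>. It also assumes that the introduction is every <p> with no <pagebreak> before it. That select expression is an assumption and does not come from the original reply. The first <p> after each <pagebreak> is taken to hold the id and is not copied into its group, which matches the requested output.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- No indent="yes": the <p> elements have mixed content, and
       indentation could add unwanted whitespace inside them. -->
  <xsl:output method="xml"/>

  <xsl:template match="/worddocument">
    <semanticdocument>

      <!-- Assumption: the introduction is every <p> that has no
           <pagebreak> anywhere before it. -->
      <introduction>
        <xsl:copy-of select="p[not(preceding-sibling::pagebreak)]"/>
      </introduction>

      <mongraphs>
        <!-- Outer loop: the <p> immediately after each <pagebreak>
             is the representative of one group. -->
        <xsl:for-each select="p[preceding-sibling::*[1][self::pagebreak]]">
          <!-- The representative's text (e.g. "2/1") becomes the id. -->
          <mongraph id="{.}">
            <!-- Inner loop: following <p> siblings whose nearest
                 preceding <pagebreak> is the same node as the
                 representative's nearest preceding <pagebreak>. -->
            <xsl:for-each select="following-sibling::p[
                generate-id(preceding-sibling::pagebreak[1]) =
                generate-id(current()/preceding-sibling::pagebreak[1])]">
              <xsl:copy-of select="."/>
            </xsl:for-each>
          </mongraph>
        </xsl:for-each>
      </mongraphs>

    </semanticdocument>
  </xsl:template>

</xsl:stylesheet>
```

Inside the inner select, current() refers to the representative <p> chosen by the outer loop, so each group stops at the next <pagebreak>. If a document begins with a <pagebreak>, this sketch emits an empty <introduction/>.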