Re: Transforming milestone tags

Subject: Re: Transforming milestone tags
From: Wendell Piez <wapiez@xxxxxxxxxxxxxxxx>
Date: Wed, 14 Jul 2004 12:01:49 -0400
Dieter,

This is the topic of my paper this year at the Extreme conference, and of several others there. Multiple concurrent hierarchies are a hot problem. If you contact me off-list, I can provide you with more info. Extreme is only three weeks away, and putting its Proceedings together is one of the things I'm working on when I'm not writing to this list. :->

Wednesday, August 4 will be "Overlap Day" at Extreme this year: see the program at http://www.extrememarkup.com/extreme/2004/wednesday.asp. You may notice that no fewer than four of Wednesday's abstracts start with the same string (is this a case of overlap?): "Overlap in markup occurs where some markup structures do not nest...".

The short version of the story is that this is most easily done by handling the markup quite differently from the way XSLT expects. It can be done with XSLT fairly simply (that's what my paper is on), but the approach is highly unorthodox. In your case, a simple approach would be to process the input in two passes: the first to flatten all the markup into milestones, the second to write the flat content out again with the hierarchy you want.

But no guarantee even of well-formedness can be made about the output, using current tools, which is one reason why this is an interesting research area. We'd like to get to that point, but this will require implementing LMNL (http://www.lmnl.net) or something similar.
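
To make the two-pass idea concrete, here is a minimal sketch in Python rather than XSLT (a stylesheet version would follow the same two steps). It is an illustration under assumptions, not the method from the paper: the function names flatten and rebuild are made up for this example, each non-empty text run is assumed to be exactly one line (true of the sample, not of text with inline markup), and anything before the first <pb/> is simply dropped.

```python
import xml.etree.ElementTree as ET

# Shortened version of the sample input from the question.
SAMPLE = """<doc>
  <pb n="1" />
  <div>
    <p>Line A
    <lb/>Line B
    <pb n="2" />
    <lb/>Line C
    </p>
    <p>Line D
    <lb/>Line E
    </p>
  </div>
</doc>"""


def flatten(root):
    """Pass 1: reduce the hierarchy to a flat, document-order list of events."""
    events = []

    def walk(node):
        if node.tag == "pb":
            events.append(("pb", node.get("n", "")))
        elif node.tag == "lb":
            events.append(("lb", None))
        elif node.tag in ("div", "p"):
            # Structural elements become start milestones: newdiv / newp.
            events.append(("new" + node.tag, None))
        if node.text and node.text.strip():
            events.append(("text", node.text.strip()))
        for child in node:
            walk(child)
            # Text following a child (its "tail") belongs to the parent.
            if child.tail and child.tail.strip():
                events.append(("text", child.tail.strip()))

    walk(root)
    return events


def rebuild(events):
    """Pass 2: write the flat events back out under <page> and <line>."""
    doc = ET.Element("doc")
    page, line_no = None, 0
    for kind, value in events:
        if kind == "pb":
            page = ET.SubElement(doc, "page", n=value)
            line_no = 0  # line numbering restarts on every page
        elif page is None:
            continue  # assumption: content before the first <pb/> is dropped
        elif kind in ("newdiv", "newp"):
            ET.SubElement(page, kind)
        elif kind == "text":
            line_no += 1
            line = ET.SubElement(page, "line", n=f"{page.get('n')}.{line_no}")
            line.text = value
        # "lb" events need no action: each text run already becomes one line.
    return doc


result = rebuild(flatten(ET.fromstring(SAMPLE)))
print(ET.tostring(result, encoding="unicode"))
```

Because the second pass builds a fresh tree, this particular sketch cannot emit ill-formed output; the price is that the original <div> and <p> hierarchy survives only as milestones.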

Your data looks like near-TEI. The TEI folks (who, like the OSIS project, have to deal with overlap more than a little) are watching this space. :->

Cheers,
Wendell

At 03:45 AM 7/14/2004, you wrote:
I have a source document which uses hierarchical markup for the structure of the text of a manuscript (<div> for the big divisions and <p> for the paragraphs) and milestone tags for page breaks (<pb>) and line breaks (<lb>), which may occur in virtually any place inside the hierarchy, for example:

<doc>
  <pb n="1" />
  <div>
    <p>Line A
    <lb/>Line B
    <pb n="2" />
    <lb/>Line C
    </p>
    <p>Line D
    <lb/>Line E
    <lb/>Line F
    </p>
    <pb n="3" />
    <p>Line G
    <lb/>Line H
    <lb/>Line I
    </p>
  </div>
  <div>
    <p>Line J
    <lb/>Line K
    <lb/>Line L
    </p>
  </div>
</doc>

I would like to transform this document into a nested structure of <page> and <line> tags and mark up the textual divisions as milestones:

<doc>
  <page n="1">
    <newdiv/>
    <newp/>
    <line n="1.1">Line A</line>
    <line n="1.2">Line B</line>
  </page>
  <page n="2">
    <line n="2.1">Line C</line>
    <newp/>
    <line n="2.2">Line D</line>
    <line n="2.3">Line E</line>
    <line n="2.4">Line F</line>
  </page>
  <page n="3">
    <newp/>
    <line n="3.1">Line G</line>
    <line n="3.2">Line H</line>
    <line n="3.3">Line I</line>
    <newdiv/>
    <newp/>
    <line n="3.4">Line J</line>
    <line n="3.5">Line K</line>
    <line n="3.6">Line L</line>
  </page>
</doc>

What is the best strategy to do this? (My main problem is to get a selection of nodes spanning between <pb> tags appearing on different levels in the hierarchy.)


======================================================================
Wendell Piez                            mailto:wapiez@xxxxxxxxxxxxxxxx
Mulberry Technologies, Inc.                http://www.mulberrytech.com
17 West Jefferson Street                    Direct Phone: 301/315-9635
Suite 207                                          Phone: 301/315-9631
Rockville, MD  20850                                 Fax: 301/315-8285
----------------------------------------------------------------------
  Mulberry Technologies: A Consultancy Specializing in SGML and XML
======================================================================
