Complex splitting of XML tag to multiple other XML tags

Subject: Complex splitting of XML tag to multiple other XML tags using XSLT
From: Lars Eskildsen <laes@xxxxxxxxx>
Date: Sun, 20 Oct 2002 15:43:37 +0200
Hello XSLT experts!!

We receive XML files from one of our customers and 
transform them into our own XML format using 
XSLT 1.0 (and Xalan 1.3), but we have a specific problem:

----------
We have the following DTD snippet (for the customer XML):

<!ELEMENT ADLIST (head, lines)>
<!ELEMENT head (#PCDATA)>
<!ELEMENT lines (TeleLine | InetLine)+>
<!ELEMENT TeleLine ( text1?, text2? )>
<!ELEMENT InetLine (#PCDATA)>
<!ELEMENT text1 (#PCDATA)>
<!ELEMENT text2 (#PCDATA)>

In general we want to use XSLT to convert ONE <ADLIST> tag
to ONE <AD> tag, where our own DTD for the <AD> tag is 
the following:

<!ELEMENT AD (head?, lines)>
<!ATTLIST AD SEQ (U|S|M|E) #REQUIRED>
<!ELEMENT head (#PCDATA)>
<!ELEMENT lines (TeleLine | InetLine)+>
<!ELEMENT TeleLine ( text1?, text2? )>
<!ELEMENT InetLine (#PCDATA)>
<!ELEMENT text1 (#PCDATA)>
<!ELEMENT text2 (#PCDATA)>

In doing the one-to-one conversion, we set the SEQ attribute 
to the value 'U' (undefined). 
The one-to-one conversion is NOT a problem!
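
A minimal sketch of that one-to-one conversion in XSLT 1.0 might look 
like this. It assumes <head> and <lines> can be copied through 
unchanged, since both DTD snippets declare them identically; the 
stylesheet actually in use is not shown in this message.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <!-- One <ADLIST> becomes one <AD> with SEQ="U" (undefined). -->
  <xsl:template match="ADLIST">
    <AD SEQ="U">
      <!-- Assumption: <head> and <lines> are carried over as they are.
           The union is returned in document order: head, then lines. -->
      <xsl:copy-of select="head | lines"/>
    </AD>
  </xsl:template>
</xsl:stylesheet>
```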
----------

In certain circumstances we want to convert an <ADLIST> tag 
to several <AD> tags, using the SEQ attribute to reflect 
the sequence of the <AD> tags in relation to the 
original <ADLIST>.
The semantics of this attribute are 'S' for Start, 
'M' for Middle and 'E' for End.

The rules for splitting the original <ADLIST> tag into 
several <AD> tags are as follows:

1) The <ADLIST> tag must contain:
    a) more than one <TeleLine> tag and at least one 
       <InetLine> tag or
    b) more than one <InetLine> tag and at least one 
       <TeleLine> tag

2) The <ADLIST> tag MUST contain a <TeleLine> tag that 
   contains a <text1> tag and is NOT the first <TeleLine> 
   tag.

3) The <ADLIST> tag must be split at <TeleLine> tags that 
   contain a <text1> tag.
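
Rules 1 and 2 together decide whether an <ADLIST> is split at all. 
One way to express that test in XSLT 1.0 is sketched below. It only 
reports which branch would be taken; the strings "split" and 
"one-to-one" are placeholders for illustration, not part of either 
XML format, and the sketch assumes all <TeleLine> and <InetLine> tags 
are direct children of <lines>.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="ADLIST">
    <xsl:variable name="tele" select="count(lines/TeleLine)"/>
    <xsl:variable name="inet" select="count(lines/InetLine)"/>
    <xsl:choose>
      <!-- Rule 1: case (a) or case (b).
           Rule 2: some TeleLine with a text1 child has an earlier
           TeleLine sibling, so it is not the first TeleLine. -->
      <xsl:when test="(($tele &gt; 1 and $inet &gt;= 1) or
                       ($inet &gt; 1 and $tele &gt;= 1))
                      and lines/TeleLine[text1][preceding-sibling::TeleLine]">
        <xsl:text>split&#10;</xsl:text>
      </xsl:when>
      <xsl:otherwise>
        <xsl:text>one-to-one&#10;</xsl:text>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>
</xsl:stylesheet>
```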

When doing the split, we have to obey the following:

i)   The first <AD> tag contains at LEAST one <TeleLine>
     and at LEAST one <InetLine>, but NOT more than one of both.
     Furthermore only the first <AD> tag contains the 
     <head> tag from the original XML and this <AD> tag 
     should have the SEQ attribute set to 'S'.

ii)  The last <AD> tag contains the LAST <TeleLine> tag with 
     a <text1> tag (and any <InetLine> and/or <TeleLine> tags 
     with NO <text1> tag that follow it).
     The last <AD> tag should have the SEQ attribute set to 'E'.

iii) Middle <AD> tags (between the first and the last) should 
     be generated for each <TeleLine> tag that contains a 
     <text1> tag and is NOT the last such tag.
     These <AD> tags should have the SEQ attribute set to 'M'.
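
The SEQ values alone can be sketched in XSLT 1.0 under one strong 
assumption: that every <TeleLine> with a <text1> tag yields exactly 
one <AD>, as happens in the example further down. The sketch does NOT 
decide which lines go into which <AD> (that is the open problem), and 
the <ADS> wrapper is a hypothetical element added only so that the 
output has a single root.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <xsl:template match="ADLIST">
    <ADS> <!-- hypothetical wrapper, not part of either DTD -->
      <!-- Assumption: one <AD> per TeleLine that has a text1 child. -->
      <xsl:for-each select="lines/TeleLine[text1]">
        <AD>
          <xsl:attribute name="SEQ">
            <xsl:choose>
              <!-- First text1 TeleLine: Start. -->
              <xsl:when test="position() = 1">S</xsl:when>
              <!-- Last text1 TeleLine: End. -->
              <xsl:when test="position() = last()">E</xsl:when>
              <!-- Everything in between: Middle. -->
              <xsl:otherwise>M</xsl:otherwise>
            </xsl:choose>
          </xsl:attribute>
          <!-- The <head> and <lines> content is deliberately left out. -->
        </AD>
      </xsl:for-each>
    </ADS>
  </xsl:template>
</xsl:stylesheet>
```

If an <ADLIST> has only one <TeleLine> with a <text1> tag, this sketch 
labels it 'S'; the rules above do not say what should happen in that 
case.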

----------

Sometimes (maybe always) an example says more than a 
thousand words of specification, so here's an example:

<ADLIST>
  <head>Head Text</head>
  <lines>
    <TeleLine>
       <text2>TTT1</text2>
    </TeleLine>
    <TeleLine>
       <text1>TTT2</text1>
    </TeleLine>
    <InetLine>III1</InetLine>
    <InetLine>III2</InetLine>
    <TeleLine>
       <text2>TTT3</text2>
    </TeleLine>
    <TeleLine>
       <text1>TTT4</text1>
    </TeleLine>
    <InetLine>III3</InetLine>
    <TeleLine>
      <text1>TTT5</text1>
    </TeleLine>
    <InetLine>III4</InetLine>
    <TeleLine>
      <text1>TTT6</text1>
    </TeleLine>
    <TeleLine>
      <text2>TTT7</text2>
    </TeleLine>
  </lines>
</ADLIST>

Should be converted to the following sequence of <AD> tags:

<AD SEQ="S">
  <head>Head Text</head>
  <lines>
    <TeleLine>
       <text2>TTT1</text2>
    </TeleLine>
    <TeleLine>
       <text1>TTT2</text1>
    </TeleLine>
    <InetLine>III1</InetLine>
  </lines>
</AD>

<AD SEQ="M">
  <lines>
    <InetLine>III2</InetLine>
    <TeleLine>
       <text2>TTT3</text2>
    </TeleLine>
    <TeleLine>
       <text1>TTT4</text1>
    </TeleLine>
  </lines>
</AD>

<AD SEQ="M">
  <lines>   
    <InetLine>III3</InetLine>
    <TeleLine>
      <text1>TTT5</text1>
    </TeleLine>
  </lines>
</AD>

<AD SEQ="E">
  <lines>  
    <InetLine>III4</InetLine>
    <TeleLine>
      <text1>TTT6</text1>
    </TeleLine>
    <TeleLine>
      <text2>TTT7</text2>
    </TeleLine>
  </lines>
</AD>
-------

I suppose the solution requires some elaborate use 
of the <xsl:key> tag, but I just can't seem to figure 
it out (believe me - I have tried)!

If anyone out there can help, I would REALLY appreciate 
it (and even buy that person some excellent Danish beer, 
if he or she should ever visit Aarhus in Denmark)!

/Lars

** Stibo Graphic          | Søren Nymarks Vej 21 | DK-8270 Højbjerg 
** mailto:laes@xxxxxxxxx  | http://www.stibographic.com 
** Phone:  +45 8939 8939  | Fax:    +45 8939 8940
** Direct: +45 8939 7421


 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list

