
Re: Fw: Question on duplicate node elimination

Subject: Re: Fw: Question on duplicate node elimination
From: Hermann Stamm-Wilbrandt <STAMMW@xxxxxxxxxx>
Date: Tue, 24 Aug 2010 20:52:20 +0200
Michael,

>   I haven't understood your logic in any detail, but I wonder if it
> suggests an alternative approach to the problem: namely, avoid creating
> RTFs entirely, at least for intermediate results. Instead, whenever you
> are evaluating an operation that returns a node-set, represent that
> node-set as a string containing the generate-id values of the nodes in
> the node-set, space-separated. Elimination of duplicates then reduces to
> an operation on strings: not trivial, but not especially difficult
> either.

That is a cool idea.
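A minimal sketch of the string-based idea in XSLT 1.0, assuming the node-set is already held as a space-separated list of generate-id() values. The template name dedup-ids and the sample IDs are hypothetical; duplicates are removed with plain string tests (contains() against a space-padded accumulator).

```xml
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Hypothetical helper: removes duplicate tokens from a space-separated
       list of generate-id() values, keeping the first occurrence of each -->
  <xsl:template name="dedup-ids">
    <xsl:param name="todo"/>               <!-- tokens still to process -->
    <xsl:param name="done" select="' '"/>  <!-- tokens kept so far, space-padded -->
    <xsl:variable name="list" select="normalize-space($todo)"/>
    <xsl:choose>
      <xsl:when test="$list = ''">
        <xsl:value-of select="normalize-space($done)"/>
      </xsl:when>
      <xsl:otherwise>
        <!-- first token of the remaining list -->
        <xsl:variable name="head"
          select="substring-before(concat($list, ' '), ' ')"/>
        <xsl:call-template name="dedup-ids">
          <xsl:with-param name="todo" select="substring-after($list, ' ')"/>
          <xsl:with-param name="done">
            <xsl:choose>
              <!-- padding with spaces avoids matching id12 inside id123 -->
              <xsl:when test="contains($done, concat(' ', $head, ' '))">
                <xsl:value-of select="$done"/>
              </xsl:when>
              <xsl:otherwise>
                <xsl:value-of select="concat($done, $head, ' ')"/>
              </xsl:otherwise>
            </xsl:choose>
          </xsl:with-param>
        </xsl:call-template>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

  <xsl:template match="/">
    <xsl:call-template name="dedup-ids">
      <xsl:with-param name="todo" select="'id1 id2 id1 id3 id2'"/>
    </xsl:call-template>
  </xsl:template>
</xsl:stylesheet>
```

Run against any input document (for example simple2.xml), this prints id1 id2 id3. It keeps first-occurrence order rather than document order, and the recursion is quadratic in the number of tokens; restoring document order needs the nodes themselves.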

Reading your suggestion of a whitespace-separated list of strings, I
thought of the id() function.

This function does the duplicate elimination "for free"!

Given a document whose DOCTYPE declares an ID attribute, and a
whitespace-separated string of IDs, calling id() with that string not
only returns all nodes with the given IDs, it also eliminates duplicate
nodes.
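A small, self-contained check of that claim, assuming an XSLT 1.0 processor that honours ID declarations in the internal subset of the stylesheet document. The data:item elements, the urn:example:data namespace and the IDs a, b and c are made up for illustration; the stylesheet reads its own tree through document('') so that id() has a DTD to work with.

```xml
<!DOCTYPE xsl:stylesheet [
  <!ATTLIST data:item id ID #REQUIRED>
]>
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:data="urn:example:data"
  exclude-result-prefixes="data">
  <xsl:output method="text"/>

  <!-- Hypothetical sample data; the internal subset above declares
       data:item/@id to be of type ID -->
  <data:item id="a">first</data:item>
  <data:item id="b">second</data:item>
  <data:item id="c">third</data:item>

  <xsl:template match="/">
    <!-- id() only searches the document containing the context node,
         so switch to the stylesheet's own tree first -->
    <xsl:for-each select="document('')">
      <!-- 'b a b a' names each ID twice; id() splits the string at
           whitespace and returns a node-set, i.e. distinct nodes -->
      <xsl:value-of select="count(id('b a b a'))"/>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

Because id() returns a node-set, the count is that of the distinct nodes (two here), even though the argument names four IDs.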


I figured out how to create the DOCTYPE declaration in the output using
xsl:text. Generating such an output XML file works perfectly, as can be
seen in the demo idc.xsl [1] and below.
File idc2.xml is the output generated by calling template idcopy on file
simple2.xml.

The big question now is whether exslt:node-set() supports DOCTYPE
declarations, and if so, how. idc.xsl shows an attempt that does not
work: accessing an element by its ID works for document('idc2.xml') but
not for document(exslt:node-set($rtf)), although both are generated
identically by a call to template idcopy.
The difference seems to be that idc2.xml is parsed from a file.


Is DOCTYPE supported by exslt:node-set()?
Is generating the DOCTYPE with <xsl:text> OK for this purpose?
Can the id() function be made to work for duplicate elimination
in some other way?


$ xsltproc idc.xsl simple2.xml

----------
<node id="id2335172" type="text" value="4"/>
$ cat simple2.xml
<a>
  <b>
    <c>1</c>
    <c>2</c>
  </b>
  <b>
    <c>3</c>
    <c>4</c>
  </b>
</a>

$ cat idc2.xml

<!DOCTYPE node [ <!ATTLIST node id ID #REQUIRED> ]>
<node id="id2335401" type="element" name="a"><node id="id2335402"
type="text" value="&#10;  "/><node id="id2335404" type="element"
name="b"><node id="id2335405" type="text" value="&#10;    "/><node
id="id2335406" type="element" name="c"><node id="id2335407" type="text"
value="1"/></node><node id="id2335408" type="text" value="&#10;    "/><node
id="id2335409" type="element" name="c"><node id="id2335162" type="text"
value="2"/></node><node id="id2335163" type="text" value="&#10;  "/>
</node><node id="id2335164" type="text" value="&#10;  "/><node
id="id2335165" type="element" name="b"><node id="id2335166" type="text"
value="&#10;    "/><node id="id2335167" type="element" name="c"><node
id="id2335168" type="text" value="3"/></node><node id="id2335169"
type="text" value="&#10;    "/><node id="id2335170" type="element"
name="c"><node id="id2335172" type="text" value="4"/></node><node
id="id2335173" type="text" value="&#10;  "/></node><node id="id2335174"
type="text" value="&#10;"/></node>
$
$ cat idc.xsl
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:exslt="http://exslt.org/common"
  exclude-result-prefixes="exslt"
>
  <xsl:output omit-xml-declaration="yes"/>

  <xsl:key name="nodes-by-id" match="node()" use="@id"/>

  <xsl:template match="/">
    <xsl:variable name="rtf">
      <xsl:call-template name="idcopy"/>
    </xsl:variable>

    <xsl:variable name="id1" select=
      "string(exslt:node-set($rtf)//node[@type='text'][@value='4']/@id)"/>

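    <!-- document() with a node-set argument treats the string value of
         each node as a URI reference; it does not re-parse the RTF -->
    <xsl:for-each select="document(exslt:node-set($rtf))">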
      <xsl:copy-of select="id($id1)"/>
    </xsl:for-each>

    <xsl:text>&#10;----------&#10;</xsl:text>

    <xsl:variable name="id2" select=
      "string(document('idc2.xml')//node[@type='text'][@value='4']/@id)"/>

    <xsl:for-each select="document('idc2.xml')">
      <xsl:copy-of select="id($id2)"/>
    </xsl:for-each>
  </xsl:template>



  <xsl:template name="idcopy">
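    <!-- Writes the DOCTYPE as text with output escaping disabled;
         this only takes effect when the result tree is serialized -->
    <xsl:text disable-output-escaping="yes">
      <![CDATA[<!DOCTYPE node [ <!ATTLIST node id ID #REQUIRED> ]>]]>
    </xsl:text>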

    <xsl:choose>
      <!-- true unless the context node is a namespace node -->
      <xsl:when test="count(. | ../namespace::*) !=
                      count(../namespace::*)">
        <xsl:apply-templates select="." mode="idcopy"/>
      </xsl:when>

      <xsl:otherwise>
        <node id="{generate-id()}" type="namespace"
              name="{name()}" value="{.}"/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

  <xsl:template match="@*" mode="idcopy">
    <node id="{generate-id()}" type="attribute"
          name="{name()}" value="{.}"/>
  </xsl:template>

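  <!-- match="*" rather than node(): node() would also match text,
       comment and PI nodes with the same default priority (-0.5) as the
       dedicated templates below, which XSLT 1.0 treats as a conflict -->
  <xsl:template match="*" mode="idcopy">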

    <node id="{generate-id()}" type="element" name="{name()}">

      <xsl:apply-templates select="@*" mode="idcopy"/>

      <xsl:for-each select="namespace::*">
        <xsl:if test="not(.=../../namespace::*) and name()!='xml'">
          <node id="{generate-id()}" type="namespace"
                name="{name()}" value="{.}"/>
        </xsl:if>
      </xsl:for-each>

      <xsl:apply-templates mode="idcopy"
        select="*|text()|comment()|processing-instruction()"/>
    </node>
  </xsl:template>

  <xsl:template match="comment()" mode="idcopy">
    <node id="{generate-id()}" type="comment" value="{.}"/>
  </xsl:template>

  <xsl:template match="processing-instruction()" mode="idcopy">
    <node id="{generate-id()}" type="processing-instruction"
          value="{.}"/>
  </xsl:template>

  <xsl:template match="text()" mode="idcopy">
    <node id="{generate-id()}" type="text" value="{.}"/>
  </xsl:template>

</xsl:stylesheet>
$
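Combining the two ideas, one could keep intermediate results as generate-id() strings and turn them back into nodes with a key instead of id(), since keys need no DTD. A sketch, assuming EXSLT str:tokenize() is available (libxslt's xsltproc implements it) and that it is run against simple2.xml above; the key name by-gid and the way $gids is built are illustrative only.

```xml
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:str="http://exslt.org/strings"
  exclude-result-prefixes="str">
  <xsl:output method="text"/>

  <!-- Hypothetical key: index every node by its generate-id() value -->
  <xsl:key name="by-gid" match="node()" use="generate-id()"/>

  <xsl:template match="/">
    <!-- A node-set in string form: the first two c elements, each listed
         twice, as a naive union of two overlapping results might be -->
    <xsl:variable name="gids" select="concat(
        generate-id((//c)[2]), ' ', generate-id((//c)[1]), ' ',
        generate-id((//c)[2]), ' ', generate-id((//c)[1]))"/>

    <!-- With a node-set as second argument, key() returns the union of
         the matches for the string value of each token, so duplicates
         are dropped just as with id() -->
    <xsl:value-of
      select="count(key('by-gid', str:tokenize($gids, ' ')))"/>
  </xsl:template>
</xsl:stylesheet>
```

Per the XSLT 1.0 definition of key(), the count is two distinct nodes rather than four. Since key() searches the document of the context node, applying the same idea to a converted RTF would presumably need the context switched into it first, e.g. with xsl:for-each over exslt:node-set($rtf).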


[1] http://stamm-wilbrandt.de/en/xsl-list/idc.xsl


Mit besten Gruessen / Best wishes,

Hermann Stamm-Wilbrandt
Developer, XML Compiler, L3
WebSphere DataPower SOA Appliances
IBM Deutschland Research & Development GmbH



From:       Michael Kay <mike@xxxxxxxxxxxx>
To:         xsl-list@xxxxxxxxxxxxxxxxxxxxxx
Date:       08/24/2010 02:17 PM
Subject:    Re: Fw:  Question on duplicate node elimination



  I haven't understood your logic in any detail, but I wonder if it
suggests an alternative approach to the problem: namely, avoid creating
RTFs entirely, at least for intermediate results. Instead, whenever you
are evaluating an operation that returns a node-set, represent that
node-set as a string containing the generate-id values of the nodes in
the node-set, space-separated. Elimination of duplicates then reduces to
an operation on strings: not trivial, but not especially difficult either.

Michael Kay
Saxonica
