
Re: Find inconsistencies: Perl or XSLT?

Subject: Re: Find inconsistencies: Perl or XSLT?
From: Hermann Stamm-Wilbrandt <STAMMW@xxxxxxxxxx>
Date: Wed, 1 Dec 2010 19:01:22 +0100
Perhaps I am missing something here, but for this simple problem XSLT 1.0
and even XPath 1.0 seem to be good enough.


Problem:
identify duplicate source entries of unit elements


The tags in the posted input did not match (each <target> was closed with
</source>); find the corrected input.xml below.


If the input file is of moderate size, this simple XPath expression will do.
It selects every unit whose source also occurs in a later sibling unit:

$ xpath++ "/data/unit[source=following-sibling::unit/source]" input.xml

===============================================================================
<unit id="1">
    <source>blabla</source>
    <target>plapla</target>
</unit>
===============================================================================
<unit id="2">
    <source>bleble</source>
    <target>pleple</target>
</unit>
$


For bigger files, the key() function helps, since the processor can look up
units by their source instead of rescanning the siblings for every unit:

$ cat dupsrc.xsl
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
>
  <xsl:key name="source" match="unit" use="source"/>

  <xsl:template match="text()"/>

  <xsl:template match="/data/unit[count(key('source',source))>1]">
    <xsl:value-of select="concat(@id,'-',source,'&#10;')"/>
  </xsl:template>

</xsl:stylesheet>
$
$ xsltproc dupsrc.xsl input.xml
<?xml version="1.0"?>
1-blabla
2-bleble
4-blabla
5-bleble

$ cat input.xml
<data>
<unit id="1">
    <source>blabla</source>
    <target>plapla</target>
</unit>
<unit id="2">
    <source>bleble</source>
    <target>pleple</target>
</unit>
<unit id="3">
    <source>bloblo</source>
    <target>ploplo</target>
</unit>
<unit id="4">
    <source>blabla</source>
    <target>plapla</target>
</unit>
<unit id="5">
    <source>bleble</source>
    <target>lolailo</target>
</unit>
</data>
$
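
Note that dupsrc.xsl reports every unit whose source occurs more than once,
including units 1 and 4, whose targets agree. If only the conflicting
translations are wanted (units 2 and 5 in the original question), the match
pattern can compare the targets as well. A minimal XSLT 1.0 sketch, assuming
the same input.xml; the key name by-source, the file name conflicts.xsl and
the output format are illustrative choices, not part of the original
stylesheet:

```xml
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- plain text output, so no XML declaration is written -->
  <xsl:output method="text"/>

  <!-- index every unit by the string value of its source child -->
  <xsl:key name="by-source" match="unit" use="source"/>

  <!-- suppress the built-in copying of text nodes -->
  <xsl:template match="text()"/>

  <!-- a unit conflicts when some unit with the same source
       has a target whose string value differs from its own -->
  <xsl:template match="/data/unit[key('by-source', source)/target != target]">
    <xsl:value-of select="concat(@id, ': ', source, ' -> ', target, '&#10;')"/>
  </xsl:template>

</xsl:stylesheet>
```

Run as "xsltproc conflicts.xsl input.xml", this should print only:

2: bleble -> pleple
5: bleble -> lolailo

The != comparison between two node-sets is true as soon as any pair of nodes
differs, so a unit is reported if at least one unit sharing its source has a
different target.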


Best wishes,

Hermann Stamm-Wilbrandt
Developer, XML Compiler, L3
Fixpack team lead
WebSphere DataPower SOA Appliances
----------------------------------------------------------------------
IBM Deutschland Research & Development GmbH
Chairman of the Supervisory Board: Martin Jetter
Management: Dirk Wittkopp
Registered office: Boeblingen
Court of registration: Amtsgericht Stuttgart, HRB 243294



From:       Michael Kay <mike@xxxxxxxxxxxx>
To:         xsl-list@xxxxxxxxxxxxxxxxxxxxxx
Date:       12/01/2010 04:06 PM
Subject:    Re:  Find inconsistencies: Perl or XSLT?



On 01/12/2010 14:46, Manuel Souto Pico wrote:
> Dear all,
>
> I need to process some files and I know how to do it in Perl, but as
> has happened to be the case in the past with other stuff, perhaps
> there's a (objectively) simpler or more efficient way to do it with
> XSLT.
>
> I have a file like this
>
> <unit id="1">
>     <source>blabla</source>
>     <target>plapla</source>
> </unit>
> <unit id="2">
>     <source>bleble</source>
>     <target>pleple</source>
> </unit>
> <unit id="3">
>     <source>bloblo</source>
>     <target>ploplo</source>
> </unit>
> <unit id="4">
>     <source>blabla</source>
>     <target>plapla</source>
> </unit>
> <unit id="5">
>     <source>bleble</source>
>     <target>lolailo</source>
> </unit>
>
> I think the example is illustrative enough.
>
> The target element contains the translation of the source element, and
> one same element must always be translated in the same way, but
> sometimes it's not. So what I'd like to do is find two or more units with
> the same source but with different target (like 2 and 5 in the
> example, but unlike 1 and 4).
>
> In Perl I would use a XML module (or not) and put the source elements
> in the keys of a hash and the target elements in their corresponding
> values. When assigning a new key-value pair, if the key already
> exists, I compare the values. If they are equal, they pass, else they
> are flagged and included in the report.
>
> The report in this case would be something like:
>
> The following inconsistencies have been found
> 2: bleble ->  pleple
> 5: bleble ->  lolailo
>
> Is it possible to do this in XSLT? Is it more efficient than doing it
> in Perl as I was planning to? My knowledge of XSLT is very limited and
> I can't see beyond transforming an XML file into another XML file.
>
> Thanks a lot for your opinion.
> Manuel
>
>
Something like this:

<xsl:for-each-group select="unit" group-by="source">
  <xsl:if test="count(distinct-values(current-group()/target)) gt 1">
    <conflicts-for source="{current-grouping-key()}">
      <xsl:value-of select="distinct-values(current-group()/target)"/>
    </conflicts-for>
  </xsl:if>
</xsl:for-each-group>
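
The fragment above assumes that the context node is the element containing
the unit elements. A minimal sketch of a complete stylesheet around it,
assuming the input has a data root element; the report wrapper element is a
hypothetical addition, and an XSLT 2.0 processor such as Saxon is needed,
since xsltproc implements only XSLT 1.0:

```xml
<xsl:stylesheet version="2.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" indent="yes"/>

  <!-- make the data element the context node for the grouping -->
  <xsl:template match="/data">
    <!-- hypothetical wrapper so that the result has a single root element -->
    <report>
      <!-- one group per distinct source string -->
      <xsl:for-each-group select="unit" group-by="source">
        <!-- keep only groups with more than one distinct target -->
        <xsl:if test="count(distinct-values(current-group()/target)) gt 1">
          <conflicts-for source="{current-grouping-key()}">
            <xsl:value-of select="distinct-values(current-group()/target)"/>
          </conflicts-for>
        </xsl:if>
      </xsl:for-each-group>
    </report>
  </xsl:template>

</xsl:stylesheet>
```

Each conflicts-for element then carries the source string in its attribute
and the differing targets, separated by spaces, as its content.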

Michael Kay
Saxonica
