
Re: Breaking paragraphs one linebreaks

Subject: Re: Breaking paragraphs one linebreaks
From: "Eliot Kimber ekimber@xxxxxxxxxxxx" <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Date: Thu, 9 May 2019 14:25:21 -0000
The DITA Community org.dita-community.i18n project provides general Saxon
extension functions for locale-aware word and line breaking. Using them
requires either Saxon PE/EE or, with Saxon HE, custom Java code that registers
the extension functions (with DITA Open Toolkit this registration can be done
automatically, starting with version 3.3.1).

https://github.com/dita-community/org.dita-community.i18n

Cheers,

Eliot
--
Eliot Kimber
http://contrext.com


On 5/9/19, 9:01 AM, "Imsieke, Gerrit, le-tex gerrit.imsieke@xxxxxxxxx"
<xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:

    Hi Manuel,

    You can use XSLT. It will be easier if

    a) you can use at least XSLT 2.0 and

    b) the text nodes with the escaped breaks are immediately below the
    <seg> elements, without any other highlighting etc. elements around them.

    Are these two conditions satisfied?

    Gerrit
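
If both conditions hold (an XSLT 2.0 processor is available, and the escaped breaks sit in text directly under each <seg>), the split could be sketched roughly as below. This is an illustration under stated assumptions, not code posted in the thread: the f: namespace URI and the helper function f:pieces are made up for the example; each <tuv> is assumed to contain exactly one <seg>; the pieces are assumed to line up one-to-one across languages; and whitespace inside each piece (including the line wraps) is collapsed to single spaces.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:f="urn:x-example:tmx-split"
    exclude-result-prefixes="xs f">

  <xsl:strip-space elements="tmx body tu tuv"/>
  <xsl:output method="xml" encoding="UTF-8" indent="yes"/>

  <!-- Hypothetical helper: the non-empty pieces of a seg, split at each
       escaped br tag, with whitespace collapsed. -->
  <xsl:function name="f:pieces" as="xs:string*">
    <xsl:param name="seg" as="element(seg)"/>
    <xsl:sequence select="for $p in tokenize(string($seg), '&lt;br&gt;')[normalize-space(.)]
                          return normalize-space($p)"/>
  </xsl:function>

  <!-- Identity template: copy everything that is not handled below. -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Pass 1 builds the split tu elements; pass 2 renumbers tuid
       sequentially across the whole body. -->
  <xsl:template match="body">
    <xsl:copy>
      <xsl:apply-templates select="@*"/>
      <xsl:variable name="split" as="element(tu)*">
        <xsl:apply-templates select="tu"/>
      </xsl:variable>
      <xsl:for-each select="$split">
        <tu tuid="{position()}">
          <xsl:copy-of select="@* except @tuid, node()"/>
        </tu>
      </xsl:for-each>
    </xsl:copy>
  </xsl:template>

  <!-- One output tu per piece; the i-th tu holds the i-th piece of every
       language variant. A variant with fewer pieces gets an empty seg. -->
  <xsl:template match="tu">
    <xsl:variable name="tu" select="."/>
    <xsl:variable name="n"
        select="max(for $s in tuv/seg return count(f:pieces($s)))"/>
    <xsl:for-each select="1 to $n">
      <xsl:variable name="i" as="xs:integer" select="."/>
      <tu>
        <xsl:copy-of select="$tu/@*, $tu/(* except tuv)"/>
        <xsl:for-each select="$tu/tuv">
          <xsl:copy>
            <xsl:copy-of select="@*, * except seg"/>
            <seg><xsl:value-of select="f:pieces(seg)[$i]"/></seg>
          </xsl:copy>
        </xsl:for-each>
      </tu>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```

Runs of consecutive escaped breaks produce empty tokens, which the predicate in f:pieces drops, so a double break does not create an empty translation unit. If a <seg> did contain inline markup, string($seg) would flatten it to plain text, which is why the second condition matters.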

    On 09.05.2019 15:44, Manuel Souto Pico terminolator@xxxxxxxxx wrote:
    > Dear all,
    >
    > I have a bilingual TMX file containing many tu elements like this,
    > containing full paragraphs:
    >
    > <?xml version="1.0" encoding="UTF-8"?>
    > <tmx version="1.4">
    >     <header segtype="paragraph" adminlang="en"/>
    >     <body>
    >        <tu tuid="1">
    >           <tuv xml:lang="es">
    >              <seg>El PSOE ganaría en 10 de las 12 comunidades donde
    > habrá elecciones autonómicas el 26 de mayo, según el último barómetro
    > del CIS. &lt;br&gt;Las excepciones serían Cantabria, donde el PRC, el
    > partido de Miguel Ángel Revilla, sería primera fuerza.
    > &lt;br&gt;&lt;br&gt;Navarra Suma, la coalición de PP, Ciudadanos y UPN,
    > sería primera fuerza en la comunidad foral.</seg>
    >           </tuv>
    >           <tuv xml:lang="uz">
    >              <seg>PSOE, MDHning eng so'nggi barometri bo'yicha 26 mayda
    > bo'lib o'tadigan mintaqaviy saylovlarda 12 ta jamoaning 10tasida g'olib
    > chiqadi.&lt;br&gt;Istisnolarga ko'ra, Cantabria, XXR, Migel Anxel
    > Revilla partiyasi birinchi kuch bo'ladi.&lt;br&gt;&lt;br&gt;"Navarra
    > Suma", PP, Cuudadanos va UPN koalitsiyasi mintaqaviy hamjamiyatning
    > birinchi kuchi bo'ladi.</seg>
    >           </tuv>
    >        </tu>
    >     </body>
    > </tmx>
    >
    > As you can see there are a few (escaped) line break tags between
    > sentences.
    >
    > I would like to transform that into something like this, where every tu
    > element contains only sentences:
    >
    > <?xml version="1.0" encoding="UTF-8"?>
    > <tmx version="1.4">
    >     <header segtype="paragraph" adminlang="en"/>
    >     <body>
    >        <tu tuid="1">
    >           <tuv xml:lang="es">
    > <seg>El PSOE ganaría en 10 de las 12 comunidades donde habrá elecciones
    > autonómicas el 26 de mayo, según el último barómetro del CIS.</seg>
    >           </tuv>
    >           <tuv xml:lang="uz">
    > <seg>PSOE, MDHning eng so'nggi barometri bo'yicha 26 mayda bo'lib
    > o'tadigan mintaqaviy saylovlarda 12 ta jamoaning 10tasida g'olib
    > chiqadi.</seg>
    >           </tuv>
    >        </tu>
    >        <tu tuid="2">
    >           <tuv xml:lang="es">
    > <seg>Las excepciones serían Cantabria, donde el PRC, el partido de
    > Miguel Ángel Revilla, sería primera fuerza. </seg>
    >           </tuv>
    >           <tuv xml:lang="uz">
    > <seg>Istisnolarga ko'ra, Cantabria, XXR, Migel Anxel Revilla partiyasi
    > birinchi kuch bo'ladi.</seg>
    >           </tuv>
    >        </tu>
    >        <tu tuid="3">
    >           <tuv xml:lang="es">
    > <seg>Navarra Suma, la coalición de PP, Ciudadanos y UPN, sería primera
    > fuerza en la comunidad foral.</seg>
    >           </tuv>
    >           <tuv xml:lang="uz">
    > <seg>"Navarra Suma", PP, Cuudadanos va UPN koalitsiyasi mintaqaviy
    > hamjamiyatning birinchi kuchi bo'ladi.</seg>
    >           </tuv>
    >        </tu>
    >     </body>
    > </tmx>
    >
    > Do you think I can use XSLT to do this more or less easily?
    >
    > I wrote a few XSLT stylesheets years ago but I'm far from being a savvy
    > user.
    >
    > Thanks in advance for any tips.
    >
    > Cheers, Manuel

    --
    Gerrit Imsieke
    Geschäftsführer / Managing Director
    le-tex publishing services GmbH
    Weissenfelser Str. 84, 04229 Leipzig, Germany
    Phone +49 341 355356 110, Fax +49 341 355356 510
    gerrit.imsieke@xxxxxxxxx, http://www.le-tex.de

    Registergericht / Commercial Register: Amtsgericht Leipzig
    Registernummer / Registration Number: HRB 24930

    Geschäftsführer / Managing Directors:
    Gerrit Imsieke, Svea Jelonek, Thomas Schmidt
