
Using xsl:analyze-string and regex to parse long lines

Subject: Using xsl:analyze-string and regex to parse long lines with white-space
From: Rob Newman <rlnewman@xxxxxxxx>
Date: Tue, 19 Jun 2007 11:38:54 -0700
Hi All,

I have an input file "input.xml":

input.xml
-------------
<pfarr>
<pfstring name="dlsite">
q330 0000 345 1169760599.99999 TA_D03A 921 47 -123 0.0325 regular internet hosted 1172293472.07035
q330 0123 234 9999999999.99900 TA_HAST 1005 36 -121 0.5558 regular internet hosted 1172293966.53652
q330 0234 123 1157317200.00000 TA_U04C 718 36 -120 0.7886 vsat spacenet 1172298386.07728
</pfstring>
</pfarr>


For each line, I am trying to parse the contents of <pfstring> to extract the 5th column ("TA_D03A" in the first line), the 10th ("regular internet") and the 11th ("hosted"), and write them to "output.xml" like this:

output.xml
---------------
<dlsites>
	<site name="TA_D03A">
		<comt>regular internet</comt>
		<comp>hosted</comp>
	</site>
	<site name="TA_HAST">
		<comt>regular internet</comt>
		<comp>hosted</comp>
	</site>
	<site name="TA_U04C">
		<comt>vsat</comt>
		<comp>spacenet</comp>
	</site>
</dlsites>

Each entry in input.xml/pfarr/pfstring is on its own line. I am trying to use the regex functions and have the following, but it does not seem to work:

transform.xsl
-----------------
<?xml version="1.0" encoding="ISO-8859-1"?>

<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes" />


<xsl:template match="/">
    <dlsites>
        <xsl:apply-templates select="/pfarr/pfstring" />
    </dlsites>
</xsl:template>

<xsl:template match="pfstring[@name = 'dlsite']">
    <xsl:variable name="elValue" select="." />

<xsl:analyze-string select="$elValue" regex="\s*(.*)\s+(.*)\s+(.*)\s+(.*)\s+(.*)\s+(.*)\s+(.*)\s+(.*)\s+(.*)\s+(.*)\s+(.*)\s+\n">

        <xsl:matching-substring>
            <xsl:variable name="dlname" select="regex-group(5)" />
            <site name="{@dlname}">
                <comt><xsl:value-of select="regex-group(10)"/></comt>
                <comp><xsl:value-of select="regex-group(11)"/></comp>
            </site>
        </xsl:matching-substring>

        <xsl:non-matching-substring>
            <unknown>
                <xsl:value-of select="$elValue"/>
            </unknown>
        </xsl:non-matching-substring>

</xsl:analyze-string>

</xsl:template>

</xsl:stylesheet>

Is this the most efficient way of processing this type of file? It is highly likely that I have something wrong in the regex section - any pointers would be appreciated. The XSLT processor I am using is Saxon 8.9J.

Thanks in advance!
- Rob Newman
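
For comparison, here is a minimal sketch of one alternative that avoids a single long regex: split the text into lines with tokenize(), then split each line into fields on white-space. It rests on assumptions that are not stated in the message: that the last field of every line is a timestamp, that the field before it is a one-word "comp" value, and that everything from field 10 up to that point is the "comt" value (which may contain a space, as in "regular internet"). The variable names $f and $n are only illustrative.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" encoding="UTF-8" indent="yes"/>

  <xsl:template match="/">
    <dlsites>
      <xsl:apply-templates select="/pfarr/pfstring[@name = 'dlsite']"/>
    </dlsites>
  </xsl:template>

  <xsl:template match="pfstring[@name = 'dlsite']">
    <!-- Split the element text into lines and skip blank ones -->
    <xsl:for-each select="tokenize(., '\r?\n')[normalize-space(.)]">
      <!-- Collapse runs of white-space, then split the line into fields -->
      <xsl:variable name="f" select="tokenize(normalize-space(.), ' ')"/>
      <xsl:variable name="n" select="count($f)"/>
      <site name="{$f[5]}">
        <!-- Assumption: comt spans field 10 up to the field before comp -->
        <comt>
          <xsl:value-of select="$f[position() ge 10 and position() le $n - 2]"/>
        </comt>
        <!-- Assumption: comp is the field just before the trailing timestamp -->
        <comp><xsl:value-of select="$f[$n - 1]"/></comp>
      </site>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

Counting fields from the end of the line is a design choice that tolerates the embedded space in "regular internet"; a fixed column index would not. Separately, in the posted stylesheet the attribute value template {@dlname} selects an attribute named dlname rather than the variable, which would be written {$dlname}.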
