
RE: Tokenizing and transforming a CSV file

Subject: RE: Tokenizing and transforming a CSV file
From: "Michael Kay" <mike@xxxxxxxxxxxx>
Date: Wed, 25 Feb 2009 16:53:28 -0000
I would use xsl:analyze-string rather than tokenize(), with a regex such as

(,"[^"]*")|(,[^,]*)
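Below is a minimal sketch of how that suggestion might be applied to the stylesheet quoted further down. It has not been run here, so treat it as illustrative. The regex expects every field to be preceded by a comma, so each line is first prefixed with one via concat(',', .). Group 1 catches a quoted field and group 2 an unquoted one, and substring() strips the leading comma plus, for quoted fields, the surrounding quotes. The [normalize-space()] filter that skips blank lines (for example, after a trailing newline) is an assumption added here and is not part of the original question.

```xslt
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="2.0">

  <xsl:output method="xml" indent="yes"/>

  <!-- same input as in the question: the raw CSV text -->
  <xsl:variable name="filedata" select="unparsed-text('test.csv')"/>

  <xsl:template match="/">
    <result>
      <!-- one record per non-blank line (blank-line filter is an added assumption) -->
      <xsl:for-each select="tokenize($filedata, '\r?\n')[normalize-space()]">
        <record>
          <!-- prefix a comma so the first field also matches ",..." -->
          <xsl:analyze-string select="concat(',', .)"
                              regex='(,"[^"]*")|(,[^,]*)'>
            <xsl:matching-substring>
              <field>
                <xsl:choose>
                  <!-- quoted field: drop the leading ," and the closing " -->
                  <xsl:when test="regex-group(1) != ''">
                    <xsl:value-of select="substring(regex-group(1), 3,
                                          string-length(regex-group(1)) - 3)"/>
                  </xsl:when>
                  <!-- unquoted field: drop only the leading comma -->
                  <xsl:otherwise>
                    <xsl:value-of select="substring(regex-group(2), 2)"/>
                  </xsl:otherwise>
                </xsl:choose>
              </field>
            </xsl:matching-substring>
          </xsl:analyze-string>
        </record>
      </xsl:for-each>
    </result>
  </xsl:template>

</xsl:stylesheet>
```

Because the first alternative is tried first at each position, a field that starts with a double quote is consumed up to its closing quote, so any comma inside it is never treated as a delimiter. This sketch assumes quoted fields never contain doubled "" escapes. Full RFC 4180-style quoting would need a more elaborate regex.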

Michael Kay 
http://www.saxonica.com/

> -----Original Message-----
> From: Mukul Gandhi [mailto:gandhi.mukul@xxxxxxxxx] 
> Sent: 25 February 2009 16:44
> To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
> Subject:  Tokenizing and transforming a CSV file
> 
> Hi all,
> I have a CSV file (named test.csv) like the following (two
> example lines/records are shown below):
> 
> hi,"this is a long string, please tokenize me",hello,world 
> hello,please tokenize me,hi there
> 
> I want this to be transformed into the following XML:
> 
> <result>
>    <record>
>       <field>hi</field>
>       <field>this is a long string, please tokenize me</field>
>       <field>hello</field>
>       <field>world</field>
>    </record>
>    <record>
>       <field>hello</field>
>       <field>please tokenize me</field>
>       <field>hi there</field>
>    </record>
> </result>
> 
> i.e., each line/record should be split on commas, with the
> restriction that a comma inside a double-quoted string should
> not be treated as a delimiter.
> 
> Below is my attempt so far.
> 
> <?xml version="1.0" encoding="UTF-8"?>
> <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
>                        version="2.0">
> 
>    <xsl:output method="xml" indent="yes" />
> 
>    <xsl:variable name="filedata" select="unparsed-text('test.csv')" />
> 
>    <xsl:template match="/">
>       <result>
>         <xsl:for-each select="tokenize($filedata, '\r?\n')">
>           <record>
>             <xsl:for-each select="tokenize(., ',')">
>               <field>
>                 <xsl:value-of select="." />
>               </field>
>             </xsl:for-each>
>           </record>
>         </xsl:for-each>
>       </result>
>    </xsl:template>
> 
> </xsl:stylesheet>
> 
> The above stylesheet produces the following output:
> 
> <result>
>    <record>
>       <field>hi</field>
>       <field>"this is a long string</field>
>       <field> please tokenize me"</field>
>       <field>hello</field>
>       <field>world</field>
>    </record>
>    <record>
>       <field>hello</field>
>       <field>please tokenize me</field>
>       <field>hi there</field>
>    </record>
> </result>
> 
> According to my requirement, the following output fragment
> 
> <field>"this is a long string</field>
> <field> please tokenize me"</field>
> 
> is wrong.
> 
> This should actually appear as:
> 
> <field>this is a long string, please tokenize me</field>
> 
> I would appreciate any help regarding this problem.
> 
> I am using XSLT 2.0 with Saxon 9.x.
> 
> 
> --
> Regards,
> Mukul Gandhi
