XSL-LIST Mailing List Archive

Tokenizing and transforming a CSV file

Subject: Tokenizing and transforming a CSV file
From: Mukul Gandhi <gandhi.mukul@xxxxxxxxx>
Date: Wed, 25 Feb 2009 22:14:26 +0530
Hi all,
I have a CSV file (named test.csv) like the following (two example
lines/records are shown below):

hi,"this is a long string, please tokenize me",hello,world
hello,please tokenize me,hi there

I want this to be transformed into the following XML:

<result>
   <record>
      <field>hi</field>
      <field>this is a long string, please tokenize me</field>
      <field>hello</field>
      <field>world</field>
   </record>
   <record>
      <field>hello</field>
      <field>please tokenize me</field>
      <field>hi there</field>
   </record>
</result>

i.e., each line/record should be split on commas, with the
restriction that a comma inside a double-quoted string must not be
treated as a delimiter (and the surrounding quotes should not appear
in the output).

Below is my attempt so far:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="2.0">

   <xsl:output method="xml" indent="yes" />

   <xsl:variable name="filedata" select="unparsed-text('test.csv')" />

   <xsl:template match="/">
      <result>
         <xsl:for-each select="tokenize($filedata, '\r?\n')">
            <record>
               <xsl:for-each select="tokenize(., ',')">
                  <field>
                     <xsl:value-of select="." />
                  </field>
               </xsl:for-each>
            </record>
         </xsl:for-each>
      </result>
   </xsl:template>

</xsl:stylesheet>

The above stylesheet produces the following output:

<result>
   <record>
      <field>hi</field>
      <field>"this is a long string</field>
      <field> please tokenize me"</field>
      <field>hello</field>
      <field>world</field>
   </record>
   <record>
      <field>hello</field>
      <field>please tokenize me</field>
      <field>hi there</field>
   </record>
</result>

According to my requirement, the following output fragment

<field>"this is a long string</field>
<field> please tokenize me"</field>

is wrong.

This should actually appear as:

<field>this is a long string, please tokenize me</field>

I would appreciate any help regarding this problem.

I am using XSLT 2.0 with Saxon 9.x.
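One possible direction, offered as a hedged sketch rather than a tested solution: instead of splitting each line on commas with tokenize(), XSLT 2.0's xsl:analyze-string can match the fields themselves. A field is either a double-quoted string (captured without its quotes) or a run of non-comma characters, and the commas between matches are simply discarded. Assumptions: quoted fields contain no escaped quotes (the "" convention is not handled), empty unquoted fields such as the middle of "a,,b" are dropped, and test.csv resolves relative to the stylesheet as in the original attempt. The unquoted branch uses [^,]+ rather than [^,]* because XSLT 2.0 raises an error (XTDE1150) when the regex can match a zero-length string.

```xslt
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="2.0">

   <xsl:output method="xml" indent="yes" />

   <!-- Assumption: test.csv is resolved relative to this stylesheet -->
   <xsl:variable name="filedata" select="unparsed-text('test.csv')" />

   <xsl:template match="/">
      <result>
         <!-- One record per line; the predicate skips blank lines,
              e.g. the empty string after a trailing newline -->
         <xsl:for-each select="tokenize($filedata, '\r?\n')[normalize-space(.)]">
            <record>
               <!-- Alternative 1: a double-quoted field; group 1 holds the
                    text between the quotes, commas included.
                    Alternative 2: a run of non-comma characters (group 2).
                    Separating commas are non-matching substrings and are
                    ignored because there is no xsl:non-matching-substring. -->
               <xsl:analyze-string select="." regex='"([^"]*)"|([^,]+)'>
                  <xsl:matching-substring>
                     <field>
                        <!-- Only one group participates in each match;
                             regex-group() returns '' for the other one -->
                        <xsl:value-of select="concat(regex-group(1), regex-group(2))" />
                     </field>
                  </xsl:matching-substring>
               </xsl:analyze-string>
            </record>
         </xsl:for-each>
      </result>
   </xsl:template>

</xsl:stylesheet>
```

For the first sample line, the quoted alternative is tried first at the opening quote, so "this is a long string, please tokenize me" is matched as a single field and the embedded comma is kept inside it. Supporting empty fields or "" escapes would need a different regex or a character-by-character approach.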


-- 
Regards,
Mukul Gandhi
