Subject: RE: Converting CSV to XML without hardcoding schema details in xsl
From: "Pantvaidya, Vishwajit" <vpantvai@xxxxxxxxxxxxx>
Date: Thu, 22 Jun 2006 20:50:34 -0700
Thanks a lot for the xsl, Michael.

My CSV has commas inside some cells; in those cases the entire cell value
is enclosed in quotes. A simple tokenize that splits at comma boundaries
therefore does not work, so I replaced the tokenize for the cells with a
regex that takes care of the quotes (is there any alternative here other than
using a regex?). I had to specify the quotes in the regex as &quot;.
After this, it started taking 45 minutes to transform a CSV of 20 columns
and 35 rows.

The next problem I found was that, in columns whose values can contain
commas, not all cells are enclosed in quotes: only the cells that actually
contain a comma are quoted. So I changed the regex to allow zero or more
quotes around those cells. Now it transformed in 45 seconds, which surprised me.
But even now, the zero-or-more-quotes regex throws the parsing off, and the
CSV gets parsed incorrectly, resulting in the wrong XML content.

So I made some changes and the current xsl has the regex as:
<xsl:analyze-string select="."
regex="(.*),(.*),(.*),(.*),(.*),(.*),(.*),(.*),(.*),(.*),(.*),(.*),(.*),&quot;*(.*)&quot;*,(.*),&quot;*(.*)&quot;*,(.*),(.*),&quot;*($.*)&quot;*,(.*)">

(Now it is taking even more time: over an hour and still not done. Let's see
if at least the XML comes out correctly.)

Any suggestions for reducing this regex complexity, which is caused by the
non-uniformity of the input CSV?

Or am I better off asking the CSV provider to keep the CSV uniform, so that
all cells in a given column are either quoted or unquoted?


Thanks,

Vish.
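
One possible alternative to a single whole-line regex, for the mixed quoted/unquoted cells described above, is to match one cell at a time. The following is only a sketch under stated assumptions, not a tested solution for this particular file: the function name csv:cells, the namespace urn:example:csv and the default value of $input-file are invented for illustration, header names are assumed to be valid XML element names, no row is assumed to have more cells than the header, and quotes escaped as "" inside a quoted cell are not handled. Because [^"]* and [^,]* can never run past a cell boundary, the regex engine should have far less backtracking to do than with twenty (.*) groups that can each swallow commas.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:csv="urn:example:csv"
    exclude-result-prefixes="xs csv">

  <xsl:output method="xml" indent="yes"/>

  <!-- Assumed parameter: URI of the CSV file, with field names on line 1. -->
  <xsl:param name="input-file" as="xs:string" select="'data.csv'"/>

  <!-- Hypothetical helper: split one CSV line into cells, one match per cell.
       A comma is appended so that every cell, including the last one, ends
       in a delimiter and no match can be zero-length. -->
  <xsl:function name="csv:cells" as="xs:string*">
    <xsl:param name="line" as="xs:string"/>
    <xsl:analyze-string select="concat($line, ',')"
                        regex='"([^"]*)",|([^,]*),'>
      <xsl:matching-substring>
        <!-- Group 1 is set for a quoted cell, group 2 for an unquoted one;
             the group that did not take part is the empty string. -->
        <xsl:sequence select="concat(regex-group(1), regex-group(2))"/>
      </xsl:matching-substring>
    </xsl:analyze-string>
  </xsl:function>

  <xsl:template match="/" name="main">
    <!-- Drop empty lines, e.g. the one after a trailing newline. -->
    <xsl:variable name="lines" as="xs:string*"
        select="tokenize(unparsed-text($input-file), '\r?\n')[. ne '']"/>
    <xsl:variable name="field-names" as="xs:string*"
        select="csv:cells($lines[1])"/>
    <doc>
      <xsl:for-each select="subsequence($lines, 2)">
        <line>
          <xsl:for-each select="csv:cells(.)">
            <xsl:variable name="p" as="xs:integer" select="position()"/>
            <xsl:element name="{$field-names[$p]}">
              <xsl:value-of select="."/>
            </xsl:element>
          </xsl:for-each>
        </line>
      </xsl:for-each>
    </doc>
  </xsl:template>

</xsl:stylesheet>
```

With this helper, a line such as a,"b,c",d should yield the three cells a, then b,c, then d, whether or not the other cells in that column are quoted. Writing the regex attribute in single quotes lets the double quotes appear literally instead of as &quot;.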

>-----Original Message-----
>From: Michael Kay [mailto:mike@xxxxxxxxxxxx]
>Sent: Thursday, June 22, 2006 12:43 AM
>To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
>Subject: RE:  Converting CSV to XML without hardcoding schema details
>in xsl
>
>> Can anybody suggest how to convert CSV data in the format
>>
>> Field1,Field2
>> Value11,Value12
>>
>> to xml like
>>
>> <Field1>Value11</Field1>
>> <Field2>Value12</Field2>
>>
>> without hardcoding the fieldnames in the xsl?
>
><xsl:variable name="lines" as="xs:string*"
>              select="tokenize(unparsed-text($input-file), '\r?\n')"/>
><xsl:variable name="field-names" as="xs:string*"
>              select="tokenize($lines[1], ',')"/>
><xsl:for-each select="subsequence($lines, 2)">
><row>
>  <xsl:variable name="cells" select="tokenize(., ',')"/>
>  <xsl:for-each select="$cells">
>    <xsl:variable name="p" as="xs:integer" select="position()"/>
>    <xsl:element name="{$field-names[$p]}">
>      <xsl:value-of select="."/>
>    </xsl:element>
>  </xsl:for-each>
></row>
></xsl:for-each>
>
>Michael Kay
>http://www.saxonica.com/
>
>
>>
>> I was thinking of something like
>>
>> <xsl:for-each select="tokenize(., ',')"> &lt;<xsl:value-of
>> select="item-at($elementNames,index-of(?parent of current
>> node?,.))"/>&gt; <xsl:value-of select="."/>
>> &lt;/<xsl:value-of
>> select="item-at($elementNames,index-of(?parent of current
>> node?,.))"/>&gt; </xsl:for-each>
>>
>> where elementNames is a tokenized list of the fieldnames -
>> but I am unable to get it to work.
>>
>>
>>
>> >-----Original Message-----
>> >From: Pantvaidya, Vishwajit
>> >Sent: Wednesday, June 21, 2006 12:17 AM
>> >To: 'xsl-list@xxxxxxxxxxxxxxxxxxxxxx'
>> >Subject:  Converting CSV to XML without hardcoding schema details in xsl
>> >
>> >Hello,
>> >
>> >I am trying to convert a CSV datafile into XML format.
>> >The headers for the CSV data are in a file header.csv e.g.
>> >Field1,Field2
>> >The data is in a file Data.csv e.g.
>> >Value11,Value12
>> >Value21,Value22
>> >
>> >I need to convert the CSV data into xml output by creating xml elements
>> >using the names in the csv header and taking the corresponding values
>> >from the data file, so that I get an xml as follows:
>> >
>> ><doc>
>> ><line>
>> ><Field1>Value11</Field1>
>> ><Field2>Value12</Field2>
>> ></line>
>> ><line>
>> ><Field1>Value21</Field1>
>> ><Field2>Value22</Field2>
>> ></line>
>> ></doc>
>> >
>> >I was trying to see if I can do this without hardcoding the header
>> >names in the xsl. I reached upto the point where my xsl looks as below:
>> >
>> ><xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
>> >xmlns:op="http://www.w3.org/2001/12/xquery-operators"
>> >    xmlns:xf="http://www.w3.org/2001/12/xquery-functions"
>> >version="2.0">
>> >
>> >    <xsl:output  name="xmlFormat" method="xml" indent="yes"
>> >omit-xml-declaration="yes"/>
>> >
>> >    <xsl:variable name="source1" select="'data.csv'"/>
>> >    <xsl:variable name="elementNamesList" select="'Header.csv'"/>
>> >    <xsl:variable name="encoding" select="'iso-8859-1'"/>
>> >
>> >    <xsl:variable name="elementNames"
>> >select="tokenize(unparsed-text($elementNamesList,$encoding),',')"/>
>> >    <xsl:variable name="src">
>> >        <doc>
>> >            <xsl:for-each
>> >select="tokenize(unparsed-text($source1,$encoding), '\r?\n')">
>> >                <line>
>> >                    <xsl:for-each select="tokenize(., ',')">
>> >                        &lt;<xsl:value-of
>> >select="op:item-at($elementNames,index-of(?parent of current
>> >node?,.))"/>&gt;
>> >                            <xsl:value-of select="."/>
>> >                            &lt;/<xsl:value-of
>> >select="item-at($elementNames,3)"/>&gt;
>> >                    </xsl:for-each>
>> >                </line>
>> >            </xsl:for-each>
>> >        </doc>
>> >    </xsl:variable>
>> >
>> >    <xsl:template match="/">
>> >        <xsl:result-document format = "xmlFormat" href = "src1.xml">
>> >            <xsl:copy-of select="$src"/>
>> >        </xsl:result-document>
>> >    </xsl:template>
>> >
>> ></xsl:stylesheet>
>> >
>> >In the yet-incomplete statement <xsl:value-of
>> >select="op:item-at($elementNames,index-of(?parent of current
>> >node?,.))"/>, I am trying to generate an xml element with the Nth field
>> >name from the headers name list for the Nth field value. Couple of
>> >issues/questions here:
>> >
>> >- I am getting the error "Cannot find a matching 2-argument function
>> >named {http://www.w3.org/2001/12/xquery-operators}item-at()" when I try
>> >to validate the xsl. What could be the reason?
>> >
>> >- How can I get the ?parent of current node? Needed to compute the
>> >index of the current data in the data record?
>> >
>> >- Is there any other better way to do it? Any way that I can do the
>> >same using xsl:element?
>> >
>> >In general, is this the only/best way or is there any other better way
>> >to achieve the same goal?
>> >
>> >
>> >Thanks and Regards,
>> >
>> >Vish.
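
On the item-at() error and the "parent of current node" question in the original post above: as far as I know, item-at() appeared only in early working drafts of the function library, and functions in the op: namespace are not callable from a stylesheet in any case. A positional predicate such as $elementNames[$p] does the same job, and position() inside xsl:for-each gives the index of the current cell, so no lookup through a parent is needed. The sketch below adapts the original stylesheet along those lines. It assumes the same Header.csv and data.csv files and the same encoding, assumes no cell contains a comma, and the normalize-space() call is my addition to strip a trailing line break from the header file.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">

  <xsl:output method="xml" indent="yes" omit-xml-declaration="yes"/>

  <xsl:variable name="encoding" select="'iso-8859-1'"/>

  <!-- Field names from the separate header file, as in the original post. -->
  <xsl:variable name="elementNames" as="xs:string*"
      select="tokenize(normalize-space(unparsed-text('Header.csv', $encoding)), ',')"/>

  <xsl:template match="/" name="main">
    <doc>
      <!-- One <line> per non-empty line of the data file. -->
      <xsl:for-each
          select="tokenize(unparsed-text('data.csv', $encoding), '\r?\n')[. ne '']">
        <line>
          <xsl:for-each select="tokenize(., ',')">
            <!-- position() is the index of the current cell in its row. -->
            <xsl:variable name="p" as="xs:integer" select="position()"/>
            <!-- $elementNames[$p] replaces item-at($elementNames, $p). -->
            <xsl:element name="{$elementNames[$p]}">
              <xsl:value-of select="."/>
            </xsl:element>
          </xsl:for-each>
        </line>
      </xsl:for-each>
    </doc>
  </xsl:template>

</xsl:stylesheet>
```

The position is captured in a variable before it is used in the predicate because, inside $elementNames[...], position() would refer to the items of $elementNames rather than to the current cell.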
