
RE: csv to xml converter bug

Subject: RE: csv to xml converter bug
From: "Michael Kay" <mike@xxxxxxxxxxxx>
Date: Tue, 10 Jul 2007 12:21:36 +0100
The construct

(?=X)

is allowed in some regex dialects, where it means "match X with a zero-width
positive lookahead". This is basically an assertion that X must match at the
current position, without causing X to be swallowed. But it's not allowed in
the XPath regex dialect.
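
To illustrate the positive lookahead in a dialect that does accept it, here is
a minimal sketch using Python's re module (Python is chosen only because the
XPath dialect rejects the construct; the sample strings are made up):

```python
import re

# (?=,) asserts that a comma comes next, but the comma is not consumed.
m = re.search(r"foo(?=,)", "foo,bar")
matched = m.group(0)   # only "foo" is part of the match
end = m.end()          # the match stops just before the comma

# With no comma after "foo", the assertion fails and nothing matches.
no_match = re.search(r"foo(?=,)", "foo;bar")
```

The comma is tested but left in place, so the next matching step starts at it.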

This construct (a zero-width negative lookahead) isn't allowed either:

(?!X) 

This is the inverse: it asserts that X does not match at the current
position, without swallowing X.
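
The negative form can be sketched the same way, again with Python's re module
and made-up sample text, since the XPath dialect will not accept it:

```python
import re

# (?!,) asserts that a comma does NOT come next; nothing is consumed.
text = "foo,bar foo;bar"
m = re.search(r"foo(?!,)", text)
start = m.start()      # the first "foo" is skipped: a comma follows it
matched = m.group(0)   # the match itself is still just "foo"
```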

I'm afraid I have no idea whether these constructs can be translated into
anything that the XPath regex dialect permits.
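
One possible direction, offered as an untested sketch rather than a known
answer: match the fields instead of splitting on the separators. If a comma is
prepended to the line, every field follows exactly one comma, and a pattern
with no lookahead can pick the fields out. The sketch below is in Python; the
helper name split_fields is hypothetical, and it assumes well-formed input
with no doubled quotes inside quoted fields. Whether the same pattern behaves
identically under xsl:analyze-string has not been checked here.

```python
import re

# Hypothetical helper: match fields rather than split on separators.
# Assumes well-formed input and no escaped ("") quotes inside quoted fields.
FIELD = re.compile(r',("[^"]*"|[^,]*)')

def split_fields(line):
    # Prefix a comma so that every field, including the first,
    # is preceded by exactly one separator.
    return [m.group(1) for m in FIELD.finditer("," + line)]

tokens = split_fields(',,"foo,bar",,x,,')
others = split_fields('"foo,bar",,"foo,bar",x,,,"foo,bar"')
```

On the first sample line from the original message this puts x at position 5,
and on the second the empty value at position 2 is kept.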

Gunther Schadow can say "told you it would be needed":
http://www.stylusstudio.com/xsllist/200412/post00810.html


Michael Kay
http://www.saxonica.com/


> -----Original Message-----
> From: Andrew Welch [mailto:andrew.j.welch@xxxxxxxxx] 
> Sent: 10 July 2007 11:29
> To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
> Subject:  csv to xml converter bug
> 
> The csv-to-xml solution here:
> http://andrewjwelch.com/code/xslt/csv/csv-to-xml.html
> 
> ...has a bug where
> 
> ,,"foo,bar",,x,,
> 
> generates the tokens:
> 
> <token/>
> <token/>
> <token/>
> <token>"foo,bar"</token>
> <token/>
> <token/>
> <token>x</token>
> <token/>
> <token/>
> 
> The x should be at position 5 but is at position 7 because 
> the commas either side of the quoted values aren't being 
> included with the value itself, and are generating extra 
> tokens in the xsl:non-matching-substring block.
> 
> I've tried various ways to modify the solution to fix the 
> bug, but always ran into problems with other strings, such as:
> 
> "foo,bar",,"foo,bar",x,,,"foo,bar"
> 
> If you include leading or trailing commas with the quoted 
> values then the empty value at position 2 here gets consumed. 
>  Maybe a better regex would help here, but I couldn't write 
> one...  (Or perhaps if the non-matching-substring block had 
> access to some information about the matching-substring block...)
> 
> I had a dig around the net and found a regex[1] that could be 
> sufficient to just use with tokenize, but it causes the error:
> 
> FORX0002: Error at character 2 in regular expression
> ",(?=([^\"]*\"[^\"]*\")*(?![^\"...":
>   expected ())
> 
> It works in "The Regex Coach", but not in XSLT (with 
> Saxon 8.9.0.3b)
> 
> The code is:
> 
> <xsl:variable name="regex"
> as="xs:string">,(?=([^\"]*\"[^\"]*\")*(?![^\"]*\"))</xsl:variable>
> 
> <xsl:function name="fn:getTokens" as="xs:string+">
> 	<xsl:param name="str" as="xs:string"/>
> 	<xsl:sequence select='for $t in tokenize($str, $regex)
> 		return replace($t, "^,""|"",$|("")""", "$1")'/> 
> </xsl:function>
> 
> It's an unusual looking regex (to my novice eye) - any 
> explanation as to what's going on would be great.
> 
> thanks
> andrew
> 
> [1] http://weblogs.asp.net/prieck/archive/2004/01/16/59457.aspx
> --
> http://andrewjwelch.com
