XQuery - RegEx Pattern Matcher
David A. Lee dlee at calldei.com
Fri Jun 19 09:44:31 PDT 2009
This may not be the best way to do it, but it works, and it uses no regex.
It won't handle escaped ("quoted") quotes, but it works on the example given and is
more readable than regexes to my caveman brain.
--------------------
declare variable $data :=
    'foo;bar;"spam;bletch";another;"set;of;embedded";strings';

(: Returns the first field of $s; a quoted field keeps its quotes. :)
declare function local:nextstr( $s as xs:string ) as xs:string?
{
    if( starts-with( $s , '"' ) ) then
        (: quoted field: take everything up to the closing quote :)
        concat( '"' , substring-before( substring( $s , 2 ) , '"' ) , '"' )
    else
        let $sep := substring-before( $s , ";" )
        return
            (: substring-before yields "" when there is no ";" left :)
            if( $sep eq "" ) then $s else $sep
};

(: Recursively splits $s into fields separated by ";". :)
declare function local:splitcsv( $s as xs:string ) as xs:string*
{
    if( string-length($s) eq 0 ) then
        ()
    else
        let $first := local:nextstr( $s )
        return
            if( $first eq $s ) then
                $s
            else
                (: skip the field and its trailing ";" before recursing :)
                ( $first , local:splitcsv( substring( $s , string-length($first) + 2 ) ) )
};

for $s in local:splitcsv( $data )
return <tag>{$s}</tag>
-----------------------
Returns
<tag>foo</tag>
<tag>bar</tag>
<tag>"spam;bletch"</tag>
<tag>another</tag>
<tag>"set;of;embedded"</tag>
<tag>strings</tag>
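
For comparison, since the original poster runs Saxon from Java, here is a minimal sketch of how the \G-anchored pattern from the first message can be driven on the Java side with Matcher.find(). The class name CsvSplit and the method split are hypothetical names chosen for illustration; only the pattern itself comes from the thread. Unlike the XQuery above, this strips the surrounding quotes and turns doubled quotes back into single ones. Edge cases such as an empty first field have not been checked.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; not part of Saxon or of the original post.
class CsvSplit {
    // Pattern copied from the original question. \G anchors each match
    // at the position where the previous match ended.
    private static final Pattern FIELD = Pattern.compile(
        "\\G(?:^|;)(?:\"((?:[^\"]|\"\")*)\"|([^\";]*))");

    // Splits one ';'-separated row into its fields.
    static List<String> split(String row) {
        List<String> fields = new ArrayList<>();
        Matcher m = FIELD.matcher(row);
        while (m.find()) {
            String quoted = m.group(1);
            if (quoted != null) {
                // Quoted field: unescape doubled quotes.
                fields.add(quoted.replace("\"\"", "\""));
            } else {
                // Unquoted field.
                fields.add(m.group(2));
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        String row = "one;\"two;stilltwo\";three;\"four;stillfour\";five";
        for (String f : split(row)) {
            System.out.println("<element>" + f + "</element>");
        }
    }
}
```

The resulting list could then be passed to the query as a sequence of strings, so that no regex is needed inside XQuery at all.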
David A. Lee
http://x-query.com/mailman/listinfo/talk
http://www.calldei.com
http://www.xmlsh.org
812-482-5224
http://x-query.com/mailman/listinfo/talk wrote:
>> In XQuery I think I would start by doing a replace() to replace
>> semicolons-not-within-quotes by some other delimiter (e.g. a PUA
>> character), and then do a tokenize() to split the string on this
>> new delimiter.
>>
>
> That is what I try to do ;)
>
> But, "...to replace semicolons-not-within-quotes..." needs Regex to find those, or not ?
>
>
>
>
>
>>> -----Original Message-----
>>> From: http://x-query.com/mailman/listinfo/talk [mailto:http://x-query.com/mailman/listinfo/talk]
>>> Sent: 19 June 2009 11:29
>>> To: Michael Kay; http://x-query.com/mailman/listinfo/talk
>>> Subject: Re: RE: XQuery - RegEx Pattern Matcher
>>>
>>> I am trying to "read" CSV data like this :
>>>
>>> one;"two;stilltwo";three;"four;stillfour";five
>>>
>>> this should result in something like this :
>>> ...
>>> <element>one</element>
>>> <element>two;stilltwo</element>
>>> <element>three</element>
>>> <element>four;stillfour</element>
>>> <element>five</element>
>>> ...
>>>
>>> if there is no separator(";") allowed within a text it is
>>> easy with just splitting a line with ";".
>>>
>>> But if there can be a ";" inside a text field, then I have to use RegEx.
>>> I succeeded in finding an XQuery RegEx that works if a line contains
>>> only one case where a ";" is used as text.
>>>
>>> But I need to find every match, so in Java I used \\G. That worked
>>> fine, so I hoped to reuse it in XQuery...
>>>
>>>
>>>
>>> -------- Original Message --------
>>>
>>>> Date: Fri, 19 Jun 2009 10:20:30 +0100
>>>> From: "Michael Kay" <http://x-query.com/mailman/listinfo/talk>
>>>> To: http://x-query.com/mailman/listinfo/talk, http://x-query.com/mailman/listinfo/talk
>>>> Subject: RE: XQuery - RegEx Pattern Matcher
>>>>
>>>> The XPath regular expression language does not recognize \G and it
>>>> does not recognize non-capturing groups.
>>>>
>>>> As far as matches() is concerned, there is no distinction between
>>>> capturing and non-capturing groups, so replace "(?:" by "(".
>>>>
>>>> I suspect you wanted your regex to contain "\G". In Java you need to
>>>> escape this as "\\G"; in XPath/XQuery, backslash is not a special
>>>> character and does not need to be escaped. However, there's no "\G" in
>>>> XPath regular expressions anyway. In Java it means "the end of the
>>>> previous match"; but XQuery is a functional language, so "previous" is
>>>> meaningless. At this stage I give up because I'm not sure what you are
>>>> trying to do: you haven't supplied enough of your code.
>>>>
>>>> Regards,
>>>>
>>>> Michael Kay
>>>> http://www.saxonica.com/
>>>> http://twitter.com/michaelhkay
>>>>
>>>>
>>>>> -----Original Message-----
>>>>> From: http://x-query.com/mailman/listinfo/talk
>>>>> [mailto:http://x-query.com/mailman/listinfo/talk] On Behalf Of http://x-query.com/mailman/listinfo/talk
>>>>> Sent: 19 June 2009 09:33
>>>>> To: http://x-query.com/mailman/listinfo/talk
>>>>> Subject: XQuery - RegEx Pattern Matcher
>>>>>
>>>>> Hi,
>>>>> I am trying to use a RegEx within XQuery. In general that works fine.
>>>>> Now I have a more complex RegEx to work with CSV files (these CSV
>>>>> files have ";" as separator).
>>>>> I can use the following without problems in Java:
>>>>>
>>>>> Pattern Regex = Pattern.compile(
>>>>> "\\G(?:^|;)(?:\"((?:[^\"]|\"\")*)\"|([^\";]*))");
>>>>> ...
>>>>>
>>>>> But in XQuery
>>>>> let $regularExpr :='\\G(?:^|;)(?:\"((?:[^\"]|\"\")*)\"|([^\";]*))'
>>>>> ...
>>>>> if (matches($row,$regularExpr) ) then ( ...
>>>>>
>>>>> just gives the error :
>>>>>
>>>>> Error at character 4 in regular expression
>>>>> "\\G(?:^|;)(?:\"((?:[^\"]|\"\")...": expected ())
>>>>>
>>>>>
>>>>> I tried the optional flags (i, x, ...) but always with the same
>>>>> result...
>>>>> What is wrong with this RegEx ?
>>>>>
>>>>> P.S. :I run the XQuery from Java with Saxon.
>>>>>
>>>>>