
XQuery - RegEx Pattern Matcher

David A. Lee dlee at calldei.com
Fri Jun 19 09:44:31 PDT 2009


This may not be the best way to do it, but it works, and it uses no regex.
It won't trap "quoted quotes", but it works on the example given and is
more readable than regexes to my caveman brain.


--------------------
declare variable $data := 
'foo;bar;"spam;bletch";another;"set;of;embedded";strings';

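(: Return the first field of $s: a whole "quoted" field (quotes kept),
   or the text up to the next ';'. :)
declare function local:nextstr( $s as xs:string ) as xs:string?
{
    if( starts-with( $s , '"' ) ) then
        concat( '"' , substring-before( substring( $s , 2 ) , '"' ) , '"' )
    else if( contains( $s , ';' ) ) then
        (: may be "" when $s begins with ';', i.e. an empty field :)
        substring-before( $s , ';' )
    else
        $s
};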

declare function local:splitcsv( $s as xs:string ) as xs:string*
{
    if( string-length($s) eq 0 ) then
        ()
    else
    let $first := local:nextstr( $s )
    return
    if( $first eq $s ) then
        $s
    else
        (: skip the field plus the ';' that follows it :)
        ( $first , local:splitcsv( substring( $s , string-length($first) + 2 ) ) )
};

for $s in local:splitcsv( $data )
return <tag>{$s}</tag>
-----------------------
Returns

<tag>foo</tag>
<tag>bar</tag>
<tag>"spam;bletch"</tag>
<tag>another</tag>
<tag>"set;of;embedded"</tag>
<tag>strings</tag>
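
For comparison, here is a rough sketch of the replace()-then-tokenize() idea
from the quoted thread below, written without \G or non-capturing groups.
The function name local:split-line and the choice of U+E000 as the
private-use marker are assumptions made for this sketch, not anything from
the thread, and it assumes balanced quotes with no "quoted quotes".
Appending a ';' first means every field, including the last one, is followed
by a separator, so the pattern always consumes a whole field and never stops
at a ';' inside quotes.

```xquery
declare variable $data :=
  'foo;bar;"spam;bletch";another;"set;of;embedded";strings';

(: Hypothetical helper: split one semicolon-separated line.
   Assumes quotes are balanced. :)
declare function local:split-line( $s as xs:string ) as xs:string*
{
    (: Each match is one whole field, quoted or bare, plus the ';'
       after it. The ';' is swapped for the marker character U+E000. :)
    let $marked := replace( concat( $s , ';' ) ,
                            '("[^"]*"|[^";]*);' ,
                            '$1&#xE000;' )
    let $tokens := tokenize( $marked , '&#xE000;' )
    (: The trailing marker yields one extra empty token; drop it. :)
    return $tokens[ position() lt last() ]
};

for $s in local:split-line( $data )
return <tag>{ $s }</tag>
```

On the sample $data this should give the same six <tag> elements as above.
Unbalanced quotes are not handled.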




David A. Lee
http://x-query.com/mailman/listinfo/talk  
http://www.calldei.com
http://www.xmlsh.org
812-482-5224



http://x-query.com/mailman/listinfo/talk wrote:
>> In XQuery I think I would start by doing a replace() to replace
>> semicolons-not-within-quotes by some other delimiter (e.g. a PUA
>> character), and then do a tokenize() to split the string on this
>> new delimiter.
>>
>
> That is what I try to do ;)
>
> But "...to replace semicolons-not-within-quotes..." needs a regex to find those, doesn't it?
>
>>> -----Original Message-----
>>> From: http://x-query.com/mailman/listinfo/talk [mailto:http://x-query.com/mailman/listinfo/talk] 
>>> Sent: 19 June 2009 11:29
>>> To: Michael Kay; http://x-query.com/mailman/listinfo/talk
>>> Subject: Re: RE:  XQuery - RegEx Pattern Matcher
>>>
>>> I am trying to "read" CSV data like this :
>>>
>>> one;"two;stilltwo";three;"four;stillfour";five
>>>
>>> this should result in something like this:
>>> ...
>>> <element>one</element>
>>> <element>two;stilltwo</element>
>>> <element>three</element>
>>> <element>four;stillfour</element>
>>> <element>five</element>
>>> ...
>>>
>>> If no separator (";") is allowed within a text, it is
>>> easy: just split the line on ";".
>>>
>>> But if a ";" can appear as text, then I have to use RegEx.
>>> I succeeded in finding an XQuery RegEx for the case where a line
>>> contains only one ";" used as text.
>>>
>>> But I need to find every match, so I used the \\G. It worked
>>> fine, so I hoped to reuse it in XQuery...
>>>
>>> -------- Original Message --------
>>>> Date: Fri, 19 Jun 2009 10:20:30 +0100
>>>> From: "Michael Kay" <http://x-query.com/mailman/listinfo/talk>
>>>> To: http://x-query.com/mailman/listinfo/talk, http://x-query.com/mailman/listinfo/talk
>>>> Subject: RE:  XQuery - RegEx Pattern Matcher
>>>>
>>>> The XPath regular expression language does not recognize \G and it 
>>>> does not recognize non-capturing groups.
>>>>
>>>> As far as matches() is concerned, there is no distinction between 
>>>> capturing and non-capturing groups, so replace "(?:" by "(".
>>>>
>>>> I suspect you wanted your regex to contain "\G". In Java you need to
>>>> escape this as "\\G"; in XPath/XQuery, backslash is not a special
>>>> character and does not need to be escaped. However, there's no "\G" in
>>>> XPath regular expressions anyway. In Java it means "the end of the
>>>> previous match"; but XQuery is a functional language, so "previous" is
>>>> meaningless. At this stage I give up because I'm not sure what you are
>>>> trying to do: you haven't supplied enough of your code.
>>>>
>>>> Regards,
>>>>
>>>> Michael Kay
>>>> http://www.saxonica.com/
>>>> http://twitter.com/michaelhkay
>>>>
>>>>         
>>>>> -----Original Message-----
>>>>> From: http://x-query.com/mailman/listinfo/talk
>>>>> [mailto:http://x-query.com/mailman/listinfo/talk] On Behalf Of http://x-query.com/mailman/listinfo/talk
>>>>> Sent: 19 June 2009 09:33
>>>>> To: http://x-query.com/mailman/listinfo/talk
>>>>> Subject:  XQuery - RegEx Pattern Matcher
>>>>>
>>>>> Hi,
>>>>> I am trying to use a RegEx within XQuery. In general that works fine.
>>>>> Now I have a more complex RegEx to work with CSV files (these CSV
>>>>> files have ";" as separator).
>>>>> I can use the following without problems in Java:
>>>>>
>>>>> Pattern Regex = Pattern.compile(
>>>>> "\\G(?:^|;)(?:\"((?:[^\"]|\"\")*)\"|([^\";]*))");
>>>>> ...
>>>>>
>>>>> But in XQuery
>>>>> let $regularExpr :='\\G(?:^|;)(?:\"((?:[^\"]|\"\")*)\"|([^\";]*))'
>>>>> ...
>>>>> if (matches($row,$regularExpr) ) then ( ...
>>>>>
>>>>> just gives the error :
>>>>>
>>>>> Error at character 4 in regular expression
>>>>> "\\G(?:^|;)(?:\"((?:[^\"]|\"\")...": expected ())
>>>>>
>>>>>
>>>>> I tried the optional flags (i, x, ...) but always with the same 
>>>>> result...
>>>>> What is wrong with this RegEx ?
>>>>>
>>>>> P.S. :I run the XQuery from Java with Saxon.

