XQuery - RegEx Pattern Matcher
derstubbi at gmx.de
Fri Jun 19 22:08:51 PDT 2009
Hi,

thanks for your answer, and even more thanks for your solution ;)

I have to admit that it took me a few minutes to understand your approach, because I didn't even realize that this could be a way. The idea is quite simple (now that I understand it), and it seems to work for most cases.

Thanks for your help!

P.S.: It is quite interesting to see the differences between XSLT and XQuery. I should not insist on using only XQuery ;)

-------- Original Message --------
> Date: Fri, 19 Jun 2009 08:44:31 -0400
> From: "David A. Lee" <http://x-query.com/mailman/listinfo/talk>
> To: http://x-query.com/mailman/listinfo/talk
> CC: http://x-query.com/mailman/listinfo/talk
> Subject: Re: XQuery - RegEx Pattern Matcher
>
> This may not be the best way to do it, but it works, and with no regex.
> This won't trap "quoted quotes", but it works on the example given and is
> more readable than regexes to my caveman brain.
>
> --------------------
> declare variable $data :=
>   'foo;bar;"spam;bletch";another;"set;of;embedded";strings';
>
> (: Returns the first field of $s; a quoted field keeps its quotes. :)
> declare function local:nextstr( $s as xs:string ) as xs:string?
> {
>   if( starts-with( $s , '"' ) ) then
>     concat( '"' , substring-before( substring( $s , 2 ) , '"' ) , '"' )
>   else
>     let $sep := substring-before( $s , ";" )
>     return
>       if( $sep eq "" ) then $s else $sep
> };
>
> (: Splits $s into fields by taking the first field and recursing on the rest. :)
> declare function local:splitcsv( $s as xs:string ) as xs:string*
> {
>   if( string-length($s) eq 0 ) then
>     ()
>   else
>     let $first := local:nextstr( $s )
>     return
>       if( $first eq $s ) then
>         $s
>       else
>         ( $first , local:splitcsv( substring( $s , string-length($first) + 2 ) ) )
> };
>
> for $s in local:splitcsv( $data )
> return <tag>{$s}</tag>
> -----------------------
> Returns
>
> <tag>foo</tag>
> <tag>bar</tag>
> <tag>"spam;bletch"</tag>
> <tag>another</tag>
> <tag>"set;of;embedded"</tag>
> <tag>strings</tag>
>
> David A. Lee
> http://x-query.com/mailman/listinfo/talk
> http://www.calldei.com
> http://www.xmlsh.org
> 812-482-5224
>
> http://x-query.com/mailman/listinfo/talk wrote:
> >> In XQuery I think I would start by doing a replace() to replace
> >> semicolons-not-within-quotes by some other delimiter (e.g. a PUA
> >> character), and then do a tokenize() to split the string on this
> >> new delimiter.
> >
> > That is what I am trying to do ;)
> >
> > But "...to replace semicolons-not-within-quotes..." needs a regex to
> > find those, doesn't it?
> >
> >>> -----Original Message-----
> >>> From: http://x-query.com/mailman/listinfo/talk [mailto:http://x-query.com/mailman/listinfo/talk]
> >>> Sent: 19 June 2009 11:29
> >>> To: Michael Kay; http://x-query.com/mailman/listinfo/talk
> >>> Subject: Re: RE: XQuery - RegEx Pattern Matcher
> >>>
> >>> I am trying to "read" CSV data like this:
> >>>
> >>> one;"two;stilltwo";three;"four;stillfour";five
> >>>
> >>> This should result in something like this:
> >>> ...
> >>> <element>one</element>
> >>> <element>two;stilltwo</element>
> >>> <element>three</element>
> >>> <element>four;stillfour</element>
> >>> <element>five</element>
> >>> ...
> >>>
> >>> If no separator (";") is allowed within a text, it is easy:
> >>> just split the line on ";".
> >>>
> >>> But if a ";" can appear as text, then I have to use a regex.
> >>> I succeeded in finding an XQuery regex for the case where a line
> >>> contains only one ";" used as text.
> >>>
> >>> But I need to find every match, so in Java I used the \\G. It worked
> >>> fine, so I hoped to reuse it in XQuery...
> >>>
> >>> -------- Original Message --------
> >>>> Date: Fri, 19 Jun 2009 10:20:30 +0100
> >>>> From: "Michael Kay" <http://x-query.com/mailman/listinfo/talk>
> >>>> To: http://x-query.com/mailman/listinfo/talk, http://x-query.com/mailman/listinfo/talk
> >>>> Subject: RE: XQuery - RegEx Pattern Matcher
> >>>>
> >>>> The XPath regular expression language does not recognize \G and it
> >>>> does not recognize non-capturing groups.
> >>>>
> >>>> As far as matches() is concerned, there is no distinction between
> >>>> capturing and non-capturing groups, so replace "(?:" by "(".
> >>>>
> >>>> I suspect you wanted your regex to contain "\G". In Java you need to
> >>>> escape this as "\\G"; in XPath/XQuery, backslash is not a special
> >>>> character and does not need to be escaped. However, there's no "\G" in
> >>>> XPath regular expressions anyway. In Java it means "the end of the
> >>>> previous match"; but XQuery is a functional language, so "previous" is
> >>>> meaningless. At this stage I give up because I'm not sure what you are
> >>>> trying to do: you haven't supplied enough of your code.
> >>>>
> >>>> Regards,
> >>>>
> >>>> Michael Kay
> >>>> http://www.saxonica.com/
> >>>> http://twitter.com/michaelhkay
> >>>>
> >>>>> -----Original Message-----
> >>>>> From: http://x-query.com/mailman/listinfo/talk
> >>>>> [mailto:http://x-query.com/mailman/listinfo/talk] On Behalf Of http://x-query.com/mailman/listinfo/talk
> >>>>> Sent: 19 June 2009 09:33
> >>>>> To: http://x-query.com/mailman/listinfo/talk
> >>>>> Subject: XQuery - RegEx Pattern Matcher
> >>>>>
> >>>>> Hi,
> >>>>> I am trying to use a regex within XQuery. In general that works fine.
> >>>>> Now I have a more complex regex to work with CSV files (these CSV
> >>>>> files have ";" as separator).
> >>>>> I can use the following without problems in Java:
> >>>>>
> >>>>> Pattern Regex = Pattern.compile(
> >>>>>     "\\G(?:^|;)(?:\"((?:[^\"]|\"\")*)\"|([^\";]*))");
> >>>>> ...
> >>>>>
> >>>>> But in XQuery
> >>>>> let $regularExpr :='\\G(?:^|;)(?:\"((?:[^\"]|\"\")*)\"|([^\";]*))'
> >>>>> ...
> >>>>> if (matches($row,$regularExpr) ) then ( ...
> >>>>>
> >>>>> just gives the error:
> >>>>>
> >>>>> Error at character 4 in regular expression
> >>>>> "\\G(?:^|;)(?:\"((?:[^\"]|\"\")...": expected ())
> >>>>>
> >>>>> I tried the optional flags (i, x, ...) but always with the same
> >>>>> result...
> >>>>> What is wrong with this regex?
> >>>>>
> >>>>> P.S.: I run the XQuery from Java with Saxon.
> >>>>>
> >>>>> _______________________________________________
> >>>>> http://x-query.com/mailman/listinfo/talk
> >>>>> http://x-query.com/mailman/listinfo/talk
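
For comparison, the Java side of the original question can be sketched as follows. This is only an illustration of how the quoted `\G` pattern walks a line with a stateful `Matcher`, which is the behaviour Michael Kay points out has no counterpart in XPath regular expressions. The class name `CsvLineSplitter` and the method `split` are hypothetical, and collapsing a doubled quote (`""`) into a single quote is an assumption about the intended CSV dialect, not something stated in the thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; only the pattern itself comes from the thread.
class CsvLineSplitter {
    // \G anchors each match at the end of the previous one, so matching
    // stops at the first position that is not a well-formed field.
    private static final Pattern FIELD = Pattern.compile(
        "\\G(?:^|;)(?:\"((?:[^\"]|\"\")*)\"|([^\";]*))");

    static List<String> split(String line) {
        List<String> fields = new ArrayList<>();
        Matcher m = FIELD.matcher(line);
        while (m.find()) {
            if (m.group(1) != null) {
                // Quoted field: group 1 excludes the surrounding quotes.
                // Assumption: "" inside a quoted field means one literal quote.
                fields.add(m.group(1).replace("\"\"", "\""));
            } else {
                // Unquoted field: group 2 holds the text up to the next ';'.
                fields.add(m.group(2));
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        String line = "one;\"two;stilltwo\";three;\"four;stillfour\";five";
        for (String field : split(line)) {
            System.out.println("<element>" + field + "</element>");
        }
    }
}
```

Unlike the recursive XQuery functions quoted above, this version strips the surrounding quotes, which matches the `<element>` output requested in the question.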