
Re: [OT] bugs in JDK regex engine ?

  • From: "Mukul Gandhi" <gandhi.mukul@g...>
  • To: xml-dev@l...
  • Date: Mon, 4 Feb 2008 10:00:36 +0530

Thanks for your reply, and help.

My present problem is resolved.

I'll try a few slightly more complex use cases, and post my questions ...

On Feb 4, 2008 9:47 AM, Amelia A Lewis <amyzing@t...> wrote:
> On 2008-02-03 23:26:58 -0500 "Mukul Gandhi" <gandhi.mukul@g...>
> wrote:
> > String str = "<root><abc x='1'>text1</abc><pqr y='1'>text2</pqr></root>";
> >
> > // anything from '<' to '>', and not having '/'
> > Pattern pattern = Pattern.compile("<[^/]+>");
> > Matcher matcher = pattern.matcher(str);
> >
> > while (matcher.find()) {
> >    String group = matcher.group();
> >    System.out.println(group);
> > }
> >
> > 'str' is a String representation of an XML fragment.
> >
> > I want to extract all pieces from the string (the tokens), which form
> > a start tag (including attribute parts).
> >
> > I am expecting output:
> > <root>
> > <abc x='1'>
> > <pqr y='1'>
>
> But that's not what you asked for.  You said "longest string starting
> with '<' and ending with '>' that doesn't contain '/'".
>
> > But the output produced by the above program is:
> > <root><abc x='1'>
> > <pqr y='1'>
>
> Yup.  Exactly matches the regex.  No / in either one, is there?
> Specifically, even though you think you asked for "just the start
> tag," the <abc> start tag follows <root> directly; there's no /
> anywhere in between to prevent the regex from matching to the end
> of <abc>.
>
> The problem with using regular expressions to parse any grammar with
> paired tokens (XML for example, but also most programming languages
> with paired braces of any sort, or comments in a language that permits
> comment nesting) is that regular expressions can't handle parity.
>
> You need something more powerful than regex.
>
> If you're determined to find the next layer of problems associated
> with using a too-weak tool to do the job, you should find it shortly
> after making this change:
>
> Pattern.compile("<[^/<]+>");
>
> That prevents it from picking up a nested element tag.  Most of the
> time.
>
> For giggles:
>
> <root><?my-pi wotsit ?><abc x='1'><![CDATA[<?xml version="1.0?>
> <root><abc x='1'>text1]]></abc>
> </root>
>
> HTH.
>
> Amy!
> --
> Amelia A. Lewis                    amyzing {at} talsever.com
> Confidence: a feeling peculiar to the stage just before full
> comprehension of the problem.
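
The behaviour described in the quoted reply can be reproduced with a small sketch. The class name StartTagRegexDemo and the helper findAll are made up for illustration; the two patterns and the input strings are the ones from the thread. It only uses java.util.regex, and it is meant to show why the revised pattern still over-matches, not to recommend it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class StartTagRegexDemo {

    // Hypothetical helper: collects every match of the regex in the input, in order.
    static List<String> findAll(String regex, String input) {
        List<String> matches = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(input);
        while (matcher.find()) {
            matches.add(matcher.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String str = "<root><abc x='1'>text1</abc><pqr y='1'>text2</pqr></root>";

        // Original pattern: [^/]+ is greedy and may run across '>' and '<',
        // so the first match swallows two start tags.
        System.out.println(findAll("<[^/]+>", str));

        // Revised pattern from the reply: '<' is excluded too, so a match
        // cannot run into the next tag.
        System.out.println(findAll("<[^/<]+>", str));

        // The "for giggles" document from the reply: the revised pattern
        // also reports the processing instruction and markup-like text
        // that sits inside the CDATA section.
        String tricky = "<root><?my-pi wotsit ?><abc x='1'><![CDATA[<?xml version=\"1.0?>\n"
                + "<root><abc x='1'>text1]]></abc>\n</root>";
        System.out.println(findAll("<[^/<]+>", tricky));
    }
}
```

On the first input the original pattern yields two matches and the revised one yields three, as in the thread. On the last input the revised pattern returns more than the two real start tags, which is the "next layer of problems" the reply hints at.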


-- 
Regards,
Mukul Gandhi
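
For the "something more powerful than regex" suggested in the quoted reply, one option is a real XML parser. The following is a minimal sketch, assuming the StAX API (javax.xml.stream) that ships with Java 6 and later; nobody in the thread names a specific parser, so that choice is an assumption. The class name StartTagParserDemo and the method startTags are hypothetical. Note that a parser reports names and attributes rather than the raw tag text, so the sketch rebuilds each start tag and always quotes attribute values with double quotes.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class StartTagParserDemo {

    // Hypothetical helper: rebuilds every start tag (name plus attributes)
    // in document order. Namespace prefixes are ignored in this sketch.
    static List<String> startTags(String xml) throws XMLStreamException {
        List<String> tags = new ArrayList<>();
        XMLStreamReader reader =
                XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
        try {
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                    StringBuilder tag = new StringBuilder("<").append(reader.getLocalName());
                    for (int i = 0; i < reader.getAttributeCount(); i++) {
                        tag.append(' ')
                           .append(reader.getAttributeLocalName(i))
                           .append("=\"")
                           .append(reader.getAttributeValue(i))
                           .append('"');
                    }
                    tags.add(tag.append('>').toString());
                }
            }
        } finally {
            reader.close();
        }
        return tags;
    }

    public static void main(String[] args) throws XMLStreamException {
        // The "for giggles" document from the quoted reply.
        String tricky = "<root><?my-pi wotsit ?><abc x='1'><![CDATA[<?xml version=\"1.0?>\n"
                + "<root><abc x='1'>text1]]></abc>\n</root>";
        // The processing instruction and the CDATA content are not elements,
        // so only the two real start tags are reported.
        System.out.println(startTags(tricky));
    }
}
```

Because the parser knows about processing instructions and CDATA sections, it skips them, which is exactly the distinction a character-class regex cannot make.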

