Re: [OT] bugs in JDK regex engine ?
Thanks for your reply, and help. My present problem is resolved. I'll
try a few more complex use cases, and post my questions ...

On Feb 4, 2008 9:47 AM, Amelia A Lewis <amyzing@t...> wrote:
> On 2008-02-03 23:26:58 -0500 "Mukul Gandhi" <gandhi.mukul@g...>
> wrote:
> > String str = "<root><abc x='1'>text1</abc><pqr y='1'>text2</pqr></root>";
> >
> > Pattern pattern = Pattern.compile("<[^/]+>"); //anything from '<' to '>', and not having '/'
> > Matcher matcher = pattern.matcher(str);
> >
> > while (matcher.find()) {
> >   String group = matcher.group();
> >   System.out.println(group);
> > }
> >
> > 'str' is a String representation of a XML fragment.
> >
> > I want to extract all pieces from the string (the tokens), which form
> > a start tag (including attribute parts).
> >
> > I am expecting output:
> > <root>
> > <abc x='1'>
> > <pqr y='1'>
>
> But that's not what you asked for. You said "longest string starting
> with '<' and ending with '>' that doesn't contain '/'."
>
> > But the output produced by the above program is:
> > <root><abc x='1'>
> > <pqr y='1'>
>
> Yup. Exactly matches the regex. No / in either one, is there?
> Specifically, even though you think you asked for "just the start
> tag," you have <abc> nested inside <root>; there's no / anywhere
> around to prevent the regex from matching to the end of <abc>.
>
> The problem with using regular expressions to parse any grammar with
> paired tokens (XML for example, but also most programming languages
> with paired braces of any sort, or comments in a language that permits
> comment nesting) is that regular expressions can't handle parity.
>
> You need something more powerful than regex.
>
> If you're determined to find the next layer of problems associated
> with using a too-weak tool to do the job, you should find it shortly
> after making this change:
>
> Pattern.compile("<[^/<]+>");
>
> That prevents it from picking up a nested element tag. Most of the
> time.
>
> For giggles:
>
> <root><?my-pi wotsit ?><abc x='1'><![CDATA[<?xml version="1.0?>
> <root><abc x='1'>text1]]></abc>
> </root>
>
> HTH.
>
> Amy!
> --
> Amelia A. Lewis                    amyzing {at} talsever.com
> Confidence: a feeling peculiar to the stage just before full
> comprehension of the problem.

--
Regards,
Mukul Gandhi
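
For anyone following the thread, the behaviour discussed above can be reproduced with a small self-contained program. This is only a sketch: the class name StartTagDemo and the helper findAll are made up for illustration and do not appear in the original posts. It runs the original pattern and the suggested "<[^/<]+>" variant over the same input, then feeds the "for giggles" document to the second pattern to show where it still goes wrong.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class StartTagDemo {

    // Hypothetical helper: collects every match of `regex` in `input`.
    static List<String> findAll(String regex, String input) {
        List<String> hits = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            hits.add(m.group());
        }
        return hits;
    }

    public static void main(String[] args) {
        String str = "<root><abc x='1'>text1</abc><pqr y='1'>text2</pqr></root>";

        // Original pattern: the greedy [^/]+ runs across '>' and '<' until
        // it hits a '/', then backtracks to the last '>' it passed.
        System.out.println(findAll("<[^/]+>", str));
        // prints [<root><abc x='1'>, <pqr y='1'>]

        // Suggested change: also excluding '<' stops the match at the
        // start of the next tag.
        System.out.println(findAll("<[^/<]+>", str));
        // prints [<root>, <abc x='1'>, <pqr y='1'>]

        // The "for giggles" input: the same pattern also matches the
        // processing instruction and text inside the CDATA section,
        // none of which are start tags.
        String tricky = "<root><?my-pi wotsit ?><abc x='1'>"
                + "<![CDATA[<?xml version=\"1.0?>\n"
                + "<root><abc x='1'>text1]]></abc>\n</root>";
        for (String hit : findAll("<[^/<]+>", tricky)) {
            System.out.println(hit);
        }
    }
}
```

The last loop is the point of the exercise: a regular expression has no notion of being inside a CDATA section or a processing instruction, which is why the advice in the thread is to use an XML parser for this job.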