Re: [OT] bugs in JDK regex engine ?

From: "Mukul Gandhi" <gandhi.mukul@g...>
To: xml-dev@l...
Date: Mon, 4 Feb 2008 09:27:58 +0530

Thanks Mike, for your comments.

Below is a simple example I tried with JDK 1.6.0.

String str = "<root><abc x='1'>text1</abc><pqr y='1'>text2</pqr></root>";

// intended: anything from '<' to '>' that does not contain '/'
Pattern pattern = Pattern.compile("<[^/]+>");
Matcher matcher = pattern.matcher(str);

while (matcher.find()) {
   String group = matcher.group();
   System.out.println(group);
}

'str' is a String representation of an XML fragment.

I want to extract all the tokens from the string that form a start tag
(including any attributes).

I am expecting output:
<root>
<abc x='1'>
<pqr y='1'>

But the output produced by the above program is:
<root><abc x='1'>
<pqr y='1'>

As you can see, the first token is longer than expected: it spans two
start tags.

Could you, or anybody else, please help?
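
A possible explanation, offered as a sketch rather than a definitive
diagnosis: the class [^/] excludes only '/', so it also matches '<' and
'>'. Since '+' is greedy, the match runs on to the last '>' that comes
before the next '/', which would account for the first token spanning two
tags. The variant below also excludes '<' and '>' from the class. The class
name StartTagDemo and the helper findAll are hypothetical names used only
for this illustration, and the pattern assumes no attribute value contains
'/', '<' or '>' (it would also skip self-closing tags).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; not part of any library.
class StartTagDemo {
    static final String INPUT =
        "<root><abc x='1'>text1</abc><pqr y='1'>text2</pqr></root>";

    // Collects every match of the given regex in the input, in order.
    static List<String> findAll(String regex, String input) {
        List<String> matches = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(input);
        while (matcher.find()) {
            matches.add(matcher.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        // Original pattern: [^/] also matches '<' and '>'.
        System.out.println(findAll("<[^/]+>", INPUT));
        // prints [<root><abc x='1'>, <pqr y='1'>]

        // Assumed fix: exclude '<' and '>' as well as '/'.
        System.out.println(findAll("<[^/<>]+>", INPUT));
        // prints [<root>, <abc x='1'>, <pqr y='1'>]
    }
}
```

With the narrower class, each match has to stop at the first '>' after
its opening '<', so the three start tags come out separately.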

On Feb 3, 2008 10:52 PM, Michael Kay <mike@s...> wrote:
> Saxon translates XML Schema and XPath regexes into JDK regexes, so it's
> pretty heavily dependent on the underlying regex engine. There are some
> cases where the behaviour is very incompletely specified, for example the
> effect of the "i" (case-blind) flag, but I've found very few cases where the
> expected behaviour is clear and the actual behaviour differs. In my
> experience, it's much more likely to be a user error.
>
> However, I think it might be stretching the (highly elastic) patience of
> this list to hold a discussion of JDK regex behaviour here.
>
> In any case, I think the whole concept of checking XML well-formedness using
> regular expressions is misguided, for the simple reason that (on theoretical
> grounds) regular expressions aren't up to the job.
>
> Michael Kay
> http://www.saxonica.com/


-- 
Regards,
Mukul Gandhi

Follow-Ups:
- RE: [OT] bugs in JDK regex engine ?
  - From: "Michael Kay" <mike@s...>
- Re: [OT] bugs in JDK regex engine ?
  - From: Amelia A Lewis <amyzing@t...>

References:
- [OT] bugs in JDK regex engine ?
  - From: "Mukul Gandhi" <gandhi.mukul@g...>
- RE: [OT] bugs in JDK regex engine ?
  - From: "Michael Kay" <mike@s...>
