Thanks Mike, for your comments.
Below is a simple example I tried with JDK 1.6.0.
String str = "<root><abc x='1'>text1</abc><pqr y='1'>text2</pqr></root>";
Pattern pattern = Pattern.compile("<[^/]+>"); // anything from '<' to '>', not containing '/'
Matcher matcher = pattern.matcher(str);
while (matcher.find()) {
String group = matcher.group();
System.out.println(group);
}
'str' is a String representation of an XML fragment.
I want to extract from the string all the pieces (tokens) that form
a start tag (including its attributes).
I am expecting output:
<root>
<abc x='1'>
<pqr y='1'>
But the output produced by the above program is:
<root><abc x='1'>
<pqr y='1'>
As you can see, the first token is larger than expected ...
Can you or anybody please help ...
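One possible reading of the output above, offered as a sketch rather than a confirmed diagnosis: the character class [^/] excludes only '/', so it also matches '<' and '>'. The greedy [^/]+ therefore runs from "root" up to the first '/', then backtracks to the last '>' before it, which is the end of <abc x='1'>. Excluding '<' and '>' from the class as well keeps each match inside a single tag. The class name StartTagDemo and the helper startTags are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class StartTagDemo {

    // Hypothetical helper: returns every substring that looks like a start tag.
    public static List<String> startTags(String xml) {
        // '<', then one or more characters that are none of '/', '<', '>', then '>'.
        // Excluding '<' and '>' stops a match from spanning two adjacent tags.
        Pattern pattern = Pattern.compile("<[^/<>]+>");
        Matcher matcher = pattern.matcher(xml);
        List<String> tags = new ArrayList<String>();
        while (matcher.find()) {
            tags.add(matcher.group());
        }
        return tags;
    }

    public static void main(String[] args) {
        String str = "<root><abc x='1'>text1</abc><pqr y='1'>text2</pqr></root>";
        for (String tag : startTags(str)) {
            System.out.println(tag);
        }
    }
}
```

With the sample string this prints <root>, <abc x='1'> and <pqr y='1'> on separate lines. It is still only a tokenizing shortcut: empty-element tags such as <a/> and attribute values containing '/', '<' or '>' are not handled, so a real XML parser remains the safer choice for general input.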
On Feb 3, 2008 10:52 PM, Michael Kay <mike@s...> wrote:
> Saxon translates XML Schema and XPath regexes into JDK regexes, so it's
> pretty heavily dependent on the underlying regex engine. There are some
> cases where the behaviour is very incompletely specified, for example the
> effect of the "i" (case-blind) flag, but I've found very few cases where the
> expected behaviour is clear and the actual behaviour differs. In my
> experience, it's much more likely to be a user error.
>
> However, I think it might be stretching the (highly elastic) patience of
> this list to hold a discussion of JDK regex behaviour here.
>
> In any case, I think the whole concept of checking XML well-formedness using
> regular expressions is misguided, for the simple reason that (on theoretical
> grounds) regular expressions aren't up to the job.
>
> Michael Kay
> http://www.saxonica.com/
--
Regards,
Mukul Gandhi