Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
245 views
in Technique[技术] by (71.8m points)

java - Understanding Quantifiers

I was going through the Java Tutorial on Quantifiers.

There is a difference mentioned among Differences Among Greedy, Reluctant, and Possessive Quantifiers.

I am not able to understand what's the difference exactly.

Explanation provided as follows:

Enter your regex: .*foo  // greedy quantifier
Enter input string to search: xfooxxxxxxfoo
I found the text "xfooxxxxxxfoo" starting at index 0 and ending at index 13.

Enter your regex: .*?foo  // reluctant quantifier
Enter input string to search: xfooxxxxxxfoo
I found the text "xfoo" starting at index 0 and ending at index 4.
I found the text "xxxxxxfoo" starting at index 4 and ending at index 13.

Enter your regex: .*+foo // possessive quantifier
Enter input string to search: xfooxxxxxxfoo
No match found.

The first example uses the greedy quantifier .* to find "anything", zero or more times, followed by the letters "f" "o" "o". Because the quantifier is greedy, the .* portion of the expression first eats the entire input string. At this point, the overall expression cannot succeed, because the last three letters ("f" "o" "o") have already been consumed. So the matcher slowly backs off one letter at a time until the rightmost occurrence of "foo" has been regurgitated, at which point the match succeeds and the search ends.

The second example, however, is reluctant, so it starts by first consuming "nothing". Because "foo" doesn't appear at the beginning of the string, it's forced to swallow the first letter (an "x"), which triggers the first match at 0 and 4. Our test harness continues the process until the input string is exhausted. It finds another match at 4 and 13.

The third example fails to find a match because the quantifier is possessive. In this case, the entire input string is consumed by .*+, leaving nothing left over to satisfy the "foo" at the end of the expression. Use a possessive quantifier for situations where you want to seize all of something without ever backing off; it will outperform the equivalent greedy quantifier in cases where the match is not immediately found.

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

The main difference between lazy (reluctant) & greedy case is that the behaviour of the backtracking structure, and the possessive case is just way too aggressive!

  • Lazy case will always, after a single match, will relinquish the focus of the matching engine to the next operator in the regex. If next operator fails, the backtrack structure will force lazy case to be repeated, and this continues until either operators or target text ends; for example in your example passing to the matching of char f after every successful match, so whenever you had a foo phrase, you'd get a match, that is why we get multiple matches from its usage.

.*?foo

xfooxxxxxxfoo
At the beginning, lazy case will have a successful match with the x (after successful empty match) and pass the focus to the next operator; foo part of the regex, and since that is present after x, we get a match for this fragment, same idea for the secondary part of the string.

  • Greedy case is the opposite, will continue its matching until it fails, never passes the focus to the next operator, only when the matching fails the backtracking goes into effect, and the next operators will be matched from reverse;

.*foo

xfooxxxxxxfoo
When the greedy case is at this point (last char) the matching will fail since we couldn't match the foo part of the regex. Than the backtracking will force the greedy case to backtrace its steps and enforce the next operator foo, similar of lazy case;

xfooxxxxxxfoo
At this point, the foo part will get a successful match, and thus ending with a successful match of the whole string.

  • Possessive case is very similar to greedy case, except the last part where the failure of match causing a backtracking, this is not the case for possessive. If it can be matched, it will posses and will sacrifice the success of the match in the process. If it would've failed in matching a character, only then the focus would pass to the next operator of the regex.

.*+foo

xfooxxxxxxfoo
Similar of greedy case, we have reached to the end of the string, but possessive case can still match it, thus will not pass the torch to the backtrack structure, and will cause a matching failure.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...