Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


java - Regular expression: who's greedier?

My primary concern is with the Java flavor, but I'd also appreciate information regarding others.

Let's say you have a subpattern like this:

(.*)(.*)

Not very useful as is, but let's say these two capture groups (say, 1 and 2) are part of a bigger pattern that matches with backreferences to these groups, etc.

So both are greedy, in that they try to capture as much as possible, only taking less when they have to.

My question is: who's greedier? Does 1 get first priority, giving 2 its share only if it has to?

What about:

(.*)(.*)(.*)

Let's assume that 1 does get first priority. Let's say it got too greedy, and then spit out a character. Who gets it first? Is it always 2 or can it be 3?

Let's assume it's 2 that gets 1's rejection. If this still doesn't work, who spits out now? Does 2 spit to 3, or does 1 spit out another to 2 first?


Bonus question

What happens if you write something like this:

(.*)(.*?)(.*)

Now 2 is reluctant. Does that mean 1 spits out to 3, and 2 only reluctantly accepts 3's rejection?


Example

Maybe it was a mistake not to give concrete examples of how I'm using these patterns, so here are some:

// Backreferences need a doubled backslash inside a Java string literal.
System.out.println(
    "OhMyGod=MyMyMyOhGodOhGodOhGod"
    .replaceAll("^(.*)(.*)(.*)=(\\1|\\2|\\3)+$", "<$1><$2><$3>")
); // prints "<Oh><My><God>"

// same pattern, different input string
System.out.println(
    "OhMyGod=OhMyGodOhOhOh"
    .replaceAll("^(.*)(.*)(.*)=(\\1|\\2|\\3)+$", "<$1><$2><$3>")
); // prints "<Oh><MyGod><>"

// now 2 is reluctant
System.out.println(
    "OhMyGod=OhMyGodOhOhOh"
    .replaceAll("^(.*)(.*?)(.*)=(\\1|\\2|\\3)+$", "<$1><$2><$3>")
); // prints "<Oh><><MyGod>"


1 Reply


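Group 1 has priority. With (.*)(.*)(.*) on its own, 1 takes everything and 2 and 3 match the empty string. When the rest of the pattern forces 1 to give characters back, 2 has priority over 3.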
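To see this on a small input, here is a minimal sketch. The class name GreedyPriority and the helper groups are made up for illustration; only java.util.regex is used. With nothing after the three groups to force a backtrack, group 1 keeps the whole input, whether or not group 2 is reluctant.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyPriority {
    // Hypothetical helper: formats every capture group of a full match as <g1><g2>...
    static String groups(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.matches()) {
            return "no match";
        }
        StringBuilder sb = new StringBuilder();
        for (int i = 1; i <= m.groupCount(); i++) {
            sb.append('<').append(m.group(i)).append('>');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Nothing forces group 1 to give anything back, so it keeps "abc".
        System.out.println(groups("(.*)(.*)(.*)", "abc"));  // prints "<abc><><>"
        // Making group 2 reluctant changes nothing here: group 1 still goes first.
        System.out.println(groups("(.*)(.*?)(.*)", "abc")); // prints "<abc><><>"
    }
}
```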

As a general rule, think of it like this: backtracking only occurs to satisfy a match, never to satisfy greediness, so left is best :)
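A rough way to watch the order of backtracking is to add a backreference that only one split can satisfy. This is a sketch, not a description of the engine internals: the class name BacktrackOrder, the helper groups, and the inputs are all made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BacktrackOrder {
    // Hypothetical helper: formats every capture group of a full match as <g1><g2>...
    static String groups(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.matches()) {
            return "no match";
        }
        StringBuilder sb = new StringBuilder();
        for (int i = 1; i <= m.groupCount(); i++) {
            sb.append('<').append(m.group(i)).append('>');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // \1 must equal "ab", so group 1 has to give back "c"; greedy group 2 picks it up.
        System.out.println(groups("(.*)(.*)(.*)=\\1", "abc=ab"));  // prints "<ab><c><>"
        // A reluctant group 2 tries the empty string first, so group 3 gets "c".
        System.out.println(groups("(.*)(.*?)(.*)=\\1", "abc=ab")); // prints "<ab><><c>"
        // \3 must be non-empty: group 2 hands "c" to group 3 before group 1
        // gives up a second character (<a><b><c> would also fit, but is never reached).
        System.out.println(groups("(.*)(.*)(.*)=\\3", "abc=c"));   // prints "<ab><><c>"
    }
}
```

The last case suggests the answer to "who spits out next": the rightmost group that still has a choice is retried first, and group 1 only gives up another character once every split between 2 and 3 has failed.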

Explaining backtracking and greediness is too much for me to tackle here; I'd suggest Friedl's Mastering Regular Expressions.

