Understanding this problem requires understanding how an NFA works under the hood of RegExp.
Giving a rigorous definition of an NFA is too heavy a task for me; searching for NFA on Wikipedia will give you a better explanation. Here, just think of the NFA as a robot that looks for the pattern you give it.
A crudely implemented NFA is somewhat dumb: it only looks ahead one or two tokens of the pattern you give. So in the synthetic example you give, the NFA first looks only at `\w+` (not at the parentheses used for grouping).
Because `+` is a greedy quantifier, that is, it matches as many characters as possible, the NFA dumbly continues to consume characters in `target`. After 30 `a`s, the NFA encounters the end of the string. Only then does the NFA realize that it needs to match the other tokens in `template`.
The next token is `+`. The NFA has already matched it, so it proceeds to `\.`. This time it fails.
What the NFA does next is step one character back, trying to match the pattern by shortening the submatch of `\w+`. So the NFA splits `target` into two groups: 29 `a`s for one `\w+`, and one trailing `a`. The NFA first tries to consume the trailing `a` by matching it against the second `+`, but it still fails when it reaches `\.`. The NFA repeats this process until it finds a full match; if there is none, it ends up trying every possible partition.
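The back-and-forth described above can be sketched as a toy backtracking matcher. This is only an illustration in Python, not how any real engine is implemented: the function name `count_attempts` and the attempt counter are made up for this sketch, the match is only tried from the start of the string, and the pattern `(\w+)+\.` is hard-coded.

```python
def count_attempts(target: str) -> tuple:
    r"""Toy backtracker for (\w+)+\. tried only at index 0.

    Returns (matched, attempts), where attempts counts how many times
    the matcher tried to match the literal period. Hypothetical helper
    for illustration; real engines are far more elaborate.
    """
    attempts = 0

    def match_groups(pos: int) -> bool:
        nonlocal attempts
        # Greedy \w+: run to the end of the word characters first.
        end = pos
        while end < len(target) and (target[end].isalnum() or target[end] == "_"):
            end += 1
        # Try the longest group first, then give back one character at a time.
        for stop in range(end, pos, -1):
            # The outer + is greedy too: try to start another group ...
            if match_groups(stop):
                return True
            # ... and only then try to match the literal period.
            attempts += 1
            if stop < len(target) and target[stop] == ".":
                return True
        return False

    return match_groups(0), attempts
```

With this sketch, `count_attempts("aaa.")` gives `(True, 1)`, while `count_attempts("a" * 10)` gives `(False, 1023)`: every extra `a` in a failing target roughly doubles the work.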
So `(\w+)+\.` instructs the NFA to group `target` in this manner: `target` is partitioned into one or more groups, with at least one character per group, and `target` must end with a period `.`. As long as the period is not matched, the NFA tries every possible partition. So how many partitions are there? On the order of 2^n, that is, exponential in the number of characters n. (Just think of choosing, for each gap between adjacent `a`s, whether to insert a separator.) Like below:
aaaaaaa a
aaaaaa aa
aaaaaa a a
.....
.......
aa a a ... a
a a a a a .... a
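The listing above can be generated mechanically for a short string. The sketch below is in Python, and `partitions` is a hypothetical helper name: it decides for each gap between adjacent characters whether to cut there, so a string of n characters yields 2^(n-1) ways to split it.

```python
from itertools import product


def partitions(target):
    """Yield every way to split target into one or more non-empty groups."""
    if not target:
        return
    gaps = len(target) - 1
    # Each gap between adjacent characters is either cut (True) or not (False).
    for cuts in product([False, True], repeat=gaps):
        groups, start = [], 0
        for i, cut in enumerate(cuts, start=1):
            if cut:
                groups.append(target[start:i])
                start = i
        groups.append(target[start:])  # the last group runs to the end
        yield groups


for groups in partitions("aaaa"):
    print(" ".join(groups))
```

For `"aaaa"` this prints 8 lines, from `aaaa` down to `a a a a`; for 30 characters the count would be 2^29.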
If the NFA matches `\.`, it won't hurt much. But when it fails to match, this expression is doomed to take exponential time, which in practice looks never-ending.
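To see the difference between the matching and the failing case on a real engine, one can time the pattern on targets of growing length. The sketch below assumes Python's `re` module, which is also a backtracking engine, as a stand-in for the engine being discussed; `time_failures` is a hypothetical helper, and the actual timings depend on the machine and the engine version, so none are shown here.

```python
import re
import time

PATTERN = re.compile(r"(\w+)+\.")


def time_failures(sizes):
    """Return (n, seconds) pairs for searching a target of n 'a's with no period."""
    results = []
    for n in sizes:
        target = "a" * n  # no trailing period, so the search must fail
        started = time.perf_counter()
        found = PATTERN.search(target)
        elapsed = time.perf_counter() - started
        assert found is None
        results.append((n, elapsed))
    return results


# Keep the sizes small: each extra character roughly doubles the time.
for n, seconds in time_failures([10, 14, 18]):
    print(f"n={n:2d}  {seconds:.4f}s")

# The matching case returns almost immediately, whatever the length.
print(PATTERN.search("a" * 30 + ".") is not None)
```

Pushing the failing sizes toward 30 is exactly the situation described above, so raise them only a little at a time.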
I'm not advertising, but Mastering Regular Expressions is a good book for understanding the mechanism under RegExp.