php - Match a^n b^n c^n (e.g. "aaabbbccc") using regular expressions (PCRE)

Question

Welcome To Ask or Share your Answers For Others

php - Match a^n b^n c^n (e.g. "aaabbbccc") using regular expressions (PCRE)

posted Oct 17, 2021 in Technique[技术] by 深蓝 (71.8m points)

php - Match a^n b^n c^n (e.g. "aaabbbccc") using regular expressions (PCRE)

It is a well known fact that modern regular expression implementations (most notably PCRE) have little in common with the original notion of regular grammars. For example you can parse the classical example of a context-free grammar {aⁿbⁿ; n>0} (e.g. aaabbb) using this regex (demo):

~^(a(?1)?b)$~

My question is: How far can you go? Is it also possible to parse the context-sensitive grammar {aⁿbⁿcⁿ;n>0} (e.g. aaabbbccc) using PCRE?

Question&Answers:os

与恶龙缠斗过久,自身亦成为恶龙；凝视深渊过久,深渊将回以凝视…

1 Reply

深蓝 · Answer 1 · 2021-10-16T22:32:19+0000

Inspired by NullUserExceptions answer (which he already deleted as it failed for one case) I think I have found a solution myself:

$regex = '~^
    (?=(a(?-1)?b)c)
     a+(b(?-1)?c)
$~x';

var_dump(preg_match($regex, 'aabbcc'));    // 1
var_dump(preg_match($regex, 'aaabbbccc')); // 1
var_dump(preg_match($regex, 'aaabbbcc'));  // 0
var_dump(preg_match($regex, 'aaaccc'));    // 0
var_dump(preg_match($regex, 'aabcc'));     // 0
var_dump(preg_match($regex, 'abbcc'));     // 0

Try it yourself: http://codepad.viper-7.com/1erq9v

Explanation

If you consider the regex without the positive lookahead assertion (the (?=...) part), you have this:

~^a+(b(?-1)?c)$~

This does nothing more than check that there's an arbitrary number of as, followed by an equal number of bs and cs.

This doesn't yet satisfy our grammar, because the number of as must be the same, too. We can ensure that by checking that the number of as equals the number of bs. And this is what the expression in the lookahead assertion does: (a(?-1)?b)c. The c is necessary so we don't only match a part of the bs.

Conclusion

I think this impressively shows that modern regex is not only capable of parsing non-regular grammars, but can even parse non-context-free grammars. Hopefully this will lay to rest the endless parroting of "you can't do X with regex because X isn't regular"

Categories

php - Match a^n b^n c^n (e.g. "aaabbbccc") using regular expressions (PCRE)

php - Match a^n b^n c^n (e.g. "aaabbbccc") using regular expressions (PCRE)

Please log in or register to add a comment.

Please log in or register to reply this article.

1 Reply

Explanation

Conclusion

Please log in or register to add a comment.

Just Browsing Browsing

Most popular tags