regex - Why/how is an additional variable needed in matching a repeated arbitrary character with capture groups?

Question

posted Oct 24, 2021 in Technique by 深蓝

I'm matching a run of a single repeated arbitrary character, with a minimum length, using a Perl 6 regex.

After reading through https://docs.perl6.org/language/regexes#Capture_numbers and tweaking the example given, I've come up with this code using an 'external variable':

#uses an additional variable $c
perl6 -e '$_="bbaaaaawer"; /((.){} :my $c=$0; ($c)**2..*)/ && print $0';

#Output:  aaaaa

For comparison only, here is a similar regex in Perl 5:

#No additional variable needed
perl -e ' $_="bbaaaaawer"; /((.)\2{2,})/ && print $1';

Could someone enlighten me on the need/benefit of 'saving' $0 into $c and the requirement of the empty {}? Is there an alternative (better/golfed) perl6 regex that will match?

Thanks in advance.

1 Reply

深蓝 · Answer 1 · 2021-10-23T21:17:23+0000

Perl 6 regexes scale up to full grammars, which produce parse trees. Those parse trees are a tree of Match objects. Each capture, named or positional, is either a Match object or, if quantified, an array of Match objects.

This is generally good, but it does involve the trade-off you have observed: once you are inside a nested capturing element, you are populating a new Match object with its own set of positional and named captures. For example, if we do:

say "abab" ~~ /((a)(b))+/

Then the result is:

｢abab｣
 0 => ｢ab｣
  0 => ｢a｣
  1 => ｢b｣
 0 => ｢ab｣
  0 => ｢a｣
  1 => ｢b｣

And we can then index:

say $0;        # The array of the top-level capture, which was quantified
say $0[1];     # The second Match
say $0[1][0];  # The first Match within that Match object (the (a))

It is a departure from regex tradition, but also an important part of scaling up to larger parsing challenges.
