Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
731 views
in Technique[技术] by (71.8m points)

regex - Perl regular expression: match nested brackets

I'm trying to match nested {} brackets with a regular expressions in Perl so that I can extract certain pieces of text from a file. This is what I have currently:

my @matches = $str =~ /{(?:{.*}|[^{])*}|w+/sg;

foreach (@matches) {
    print "$_
";
}

At certain times this works as expected. For instance, if $str = "abc {{xyz} abc} {xyz}" I obtain:

abc
{{xyz} abc}
{xyz}

as expected. But for other input strings it does not function as expected. For example, if $str = "{abc} {{xyz}} abc", the output is:

{abc} {{xyz}}
abc

which is not what I expected. I would have wanted {abc} and {{xyz}} to be on separate lines, since each is balanced on its own in terms of brackets. Is there an issue with my regular expression? If so, how would I go about fixing it?

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

You were surprised how your pattern matched, but noone explained it? Here's how your pattern is matching:

my @matches = $str =~ /{(?:{.*}|[^{])*}|w+/sg;
                       ^    ^ ^ ^  ^      ^
                       |    | | |  |      |
{ ---------------------+    | | |  |      |
a --------------------------)-)-)--+      |
b --------------------------)-)-)--+      |
c --------------------------)-)-)--+      |
} --------------------------)-)-)--+      |
  --------------------------)-)-)--+      |
{ --------------------------+ | |         |
{ ----------------------------+ |         |
x ----------------------------+ |         |
y ----------------------------+ |         |
z ----------------------------+ |         |
} ------------------------------+         |
} ----------------------------------------+

As you can see, the problem is that /{.*}/ matches too much. What should be in there is a something that matches

(?: s* (?: { ... } | w+ ) )*

where the ... is

(?: s* (?: { ... } | w+ ) )*

So you need some recursion. Named groups are an easy way of doing this.

say $1
   while /
      G s*+ ( (?&WORD) | (?&BRACKETED) )

      (?(DEFINE)
         (?<WORD>      s* w+ )
         (?<BRACKETED> s* { (?&TEXT)? s* } )
         (?<TEXT>      (?: (?&WORD) | (?&BRACKETED) )+ )
      )
   /xg;

But instead of reinventing the wheel, why not use Text::Balanced.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...