Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

regex - PHP Regular Expression - Repeating Match of a Group

I have a string that may look something like this:

$r = 'Filed under: <a>Group1</a>, <a>Group2</a>';

Here is the regular expression I am using so far:

preg_match_all("/Filed under: (?:<a.*?>([\w|\d|\s]+?)<\/a>)+?/", $r, $matches);

I want the part of the regular expression inside the () to keep matching, as indicated by the +? at the end. But it just won't do it. ::sigh::

Any ideas? I know there has to be a way to do this in one regular expression instead of breaking it up.
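For reference, here is a small sketch of what that pattern actually returns for the sample string. The variable name `$greedy` and the second pattern are illustrative additions, not part of the original attempt; the second pattern only shows that a repeated capture group reports its last iteration, even when the repetition itself succeeds.

```php
<?php
$r = 'Filed under: <a>Group1</a>, <a>Group2</a>';

// The pattern from the question. The lazy +? is satisfied after one
// iteration, and the ", " separator is not part of the repeated group,
// so only the first link is ever captured.
preg_match_all('/Filed under: (?:<a.*?>([\w|\d|\s]+?)<\/a>)+?/', $r, $matches);
// $matches[1] is ['Group1']

// Hypothetical variant: the separator is allowed inside the group and the
// quantifier is greedy, so the group iterates twice. Capture group 1 still
// holds only the LAST iteration.
preg_match_all('/Filed under: (?:(?:, )?<a.*?>([\w\s]+?)<\/a>)+/', $r, $greedy);
// $greedy[1] is ['Group2']
```

This is why simply repeating a group cannot return every link: each capture group has a single slot per match.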


1 Reply


Just for fun, here's a regex that will work with a single preg_match_all:

'%(?:Filed under:\s*+|\G</a>)[^<>]*+<a[^<>]*+>\K[^<>]*%'

Or, in a more readable format:

'%(?:
      Filed\ under:  # your sentinel string (space escaped for x mode)
    |
      \G             # NEXT MATCH POSITION
      </a>           # an end tag
  )
  [^<>]*+          # some non-tag stuff
  <a[^<>]*+>       # an opening tag
  \K               # RESET MATCH START
  [^<>]+           # the contents of the tag
%x'
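A minimal sketch of running both forms against the sample string from the question. The variable names (`$compact`, `$verbose`, `$a`, `$b`) are placeholders chosen for this example, and the verbose pattern is written as a nowdoc only so that quotes in the comments cannot end the PHP string early.

```php
<?php
$r = 'Filed under: <a>Group1</a>, <a>Group2</a>';

// Compact form, as a single-quoted string so the backslashes reach PCRE intact.
$compact = '%(?:Filed under:\s*+|\G</a>)[^<>]*+<a[^<>]*+>\K[^<>]*%';

// Verbose form with the x modifier; whitespace is ignored unless escaped.
$verbose = <<<'REGEX'
%(?:
    Filed\ under:       # sentinel; the space must be escaped in x mode
  |
    \G </a>             # or continue at the end tag where the last match stopped
  )
  [^<>]*+ <a[^<>]*+>    # skip ahead to the end of the next opening tag
  \K                    # reset the reported match start
  [^<>]+                # the contents of the tag
%x
REGEX;

preg_match_all($compact, $r, $a);
preg_match_all($verbose, $r, $b);

print_r($a[0]); // Group1, Group2
print_r($b[0]); // Group1, Group2
```

Because of \K there is no capture group to dig out: the whole match, index 0, already holds just the link text.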

\G matches the position where the next match attempt would start, which is usually the spot where the previous successful match ended (but if the previous match was zero-length, it bumps ahead one more). That means the regex won't match a substring starting with </a> until after it has matched one starting with Filed under: at least once.
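The anchoring effect of \G can be checked with two inputs that are made up for this illustration (they are not from the question): one without the sentinel, and one where a different tag interrupts the run of links.

```php
<?php
$pattern = '%(?:Filed under:\s*+|\G</a>)[^<>]*+<a[^<>]*+>\K[^<>]*%';

// No "Filed under:" anywhere: \G only holds at the start of each attempt,
// which is never an end tag here, so nothing matches at all.
preg_match_all($pattern, 'See also: <a>X</a>, <a>Y</a>', $noSentinel);
// $noSentinel[0] is []

// A <br> follows the first link, so the attempt that starts at </a> fails
// and the chain is broken. The later <a> is not picked up.
preg_match_all($pattern, 'Filed under: <a>A</a><br>Posted by <a>Bob</a>', $broken);
// $broken[0] is ['A']
```

Note that plain text between links does not break the chain, since [^<>]*+ skips over it; only another tag (or the end of the string) does.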

After the sentinel string or an end tag has been matched, [^<>]*+<a[^<>]*+> consumes everything up to and including the next start tag. Then \K spoofs the start position so the match (if there is one) appears to start after the <a> tag (it's like a positive lookbehind, but more flexible). Finally, [^<>]+ matches the tag's contents and brings the match position up to the end tag so \G can match.
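To see \K on its own, here is a short sketch with throwaway inputs (the strings and the names `$m` and `$tags` are invented for the example). The second call hints at the "more flexible" part: the text before \K can have any length, which is the kind of prefix PCRE's lookbehind rejects.

```php
<?php
// "foo" must match, but \K drops it from the reported result.
preg_match('/foo\Kbar/', 'foobar', $m, PREG_OFFSET_CAPTURE);
// $m[0] is ['bar', 3]: the match text and the offset where it starts

// The opening tag before \K may or may not carry attributes.
preg_match_all('/<a[^<>]*+>\K[^<>]+/', '<a>One</a> <a href="#two">Two</a>', $tags);
// $tags[0] is ['One', 'Two']
```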

But, as I said, this is just for fun. If you don't have to do the job in one regex, you're better off with a multi-step approach like the one @codaddict used; it's more readable, more flexible, and more maintainable.
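The other answer is not reproduced on this page, so the following is only a guess at what a multi-step version might look like, not @codaddict's actual code: first isolate the text after the sentinel, then collect the links from that text. The variable names are placeholders.

```php
<?php
$r = 'Filed under: <a>Group1</a>, <a>Group2</a>';
$groups = [];

// Step 1: grab everything that follows "Filed under:" on the same line.
if (preg_match('/Filed under:(.*)/', $r, $m)) {
    // Step 2: pull the contents of every <a> element out of that text.
    preg_match_all('/<a[^<>]*>([^<>]+)<\/a>/', $m[1], $links);
    $groups = $links[1];
}

print_r($groups); // Group1, Group2
```

Each step is a plain pattern that can be read and changed on its own, which is the maintainability argument made above.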

\K reference
\G reference

EDIT: Although the references I gave are for the Perl docs, these features are supported by PHP, too--or, more accurately, by the PCRE lib. I think the Perl docs are a little better, but you can also read about this stuff in the PCRE manual.

