Just for fun, here's a regex that will do the job with a single preg_match_all call:
'%(?:Filed under:\s*+|\G</a>)[^<>]*+<a[^<>]*+>\K[^<>]*%'
Or, in a more readable format:
'%(?:
    Filed\ under:\s*+   # your sentinel string (the space is escaped because /x ignores literal whitespace)
  |                     # or
    \G                  # NEXT MATCH POSITION
    </a>                # an end tag
  )
  [^<>]*+               # some non-tag stuff
  <a[^<>]*+>            # an opening tag
  \K                    # RESET MATCH START
  [^<>]+                # the tag contents
%x'
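To try the pattern, here's a quick PHP sketch that runs both the compact and the free-spacing (x) forms. The sample markup is made up for illustration and is not from the question; it includes a later link outside the "Filed under:" run to show that it is skipped.

```php
<?php
// Made-up sample markup: three links after the sentinel, plus one
// unrelated link afterwards that should NOT be matched.
$html = '<p>Filed under: <a href="/tag/php">PHP</a>, <a href="/tag/regex">Regex</a>, '
      . '<a href="/tag/pcre">PCRE</a></p><p>See also: <a href="/faq">FAQ</a></p>';

// Compact form of the pattern.
$compact = '%(?:Filed under:\s*+|\G</a>)[^<>]*+<a[^<>]*+>\K[^<>]*%';

// Free-spacing form. Under /x literal spaces are ignored, so the space
// in the sentinel is escaped; comments must not contain % or a quote.
$verbose = '%(?:
      Filed\ under:\s*+   # the sentinel string
    |                     # or
      \G                  # the next match position
      </a>                # an end tag
    )
    [^<>]*+               # some non-tag stuff
    <a[^<>]*+>            # an opening tag
    \K                    # reset the match start
    [^<>]+                # the tag contents
%x';

preg_match_all($compact, $html, $m1);
preg_match_all($verbose, $html, $m2);

echo implode(', ', $m1[0]), "\n"; // PHP, Regex, PCRE
var_dump($m1[0] === $m2[0]);      // bool(true)
```

The chain stops after PCRE because the </a> that \G lands on is followed by </p> rather than another <a> tag, and "See also:" is not the sentinel.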
\G matches the position where the next match attempt would start, which is usually the spot where the previous successful match ended (but if the previous match was zero-length, it bumps ahead one more). That means the regex won't match a substring starting with </a> until after it has matched one starting with Filed under: at least once. The one exception is the very start of the subject, where \G also matches.
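A smaller demonstration of \G on its own, using toy subject strings chosen for illustration: each match has to start exactly where the previous one ended, so the chain breaks at the first character that doesn't fit. The last call shows the start-of-subject case, where \G is true at offset 0 even though nothing has been matched yet.

```php
<?php
$s = '123a45';

// Without \G every digit in the string is found.
preg_match_all('/\d/', $s, $all);

// With \G each match must begin where the previous one ended,
// so matching stops at the first non-digit.
preg_match_all('/\G\d/', $s, $chained);

echo implode(',', $all[0]), "\n";     // 1,2,3,4,5
echo implode(',', $chained[0]), "\n"; // 1,2,3

// \G also holds at the start of the subject, so a subject that
// begins with </a> can match without any sentinel.
preg_match_all('%(?:Filed under:\s*+|\G</a>)[^<>]*+<a[^<>]*+>\K[^<>]*%',
    '</a> <a href="#">Oops</a>', $edge);

echo implode(',', $edge[0]), "\n";    // Oops
```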
After the sentinel string or an end tag has been matched, [^<>]*+<a[^<>]*+> consumes everything up to and including the next start tag. Then \K spoofs the start position so the match (if there is one) appears to start after the <a> tag. It's like a positive lookbehind, but more flexible: PCRE won't accept an unbounded lookbehind such as (?<=<a[^<>]*+>). Finally, [^<>]+ matches the tag's contents and brings the match position up to the end tag, so \G can match there on the next attempt.
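\K and an ordinary capturing group reach the same text by different routes. This sketch (the sample link is made up) compares them: with \K the reported match is just the link text, while with a group the whole match still includes the tag and the text lives in group 1.

```php
<?php
$html = '<a href="/tag/php">PHP</a>';

// \K discards everything matched so far from the reported match.
preg_match('/<a[^<>]*+>\K[^<>]+/', $html, $k);

// Equivalent result using a capturing group instead of \K.
preg_match('/<a[^<>]*+>([^<>]+)/', $html, $g);

echo $k[0], "\n"; // PHP
echo $g[0], "\n"; // <a href="/tag/php">PHP
echo $g[1], "\n"; // PHP
```

In the single-regex trick, \K matters because preg_match_all reports $m[0] directly, and the end of each full match is still where \G picks up next time.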
But, as I said, this is just for fun. If you don't have to do the job in one regex, you're better off with a multi-step approach like the one @codaddict used; it's more readable, more flexible, and more maintainable.
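For comparison, a multi-step version might look roughly like this. It's a hypothetical sketch, not the code @codaddict posted: the helper name filedUnderLinks is made up, and it assumes the links after the sentinel all come before the next </p>.

```php
<?php
// Hypothetical helper: returns the text of the links that follow
// "Filed under:", assuming they all appear before the next </p>.
function filedUnderLinks(string $html): array
{
    // Step 1: isolate the markup between the sentinel and the next </p>.
    if (!preg_match('%Filed under:(.*?)</p>%s', $html, $section)) {
        return [];
    }
    // Step 2: collect the contents of every <a> tag in that slice.
    preg_match_all('%<a[^<>]*+>([^<>]*)</a>%', $section[1], $links);
    return $links[1];
}

$html = '<p>Filed under: <a href="/tag/php">PHP</a>, <a href="/tag/regex">Regex</a>, '
      . '<a href="/tag/pcre">PCRE</a></p><p>See also: <a href="/faq">FAQ</a></p>';

echo implode(', ', filedUnderLinks($html)), "\n"; // PHP, Regex, PCRE
```

Each step is an ordinary regex that can be checked and changed on its own, without reasoning about \G chaining.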
\K reference
\G reference
EDIT: Although the references I gave are for the Perl docs, these features are supported by PHP too, or, more accurately, by the PCRE library. I think the Perl docs are a little better, but you can also read about this stuff in the PCRE manual.