Python's `re` module did not directly support this feature before Python 3.11 (which added `(?>...)` and possessive quantifiers), but you can emulate it by combining three pieces. First, use a zero-width lookahead assertion (`(?=RE)`), which matches from the current point and, like an atomic group, is never re-entered once it has succeeded. Second, put a named group (`(?P<name>RE)`) inside the lookahead. Third, use a named backreference (`(?P=name)`) to consume exactly whatever the zero-width assertion matched. Together these give you the same semantics, at the cost of an additional capturing group and a lot of syntax.
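To make the recipe concrete, here is a minimal sketch that packages it as a function. The helper name `atomic` and the generated group names `_atomic0`, `_atomic1`, ... are hypothetical choices for illustration, not part of the `re` module:

```python
import re
from itertools import count

# Hypothetical helper, not part of the re module: returns a pattern that
# behaves like the atomic group (?>sub) by combining a lookahead, a named
# group, and a named backreference.
_counter = count()


def atomic(sub):
    # A fresh name per call lets several emulated groups share one pattern.
    name = f"_atomic{next(_counter)}"
    # (?=(?P<name>sub)) matches sub once without consuming input; the engine
    # never backtracks into a lookahead that has already succeeded, so that
    # match is frozen. (?P=name) then consumes exactly the captured text.
    return f"(?=(?P<{name}>{sub}))(?P={name})"


greedy = '"' + atomic('.*') + '"'
letters = '"' + atomic('[A-Za-z]*') + '"'
print(greedy)                                  # "(?=(?P<_atomic0>.*))(?P=_atomic0)"
print(re.search(greedy, '"Quote"'))            # None
print(re.search(letters, '"Quote"').group(0))  # "Quote"
```

Each call adds one capturing group, so any numeric group references elsewhere in the surrounding pattern shift accordingly.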
For example, the link you provided gives this Ruby example:

```ruby
/"(?>.*)"/.match('"Quote"') #=> nil
```
We can emulate that in Python like this:

```python
re.search(r'"(?=(?P<tmp>.*))(?P=tmp)"', '"Quote"') # => None
```
We can show that I'm doing something useful and not just spewing line noise: if we change it so that the inner group doesn't eat the final `"`, it still matches:

```python
re.search(r'"(?=(?P<tmp>[A-Za-z]*))(?P=tmp)"', '"Quote"').groupdict()
# => {'tmp': 'Quote'}
```
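On Python 3.11 and later you can check the emulation against the native `(?>...)` syntax. The sketch below runs both versions of the two patterns above; the native ones are only compiled when the interpreter is new enough, because older versions reject `(?>` as an unknown extension. It also prints the number of capturing groups, which shows the extra group the emulation costs:

```python
import re
import sys

# Each entry: (emulated pattern from above, equivalent native atomic group).
patterns = {
    '.*': (r'"(?=(?P<tmp>.*))(?P=tmp)"', r'"(?>.*)"'),
    '[A-Za-z]*': (r'"(?=(?P<tmp>[A-Za-z]*))(?P=tmp)"', r'"(?>[A-Za-z]*)"'),
}

for inner, (emulated, native) in patterns.items():
    m = re.search(emulated, '"Quote"')
    print(inner, 'emulated:', m.group(0) if m else None,
          'groups:', re.compile(emulated).groups)
    # The re module only understands (?>...) from Python 3.11 onward.
    if sys.version_info >= (3, 11):
        m = re.search(native, '"Quote"')
        print(inner, 'native:', m.group(0) if m else None,
              'groups:', re.compile(native).groups)
```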
You can also use anonymous groups and numeric backreferences, but this gets awfully full of line noise:

```python
re.search(r'"(?=(.*))\1"', '"Quote"') # => None
```
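With anonymous groups, the extra group still takes a number, so any groups you actually care about are numbered after it. A small sketch, where the trailing `\s*(\w+)` and the input string are made up for illustration:

```python
import re

# Emulated atomic group using an anonymous group and \1. The helper group
# must be capturing (the backreference needs it), so it takes number 1 and
# the group we actually want, the word after the quote, becomes group 2.
pattern = re.compile(r'"(?=([A-Za-z]*))\1"\s*(\w+)')
m = pattern.search('"Quote" said')
print(m.groups())  # ('Quote', 'said')
print(m.group(2))  # said
```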
(Full disclosure: I learned this trick from Perl's `perlre` documentation, which mentions it under the documentation for `(?>...)`.)
In addition to having the right semantics, this also has the appropriate performance properties. If we port an example out of `perlre`:
```
[nelhage@anarchique:~/tmp]$ cat atomic_bench.py
import re
import timeit

# The perlre example: a literal "(", then one or more runs of
# non-parentheses (x+) or simple nested "( ... )" groups, then ")".
re_1 = re.compile(r'''\(
                        (
                          [^()]+                        # x+
                        |
                          \( [^()]* \)
                        )+
                      \)
                   ''', re.X)

# Same pattern, with x+ wrapped in the lookahead/backreference trick.
re_2 = re.compile(r'''\(
                        (
                          (?=(?P<tmp>[^()]+ ))(?P=tmp)  # Emulate (?> x+)
                        |
                          \( [^()]* \)
                        )+
                      \)
                   ''', re.X)

print(timeit.timeit("re_1.search('((()' + 'a' * 25)",
                    setup="from __main__ import re_1",
                    number=10))
print(timeit.timeit("re_2.search('((()' + 'a' * 25)",
                    setup="from __main__ import re_2",
                    number=10))
```

We see a dramatic improvement:

```
[nelhage@anarchique:~/tmp]$ python atomic_bench.py
96.0800571442
7.41481781006e-05
```
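The gap only widens as the search string gets longer, because `re_1` has to try exponentially many ways of splitting the run of `a`s between iterations of `[^()]+` before it can give up.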
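The speed-up should not change what the pattern matches: once `[^()]+` has taken a whole run of non-parentheses, the next character must be a parenthesis, so giving characters back can never help. A quick, brute-force way to check this is to compare both patterns on every short string over `(`, `)` and `a`. The length limit of 8 is an arbitrary choice that keeps `re_1`'s backtracking cheap:

```python
import itertools
import re

# Same patterns as in the benchmark script above.
re_1 = re.compile(r'''\(
    (
      [^()]+                          # x+
    |
      \( [^()]* \)
    )+
    \)''', re.X)
re_2 = re.compile(r'''\(
    (
      (?=(?P<tmp>[^()]+))(?P=tmp)     # emulate (?> x+)
    |
      \( [^()]* \)
    )+
    \)''', re.X)

# Exhaustively compare the match spans on all strings of length 0..8.
mismatches = []
for n in range(9):
    for chars in itertools.product('()a', repeat=n):
        s = ''.join(chars)
        m1, m2 = re_1.search(s), re_2.search(s)
        span1 = m1.span() if m1 else None
        span2 = m2.span() if m2 else None
        if span1 != span2:
            mismatches.append(s)

print(len(mismatches), 'mismatches')
```

An empty `mismatches` list means the two patterns found the same match (or no match) on every input tried, which is evidence, not proof, that the emulated atomic group only prunes useless backtracking.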