Root cause
Before Python 3.5, a backreference in a re.sub replacement to a capture group that did not participate in the match was not replaced with an empty string. See the description of Bug 1519638 at bugs.python.org. As a result, using a backreference to a group that did not participate in the match raised an "unmatched group" error.
There are two ways to fix that issue.
Solution 1: Adding empty alternatives to make optional groups obligatory
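You can replace all optional capturing groups (constructs like (\d+)?) with obligatory ones that have an empty alternative (i.e. (\d+|)). Such a group always participates in the match and captures an empty string when there is nothing else to capture.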
Here is an example of the failure:
import re
old = 'regexregex'
new = re.sub(r'regex(group)?regex', r'something\1something', old)
print(new)
Replace the re.sub line with
new = re.sub(r'regex(group|)regex', r'something\1something', old)
and it works: the result is somethingsomething.
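The difference between the two group forms can be checked directly on the match object. This is a minimal sketch reusing the sample string from the example above; the variable names are only illustrative.

```python
import re

subject = 'regexregex'

# Optional group: it does not participate in the match, so group(1) is None.
optional = re.match(r'regex(group)?regex', subject)

# Obligatory group with an empty alternative: it participates and captures ''.
obligatory = re.match(r'regex(group|)regex', subject)

print(repr(optional.group(1)))    # None
print(repr(obligatory.group(1)))  # ''
```

Because the second form always captures a string, a \1 backreference in the replacement has something to expand on any Python version.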
Solution 2: Using a lambda expression in the replacement and checking whether the group is None
This approach is necessary if you have optional groups inside another optional group.
You can use a lambda in the replacement part to check whether the group is initialized (not None), with lambda m: m.group(n) or ''. Use this solution in your case, because you have two backreferences, #3 and #4 (\3 and \4), in the replacement pattern, but some matches (see Match 1 and 3) do not have Capture group 3 initialized. It happens because the first alternative of the whole first part, (\s*\{{2}funcA(ka|)\s*\|\s*([^}]*)\s*\}{2}\s*|), does not participate in the match, and the inner Capture group 3 (i.e. ([^}]*)) does not get populated even after an empty alternative is added to the outer group.
The original call
re.sub(r'(?i)(\s*\{{2}funcA(ka|)\s*\|\s*([^}]*)\s*\}{2}\s*|)\{{2}funcB\s*\|\s*([^}]*)\s*\}{2}\s*',
       r"\n| funcA"+str(n)+r" = \3\n| funcB"+str(n)+r" = \4\n| string"+str(n)+r" =\n",
       text,
       count=1)
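should be rewritten as
re.sub(r'(?i)(\s*\{\{funcA(ka|)\s*\|\s*([^}]*)\s*\}\}\s*|)\{\{funcB\s*\|\s*([^}]*)\s*\}\}\s*',
       # The string returned by a callable is inserted as-is (no \n or \3
       # processing), so use regular strings and read groups from the match.
       lambda m: "\n| funcA" + str(n) + " = " + (m.group(3) or '') +
                 "\n| funcB" + str(n) + " = " + (m.group(4) or '') +
                 "\n| string" + str(n) + " =\n",
       text,
       count=1)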
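To see why the inner group stays unset, here is a reduced sketch with the same nesting. The pattern, the sample strings and the fill helper are hypothetical and much simpler than the real regex. Group 1 has an empty alternative, but group 2 exists only inside its first alternative.

```python
import re

# Hypothetical reduced pattern: group 1 is obligatory thanks to its empty
# alternative, but group 2 exists only inside the first alternative.
pattern = r'(a(\d*)-|)b(\d*)'

with_prefix = re.search(pattern, 'a1-b2')
without_prefix = re.search(pattern, 'b2')

print(with_prefix.groups())     # ('a1-', '1', '2')
print(without_prefix.groups())  # ('', None, '2')


def fill(m):
    # "or ''" turns an unset (None) group into an empty string.
    return 'A=' + (m.group(2) or '') + ' B=' + (m.group(3) or '')


print(re.sub(pattern, fill, 'a1-b2'))  # A=1 B=2
print(re.sub(pattern, fill, 'b2'))     # A= B=2
```

When the prefix is missing, group 1 is an empty string while group 2 is None, which is why the check has to happen in a callable and not in the pattern.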
See IDEONE demo
import re
text = r'''
{{funcB|param1}}
*some string*
{{funcA|param2}}
{{funcB|param3}}
*some string2*
{{funcB|param4}}
*some string3*
{{funcAka|param5}}
{{funcB|param6}}
*some string4*
'''
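for n in range(1, text.count('funcB') + 1):
    text = re.sub(r'(?i)(\s*\{{2}funcA(ka|)\s*\|\s*([^}]*)\s*\}{2}\s*|)\{{2}funcB\s*\|\s*([^}]*)\s*\}{2}\s*',
                  # Group 3 is None when there is no funcA block, hence "or ''".
                  lambda m: "\n| funcA" + str(n) + " = " + (m.group(3) or '') +
                            "\n| funcB" + str(n) + " = " + (m.group(4) or '') +
                            "\n| string" + str(n) + " =\n",
                  text,
                  count=1)

expected = r'''
| funcA1 =
| funcB1 = param1
| string1 =
*some string*
| funcA2 = param2
| funcB2 = param3
| string2 =
*some string2*
| funcA3 =
| funcB3 = param4
| string3 =
*some string3*
| funcA4 = param5
| funcB4 = param6
| string4 =
*some string4*
'''

# Compare line by line, ignoring blank lines and leading/trailing whitespace.
def lines(s):
    return [line.strip() for line in s.splitlines() if line.strip()]

assert lines(text) == lines(expected)
print('ok')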