Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share

python - Empty string instead of unmatched group error

I have this piece of code:

for n in (range(1,10)):
    new = re.sub(r'(regex(group)regex)?regex', r'something'+str(n)+r'\1', old, count=1)

It throws an "unmatched group" error. If the group is unmatched, I want an empty string inserted there instead of an error. How can I achieve this?

Note: My full code is much more complicated than this example. If you find a better way to iterate over the matches and insert the number, please share it. My full code:

for n in (range(1,(text.count('soutez')+1))):
    text = re.sub(r'(?i)(\s*\{{2}infobox medaile reprezentant(ka)?\s*\|\s*([^}]*)\s*\}{2}\s*)?\{{2}infobox medaile soutez\s*\|\s*([^}]*)\s*\}{2}\s*', r"\n | reprezentace"+str(n)+r" = \3\n | soutez"+str(n)+r" = \4\n | medaile"+str(n)+r" = \n", text, count=1)

1 Reply

Root cause

Before Python 3.5, backreferences to failed capture groups in Python re.sub were not populated with an empty string. This is described in Bug 1519638 at bugs.python.org. Thus, using a backreference to a group that did not participate in the match resulted in an "unmatched group" error. Starting with Python 3.5, re.sub replaces such unmatched groups with an empty string.
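A minimal sketch of the difference, using a made-up toy pattern a(b)?c (not from the question): the optional group is None when it does not take part in the match, and the commented outputs assume Python 3.5 or later, where \1 then expands to an empty string.

```python
import re

# Group 1 is optional, so on input 'ac' it does not participate.
m = re.search(r'a(b)?c', 'ac')
print(m.group(1))  # None

# Python 3.5+: the unmatched group expands to ''.
# Older versions (e.g. 2.7) raise "error: unmatched group" here instead.
result = re.sub(r'a(b)?c', r'[\1]', 'ac')
print(result)  # []

# When the group does participate, \1 is its text as usual.
matched = re.sub(r'a(b)?c', r'[\1]', 'abc')
print(matched)  # [b]
```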

There are two ways to fix that issue.

Solution 1: Adding empty alternatives to make optional groups obligatory

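You can replace every optional capturing group (a construct like (\d+)?) with an obligatory one that has an empty alternative (i.e. (\d+|)). Such a group always participates in the match, so it captures an empty string instead of staying unset.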

Here is an example of the failure:

import re
old = 'regexregex'
# Python < 3.5: raises "error: unmatched group"; Python 3.5+: prints "somethingsomething"
new = re.sub(r'regex(group)?regex', r'something\1something', old)
print(new)

Replace the re.sub line with

new = re.sub(r'regex(group|)regex', r'something\1something', old)

and it works on every Python version, printing somethingsomething.
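To see why the empty alternative helps, compare the captured values directly. This sketch uses the (\d+)? versus (\d+|) forms mentioned above inside an illustrative x...y pattern that is not part of the question.

```python
import re

# Optional group: when no digits are present, group 1 is None.
optional = re.search(r'x(\d+)?y', 'xy')

# Obligatory group with an empty alternative: it always participates,
# so with no digits it captures the empty string.
obligatory = re.search(r'x(\d+|)y', 'xy')

print(repr(optional.group(1)))    # None
print(repr(obligatory.group(1)))  # ''

# \1 is always defined, so this works even on Python versions before 3.5.
new = re.sub(r'x(\d+|)y', r'<\1>', 'xy x7y')
print(new)  # <> <7>
```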

Solution 2: Using a lambda expression in the replacement and checking whether the group is None

This approach is necessary if you have optional groups inside another optional group.

You can use a lambda as the replacement to check whether a group is initialized (not None), with an expression like m.group(3) or ''. Use this solution in your case: you have two backreferences, \3 and \4, in the replacement pattern, but some matches (see matches 1 and 3) do not have capture group 3 initialized. This happens because the whole first part, (\s*\{{2}funcA(ka|)\s*\|\s*([^}]*)\s*\}{2}\s*|), takes its empty alternative in those matches, so the inner capture group 3 (i.e. ([^}]*)) is never populated, even after the empty alternative has been added.
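The nesting problem can be reproduced with a tiny made-up pattern (x(\d+)|)y, which has the same shape as the funcA part: an outer group with an empty alternative wrapping an inner group. This is an illustrative sketch, not the question's pattern.

```python
import re

pattern = r'(x(\d+)|)y'

m = re.search(pattern, 'y')
print(repr(m.group(1)))  # '' : the outer group took its empty alternative
print(repr(m.group(2)))  # None : the inner group never participated

# The lambda falls back to '' when the inner group is None,
# which works the same way on every Python version.
result = re.sub(pattern, lambda mo: '[' + (mo.group(2) or '') + ']', 'y x42y')
print(result)  # [] [42]
```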

re.sub(r'(?i)(\s*\{{2}funcA(ka|)\s*\|\s*([^}]*)\s*\}{2}\s*|)\{{2}funcB\s*\|\s*([^}]*)\s*\}{2}\s*',
       r"\n | funcA"+str(n)+r" = \3\n | funcB"+str(n)+r" = \4\n | string"+str(n)+r" = \n",
       text,
       count=1)

should be rewritten as

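# The lambda's return value is inserted literally (no escape processing),
# so use normal strings for "\n" and fall back to '' for unset groups.
re.sub(r'(?i)(\s*{{funcA(ka|)\s*\|\s*([^}]*)\s*}}\s*|){{funcB\s*\|\s*([^}]*)\s*}}\s*',
       lambda m: "\n | funcA" + str(n) + " = " + (m.group(3) or '') +
                 "\n | funcB" + str(n) + " = " + (m.group(4) or '') +
                 "\n | string" + str(n) + " = \n",
       text,
       count=1)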

See IDEONE demo

import re

text = r'''

{{funcB|param1}}
*some string*
{{funcA|param2}}
{{funcB|param3}}
*some string2*

{{funcB|param4}}
*some string3*
{{funcAka|param5}}
{{funcB|param6}}
*some string4*
'''

for n in range(1, text.count('funcB') + 1):
    # Normal (non-raw) strings so "\n" is a real newline in the inserted text;
    # unset groups (None) fall back to ''.
    text = re.sub(r'(?i)(\s*\{{2}funcA(ka|)\s*\|\s*([^}]*)\s*\}{2}\s*|)\{{2}funcB\s*\|\s*([^}]*)\s*\}{2}\s*',
                  lambda m: "\n | funcA" + str(n) + " = " + (m.group(3) or '') +
                            "\n | funcB" + str(n) + " = " + (m.group(4) or '') +
                            "\n | string" + str(n) + " = \n",
                  text,
                  count=1)

expected = '''
| funcA1 =
| funcB1 = param1
| string1 =
*some string*
| funcA2 = param2
| funcB2 = param3
| string2 =
*some string2*
| funcA3 =
| funcB3 = param4
| string3 =
*some string3*
| funcA4 = param5
| funcB4 = param6
| string4 =
*some string4*
'''
# Compare line by line, ignoring blank lines and surrounding spaces
assert ([l.strip() for l in text.splitlines() if l.strip()] ==
        [l.strip() for l in expected.splitlines() if l.strip()])
print('ok')
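Regarding the question's note about a better way to number the matches: a single re.sub call with a callback can hand out the numbers itself, so the loop with count=1 (which rescans the text on every pass) is not needed. This is a sketch that is not part of the original answer; number_templates and PATTERN are hypothetical names, and it assumes the same {{funcA|...}}/{{funcB|...}} markup as the demo. For this sample it yields the same non-blank lines as the loop; inputs where a {{funcB|...}} is immediately followed by {{funcA|...}} may differ slightly in blank lines.

```python
import itertools
import re

# Same pattern as the demo: group 3 = optional funcA parameter, group 4 = funcB parameter
PATTERN = re.compile(
    r'(?i)(\s*\{{2}funcA(ka|)\s*\|\s*([^}]*)\s*\}{2}\s*|)'
    r'\{{2}funcB\s*\|\s*([^}]*)\s*\}{2}\s*'
)


def number_templates(text):
    """Rewrite every funcB template (with an optional preceding funcA) in one pass."""
    counter = itertools.count(1)  # yields 1, 2, 3, ... one value per match

    def repl(m):
        n = next(counter)
        funca = m.group(3) or ''  # None when the funcA part did not participate
        funcb = m.group(4) or ''
        return f"\n | funcA{n} = {funca}\n | funcB{n} = {funcb}\n | string{n} = \n"

    return PATTERN.sub(repl, text)


sample = """
{{funcB|param1}}
*some string*
{{funcA|param2}}
{{funcB|param3}}
*some string2*

{{funcB|param4}}
*some string3*
{{funcAka|param5}}
{{funcB|param6}}
*some string4*
"""

result = number_templates(sample)
print(result)
```

Because the counter lives in the closure, each call to number_templates starts numbering at 1 again.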

...