Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


Python regex error: unbalanced parenthesis

I'm pretty new to Python. I have a dictionary with some keys in it, and a string. I need to replace parts of the string wherever a pattern from the dictionary (one of its keys) occurs in it. Both the dictionary and the string are very large. I'm using a regex to find the patterns.

It all works fine until a key like '-(' or '(-)' comes up, at which point Python raises an "unbalanced parenthesis" error.

Here's how the code I've written looks:

import re

somedict = {'-(': 'value1', '(-)': 'value2'}
somedata = 'this is some data containing -( and (-)'
for key in somedict.iterkeys():
    # each key is used as a regex pattern here, unescaped
    somedata = re.sub(key, 'newvalue', somedata)

Here's the error I get in the console:

Traceback (most recent call last):
  File "<console>", line 2, in <module>
  File "C:\Python27\lib\re.py", line 151, in sub
    return _compile(pattern, flags).sub(repl, string, count)
  File "C:\Python27\lib\re.py", line 244, in _compile
    raise error, v # invalid expression
    raise error, v # invalid expression
error: unbalanced parenthesis
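A minimal Python 3 sketch (not the asker's code; the helper name compiles is hypothetical) shows what each key does when passed to re unescaped. '-(' opens a group that is never closed, so it fails to compile. '(-)' is actually a valid pattern, a capturing group around '-', so it compiles, but it silently matches only the '-' rather than the literal text '(-)'.

```python
import re

def compiles(pattern):
    """Return True if `pattern` is a valid regular expression (hypothetical helper)."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

# '-(' opens a group that is never closed, so compiling it raises re.error.
print(compiles('-('))   # False
# '(-)' is a valid pattern: a capturing group containing '-'.
print(compiles('(-)'))  # True
# ...but it matches only the '-', so the parentheses in the text survive.
print(re.sub('(-)', 'newvalue', 'data containing (-)'))  # data containing (newvalue)
```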

I've also tried many variations, including compiling the patterns first, and searched a lot, but didn't find anything that addresses the problem. Any help is appreciated.


1 Reply


You need to escape the key using re.escape():

somedata = re.sub(re.escape(key), 'newvalue', somedata)

otherwise the key's contents are interpreted as a regular expression, in which ( opens a group that must be closed by a matching ).
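A runnable Python 3 version of the question's loop with re.escape() applied might look like the sketch below. It assumes each key should be replaced by its own value from the dictionary (the question's code always substitutes 'newvalue'), and it uses items() because iterkeys() exists only in Python 2.

```python
import re

somedict = {'-(': 'value1', '(-)': 'value2'}
somedata = 'this is some data containing -( and (-)'

for key, value in somedict.items():
    # re.escape() backslash-escapes regex metacharacters in the key,
    # so '-(' is matched literally instead of opening a group.
    # Note: `value` is still a replacement template, so backslashes in it
    # would be interpreted; pass `lambda m: value` if that could matter.
    somedata = re.sub(re.escape(key), value, somedata)

print(somedata)  # this is some data containing value1 and value2
```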

You aren't actually using any regular-expression features here, so you may as well just use a plain string replacement:

somedata = somedata.replace(key, 'newvalue')
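Since the question mentions a very large dictionary and string, note that one call per key (whether str.replace() or re.sub()) scans the whole text once per key, and earlier replacements can create or destroy matches for later keys. One common alternative, not part of the answer above, is to escape all keys, join them into a single alternation, and look each match up in the dictionary. This is a sketch; the function name replace_all is hypothetical.

```python
import re

def replace_all(text, mapping):
    """Replace every occurrence of any key in `mapping` with its value, in one pass."""
    if not mapping:
        return text  # an empty alternation would match the empty string everywhere
    # Longest keys first: Python's alternation takes the first alternative that
    # matches, so a key that is a prefix of another key must come after it.
    keys = sorted(mapping, key=len, reverse=True)
    pattern = re.compile('|'.join(re.escape(k) for k in keys))
    # The callback's return value is inserted verbatim (no backslash processing).
    return pattern.sub(lambda m: mapping[m.group(0)], text)

somedict = {'-(': 'value1', '(-)': 'value2'}
print(replace_all('this is some data containing -( and (-)', somedict))
# this is some data containing value1 and value2
```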

If you want to replace only whole words (that is, with whitespace or punctuation marks around them, or at the start or end of the input string), you need some kind of boundary anchors, at which point it makes sense to use regular expressions. If all your keys are alphanumeric words (plus underscores), \b would work:

somedata = re.sub(r'\b{}\b'.format(re.escape(key)), 'newvalue', somedata)

This puts \b before and after the string you want to replace, so that baz in "foo baz bar" is changed, but "foo bazbaz bar" is left alone.
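A small sketch of the \b approach (the helper name replace_word is hypothetical). The last call shows why it does not help with the question's keys: \b only matches between a word character (letter, digit or underscore) and a non-word character or string edge, and a key like '-(' has no word characters next to it.

```python
import re

def replace_word(text, key, new):
    # Hypothetical helper: wrap the escaped key in \b word-boundary anchors.
    return re.sub(r'\b{}\b'.format(re.escape(key)), new, text)

print(replace_word('foo baz bar', 'baz', 'newvalue'))     # foo newvalue bar
print(replace_word('foo bazbaz bar', 'baz', 'newvalue'))  # foo bazbaz bar
# Space and '-' are both non-word characters, so there is no boundary
# before '-(' and nothing is replaced.
print(replace_word('a -( b', '-(', 'newvalue'))           # a -( b
```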

For keys that contain non-alphanumeric characters, such as '-(', you need to anchor on whitespace-or-start and whitespace-or-end instead, using look-behind and look-ahead assertions:

somedata = re.sub(r'(?:^|(?<=\s)){}(?:$|(?=\s))'.format(re.escape(key)), 'newvalue', somedata)

Here the pattern (?:^|(?<=\s)) uses two anchors, the start-of-string anchor and a look-behind assertion, to match positions where either the string starts or a whitespace character sits immediately to the left. Similarly, (?:$|(?=\s)) does the same for the other end, matching the end of the string or a position followed by whitespace.
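Applied to the question's data, the look-around version might be exercised like this (replace_token is a hypothetical helper name). The last call shows that a key glued to other text, as in "x-(", is left alone.

```python
import re

def replace_token(text, key, new):
    # (?:^|(?<=\s)) : start of string, or a whitespace character just before.
    # (?:$|(?=\s))  : end of string, or a whitespace character just after.
    pattern = r'(?:^|(?<=\s)){}(?:$|(?=\s))'.format(re.escape(key))
    return re.sub(pattern, new, text)

data = 'this is some data containing -( and (-)'
print(replace_token(data, '-(', 'value1'))   # this is some data containing value1 and (-)
print(replace_token(data, '(-)', 'value2'))  # this is some data containing -( and value2
print(replace_token('x-( -(', '-(', 'new'))  # x-( new
```

An equivalent, more compact spelling of the same anchors is (?<!\S) and (?!\S), meaning "not preceded by" and "not followed by" a non-whitespace character, which also holds at the string edges.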



...