Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

python - Most efficient way to remove multiple substrings from string?

What's the most efficient method to remove a list of substrings from a string?

I'd like a cleaner, quicker way to do the following:

words = 'word1 word2 word3 word4, word5'
replace_list = ['word1', 'word3', 'word5']

def remove_multiple_strings(cur_string, replace_list):
  for cur_word in replace_list:
    cur_string = cur_string.replace(cur_word, '')
  return cur_string

remove_multiple_strings(words, replace_list)


1 Reply

Regex:

>>> import re
>>> re.sub(r'|'.join(map(re.escape, replace_list)), '', words)
' word2  word4, '
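
As a rough sketch of the same idea wrapped in a reusable function: the names `remove_substrings_regex`, `text` and `substrings` are made up for illustration, and the longest-first ordering is an addition that is not part of the one-liner above. Regex alternation takes the first alternative that matches at a position, so without that ordering 'word1' could win over 'word10' and leave a stray '0' behind.

```python
import re

def remove_substrings_regex(text, substrings):
    """Remove every occurrence of each substring in a single regex pass."""
    if not substrings:
        # Joining an empty list gives an empty pattern; nothing to remove.
        return text
    # Try longer alternatives first so 'word10' is preferred over 'word1'.
    ordered = sorted(substrings, key=len, reverse=True)
    pattern = re.compile('|'.join(map(re.escape, ordered)))
    return pattern.sub('', text)

words = 'word1 word2 word3 word4, word5'
replace_list = ['word1', 'word3', 'word5']
result = remove_substrings_regex(words, replace_list)
print(repr(result))  # ' word2  word4, '
```

Note that a single pass is not always equivalent to the loop of `str.replace` calls: the loop rescans the string after each removal, so it can match text that only becomes adjacent once something else has been deleted.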

The above one-liner is actually not as fast as your str.replace version, but it is definitely shorter:

>>> import hashlib, random
>>> words = ' '.join([hashlib.sha1(str(random.random()).encode()).hexdigest()[:10] for _ in range(10000)])
>>> replace_list = words.split()[:1000]
>>> random.shuffle(replace_list)
>>> %timeit remove_multiple_strings(words, replace_list)
10 loops, best of 3: 49.4 ms per loop
>>> %timeit re.sub(r'|'.join(map(re.escape, replace_list)), '', words)
1 loops, best of 3: 623 ms per loop

Gosh! More than 12x slower.
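
For anyone who wants to repeat this comparison outside IPython, here is a minimal sketch using the standard-library timeit module. The helper names `remove_with_regex`, `make_data` and `benchmark` are hypothetical, the random tokens are generated with `random.getrandbits` rather than SHA-1 (an assumption made only to keep it short), and the numbers it prints will depend on your machine and Python version.

```python
import random
import re
import timeit

def remove_multiple_strings(cur_string, replace_list):
    # The loop from the question: one str.replace pass per substring.
    for cur_word in replace_list:
        cur_string = cur_string.replace(cur_word, '')
    return cur_string

def remove_with_regex(cur_string, replace_list):
    # The one-liner from this answer: a single alternation pattern.
    return re.sub('|'.join(map(re.escape, replace_list)), '', cur_string)

def make_data(n_tokens, n_remove, seed=0):
    # Space-separated 10-character hex tokens; the first n_remove are removed.
    rng = random.Random(seed)
    tokens = ['%010x' % rng.getrandbits(40) for _ in range(n_tokens)]
    replace_list = tokens[:n_remove]
    rng.shuffle(replace_list)
    return ' '.join(tokens), replace_list

def benchmark(n_tokens=10000, n_remove=1000, repeats=3):
    # Returns the average seconds per call for each approach.
    words, replace_list = make_data(n_tokens, n_remove)
    timings = {}
    for func in (remove_multiple_strings, remove_with_regex):
        total = timeit.timeit(lambda: func(words, replace_list), number=repeats)
        timings[func.__name__] = total / repeats
    return timings

if __name__ == '__main__':
    for name, seconds in benchmark().items():
        print('%s: %.1f ms per call' % (name, seconds * 1000))
```

The re module caches recently compiled patterns, so repeated calls mostly measure matching rather than compilation.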

But can we improve it? Yes.

As we are only concerned with whole words, we can simply match each word in the words string with \w+ and check it against a set built from replace_list (yes, an actual set: set(replace_list)):

>>> def sub(m):
...     return '' if m.group() in s else m.group()
...
>>> %%timeit
s = set(replace_list)
re.sub(r'\w+', sub, words)
...
100 loops, best of 3: 7.8 ms per loop

For even larger strings and word lists, the str.replace approach and my first solution will end up taking quadratic time, while the set-based solution should run in linear time.
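
Here is a self-contained sketch of the set-based approach, with the set kept in a closure instead of a global. The names `remove_words`, `words_to_remove` and `_replace` are invented for this example. It assumes that every entry to remove is a whole \w+ token; an entry containing spaces or punctuation would never match.

```python
import re

def remove_words(text, words_to_remove):
    """Drop every whole word of text that appears in words_to_remove."""
    removal_set = set(words_to_remove)  # constant-time membership on average

    def _replace(match):
        word = match.group()
        # Returning '' deletes the word; otherwise it is kept unchanged.
        return '' if word in removal_set else word

    return re.sub(r'\w+', _replace, text)

words = 'word1 word2 word3 word4, word5'
replace_list = ['word1', 'word3', 'word5']
cleaned = remove_words(words, replace_list)
print(repr(cleaned))  # ' word2  word4, '
```

Unlike str.replace, this only removes whole words: with ['word1'] as the list, 'word10' is left alone, whereas the str.replace loop would turn it into '0'.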

