python - Remove duplicate chars using regex?

Question

posted Oct 17, 2021 in Technique[技术] by 深蓝 (71.8m points)

Let's say I want to remove all consecutive duplicates of one particular char in a string using regular expressions. This is simple:

import re
re.sub("a*", "a", "aaaa") # gives 'a'

What if I want to replace every run of a duplicated char (for any char in a-z) with a single occurrence of that char? How do I do this?

import re
re.sub('[a-z]*', <what_to_put_here>, 'aabb') # should give 'ab'
re.sub('[a-z]*', <what_to_put_here>, 'abbccddeeffgg') # should give 'abcdefg'

NOTE: I know that removing duplicates can be better tackled with a hashtable or some O(n^2) algo, but I want to explore this using regexes.

1 Reply

深蓝 · Answer 1 · 2021-10-16T23:59:29+0000

>>> import re
>>> re.sub(r'([a-z])\1+', r'\1', 'ffffffbbbbbbbqqq')
'fbq'

The () around the [a-z] specify a capture group, and the \1 (a backreference) in both the pattern and the replacement refers to the contents of the first capture group.

Thus, the regex reads "find a letter, followed by one or more occurrences of that same letter", and the entire matched portion is then replaced with a single occurrence of that letter.
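
As a minimal sketch of the backreference approach applied to the inputs from the question, the pattern can be wrapped in a small helper. The name collapse_runs is hypothetical and only used here for illustration. Note that only adjacent repeats are collapsed; a letter that reappears later in the string is left alone.

```python
import re

def collapse_runs(text):
    """Collapse each run of a repeated lowercase letter to one letter.

    collapse_runs is a hypothetical helper name, not part of the re module.
    """
    # ([a-z]) captures one lowercase letter as group 1;
    # \1+ then requires one or more further copies of that same letter.
    return re.sub(r'([a-z])\1+', r'\1', text)

print(collapse_runs('aabb'))           # ab
print(collapse_runs('abbccddeeffgg'))  # abcdefg
print(collapse_runs('abab'))           # abab (non-adjacent repeats are kept)
```

The raw-string prefix (r'...') matters here: without it, Python would interpret '\1' as an escape sequence before the regex engine ever sees the backreference.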

On a side note...

Your example code for just 'a' is actually buggy:

>>> re.sub('a*', 'a', 'aaabbbccc')
'abababacacaca'

You really would want to use 'a+' for your regex instead of 'a*', since the * operator matches "0 or more" occurrences, and thus will match empty strings in between two non-a characters, whereas the + operator matches "1 or more".
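
A short sketch of the difference, using the same input string: 'a+' only touches the actual run of a's, while 'a*' also reports a zero-length match at every position where no 'a' is present. The variable names are illustrative only.

```python
import re

# 'a+' needs at least one 'a', so only the real run is replaced.
fixed = re.sub('a+', 'a', 'aaabbbccc')
print(fixed)  # abbbccc

# 'a*' also succeeds on nothing at all: one empty match before each
# character of 'bbb' plus one at the end of the string.
empty_matches = re.findall('a*', 'bbb')
print(empty_matches)  # ['', '', '', '']

# 'a+' finds no match in a string without any 'a'.
no_matches = re.findall('a+', 'bbb')
print(no_matches)  # []
```

Each of those empty matches is a place where re.sub would insert the replacement text, which is where the stray a's in the output come from.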
