>>> import re
>>> re.sub(r'([a-z])\1+', r'\1', 'ffffffbbbbbbbqqq')
'fbq'
The () around the [a-z] specify a capture group, and the \1 (a backreference) in both the pattern and the replacement refers to the contents of the first capture group.
Thus, the regex reads "find a letter, followed by one or more occurrences of that same letter", and the entire matched portion is then replaced with a single occurrence of that letter.
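For reuse, the same substitution can be wrapped in a small helper. This is only a sketch: the name collapse_repeats is made up for illustration, and it assumes lowercase ASCII input, as the [a-z] class in the pattern does.

```python
import re

def collapse_repeats(text):
    # Hypothetical helper name, not part of the re module.
    # ([a-z]) captures one lowercase letter; \1+ then matches one or more
    # further copies of that same letter. The whole run is replaced by \1,
    # i.e. a single copy of the captured letter.
    return re.sub(r'([a-z])\1+', r'\1', text)

print(collapse_repeats('ffffffbbbbbbbqqq'))  # fbq
print(collapse_repeats('aabcc'))             # abc
```

Letters that are not repeated do not match the pattern at all, so they pass through unchanged.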
On a side note...
Your example code for just 'a' is actually buggy:
>>> re.sub('a*', 'a', 'aaabbbccc')
'abababacacaca'
You would really want to use 'a+' for your regex instead of 'a*', since the * operator matches "0 or more" occurrences and thus will also match the empty string between two non-a characters, whereas the + operator matches "1 or more".
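A quick way to see the difference is to list what each pattern matches in a string that contains no 'a' at all. The helper name collapse_a below is made up for this sketch. The exact output of the 'a*' substitution is left out on purpose, because how re.sub treats empty matches differs slightly between Python versions.

```python
import re

# 'a*' can match zero characters, so it "matches" at every position,
# including the end of the string.
empty_matches = re.findall('a*', 'bc')
print(empty_matches)  # ['', '', '']

# 'a+' needs at least one 'a', so there is nothing to match here.
real_matches = re.findall('a+', 'bc')
print(real_matches)   # []

def collapse_a(text):
    # Hypothetical helper: collapse each run of 'a' into a single 'a'.
    return re.sub('a+', 'a', text)

print(collapse_a('aaabbbccc'))  # abbbccc
```

With 'a+', the b and c characters are left alone because no empty matches occur between them.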