re.sub(".*", "(replacement)", "text") doubles replacement on Python 3.7

Question

posted Oct 17, 2021 in Technique by 深蓝

On Python 3.7 (tested on 64-bit Windows), replacing a string using the regex .* produces the replacement text twice instead of once!

On Python 3.7.2:

>>> import re
>>> re.sub(".*", "(replacement)", "sample text")
'(replacement)(replacement)'

On Python 3.6.4:

>>> import re
>>> re.sub(".*", "(replacement)", "sample text")
'(replacement)'

On Python 2.7.5 (32-bit):

>>> import re
>>> re.sub(".*", "(replacement)", "sample text")
'(replacement)'

What is wrong, and how can I fix it?

深蓝 · Answer 1 · 2021-10-17T00:04:19+0000

This is not a bug; it is the result of a bug fix in Python 3.7, introduced by commit fbb490fd2f38bd817d99c20c05121ad0168a38ee.
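The change is easiest to see with a pattern that can match the empty string in the middle of the text. Below is a minimal sketch using the pattern x* on the input "abxd", an example along the lines of the one in Python 3.7's "What's New" notes (the pattern and input are illustrative, not taken from the question):

```python
import re

# 'x*' matches the empty string at every position and matches "x" at index 2.
# Since Python 3.7, the empty match right after "x" (between "x" and "d")
# is replaced as well, so two dashes appear in a row.
# Python 3.6 and earlier returned '-a-b-d-' for the same call.
result = re.sub("x*", "-", "abxd")
print(result)  # prints -a-b--d- on Python 3.7+
```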

During a regex search, a non-zero-width match moves the current position to the end of the match, so the next match attempt, zero-width or not, starts from the position right after it. In your example, .* greedily matches and consumes the entire string, which moves the position to the end of the string. That position still leaves "room" for a zero-width match, because .* can also match the empty string. This is evident from the following code, which behaves the same in Python 2.7, 3.6 and 3.7:

>>> re.findall(".*", 'sample text')
['sample text', '']
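Using re.finditer makes the two matches explicit: printing each match's span shows one non-empty match covering the whole string, followed by a zero-width match at the end position. A minimal sketch:

```python
import re

text = "sample text"  # 11 characters

# Collect (start, end) for every match of the greedy pattern.
spans = [(m.start(), m.end()) for m in re.finditer(".*", text)]
print(spans)  # [(0, 11), (11, 11)]
```

The second span starts and ends at 11, i.e. an empty match at the very end of the string.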

So the bug fix, which concerns replacing a zero-width match that comes right after a non-zero-width match, makes re.sub correctly replace both matches, which is why the replacement text now appears twice.
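If the goal is a single replacement of the whole string, the trailing empty match has to be avoided. The sketch below shows three possible alternatives (suggestions, not part of the original answer), assuming single-line input, since . does not match a newline unless re.DOTALL is set. They differ on an empty input:

```python
import re

text = "sample text"

# 1. Limit the number of substitutions to one.
a = re.sub(".*", "(replacement)", text, count=1)

# 2. Anchor the pattern at the start: after the first match consumes the
#    whole string, '^' cannot match again at the end position.
b = re.sub("^.*", "(replacement)", text)

# 3. Require at least one character, so no empty match is possible.
c = re.sub(".+", "(replacement)", text)

print(a, b, c)  # (replacement) (replacement) (replacement)

# Edge case: an empty input.
print(repr(re.sub(".+", "(replacement)", "")))   # '' (nothing to match)
print(repr(re.sub("^.*", "(replacement)", "")))  # '(replacement)'
```

Options 1 and 2 keep the Python 2.7/3.6 result for every single-line input, including the empty string; option 3 leaves an empty string untouched.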
