Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
1.1k views
in Technique[技术] by (71.8m points)

string - Why is str.translate much faster in Python 3.5 compared to Python 3.4?

I was trying to remove unwanted characters from a given string using text.translate() in Python 3.4.

The minimal code is:

import sys 
s = 'abcde12345@#@$#%$'
mapper = dict.fromkeys(i for i in range(sys.maxunicode) if chr(i) in '@#$')
print(s.translate(mapper))

It works as expected. However the same program when executed in Python 3.4 and Python 3.5 gives a large difference.

The code to calculate timings is

python3 -m timeit -s "import sys;s = 'abcde12345@#@$#%$'*1000 ; mapper = dict.fromkeys(i for i in range(sys.maxunicode) if chr(i) in '@#$'); "   "s.translate(mapper)"

The Python 3.4 program takes 1.3ms whereas the same program in Python 3.5 takes only 26.4μs.

What has improved in Python 3.5 that makes it faster compared to Python 3.4?

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

TL;DR - ISSUE 21118


The long Story

Josh Rosenberg found out that the str.translate() function is very slow compared to the bytes.translate, he raised an issue, stating that:

In Python 3, str.translate() is usually a performance pessimization, not optimization.

Why was str.translate() slow?

The main reason for str.translate() to be very slow was that the lookup used to be in a Python dictionary.

The usage of maketrans made this problem worse. The similar approach using bytes builds a C array of 256 items to fast table lookup. Hence the usage of higher level Python dict makes the str.translate() in Python 3.4 very slow.

What happened now?

The first approach was to add a small patch, translate_writer, However the speed increase was not that pleasing. Soon another patch fast_translate was tested and it yielded very nice results of up to 55% speedup.

The main change as can be seen from the file is that the Python dictionary lookup is changed into a C level lookup.

The speeds now are almost the same as bytes

                                unpatched           patched

str.translate                   4.55125927699919    0.7898181750006188
str.translate from bytes trans  1.8910855210015143  0.779950579000797

A small note here is that the performance enhancement is only prominent in ASCII strings.

As J.F.Sebastian mentions in a comment below, Before 3.5, translate used to work in the same way for both ASCII and non-ASCII cases. However from 3.5 ASCII case is much faster.

Earlier ASCII vs non-ascii used to be almost same, however now we can see a great change in the performance.

It can be an improvement from 71.6μs to 2.33μs as seen in this answer.

The following code demonstrates this

python3.5 -m timeit -s "text = 'mJssissippi'*100; d=dict(J='i')" "text.translate(d)"
100000 loops, best of 3: 2.3 usec per loop
python3.5 -m timeit -s "text = 'mU0001F602ssissippi'*100; d={'U0001F602': 'i'}" "text.translate(d)"
10000 loops, best of 3: 117 usec per loop

python3 -m timeit -s "text = 'mU0001F602ssissippi'*100; d={'U0001F602': 'i'}" "text.translate(d)"
10000 loops, best of 3: 91.2 usec per loop
python3 -m timeit -s "text = 'mJssissippi'*100; d=dict(J='i')" "text.translate(d)"
10000 loops, best of 3: 101 usec per loop

Tabulation of the results:

         Python 3.4    Python 3.5  
Ascii     91.2          2.3 
Unicode   101           117

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...