Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
531 views
in Technique[技术] by (71.8m points)

python - Why is pandas.to_datetime slow for non standard time format such as '2014/12/31'

I have a .csv file in such format

timestmp, p
2014/12/31 00:31:01:9200, 0.7
2014/12/31 00:31:12:1700, 1.9
...

and when read via pd.read_csv and convert the time str to datetime using pd.to_datetime, the performance drops dramatically. Here is a minimal example.

import re
import pandas as pd

d = '2014-12-12 01:02:03.0030'
c = re.sub('-', '/', d)

%timeit pd.to_datetime(d)
%timeit pd.to_datetime(c)
%timeit pd.to_datetime(c, format="%Y/%m/%d %H:%M:%S.%f")

and the performances are:

10000 loops, best of 3: 62.4 μs per loop
10000 loops, best of 3: 181 μs per loop
10000 loops, best of 3: 82.9 μs per loop

so, how could I improve the performance of pd.to_datetime when reading date from a csv file?

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

This is because pandas falls back to dateutil.parser.parse for parsing the strings when it has a non-default format or when no format string is supplied (this is much more flexible, but also slower).

As you have shown above, you can improve the performance by supplying a format string to to_datetime. Or another option is to use infer_datetime_format=True


Apparently, the infer_datetime_format cannot infer when there are microseconds. With an example without those, you can see a large speed-up:

In [28]: d = '2014-12-24 01:02:03'

In [29]: c = re.sub('-', '/', d)

In [30]: s_c = pd.Series([c]*10000)

In [31]: %timeit pd.to_datetime(s_c)
1 loops, best of 3: 1.14 s per loop

In [32]: %timeit pd.to_datetime(s_c, infer_datetime_format=True)
10 loops, best of 3: 105 ms per loop

In [33]: %timeit pd.to_datetime(s_c, format="%Y/%m/%d %H:%M:%S")
10 loops, best of 3: 99.5 ms per loop

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...