I have a .csv file in the following format:
timestmp, p
2014/12/31 00:31:01:9200, 0.7
2014/12/31 00:31:12:1700, 1.9
...
When I read it via pd.read_csv and convert the timestamp strings to datetime using pd.to_datetime, performance drops dramatically. Here is a minimal example.
import re
import pandas as pd
d = '2014-12-12 01:02:03.0030'
c = re.sub('-', '/', d)
%timeit pd.to_datetime(d)
%timeit pd.to_datetime(c)
%timeit pd.to_datetime(c, format="%Y/%m/%d %H:%M:%S.%f")
and the timings are:
10000 loops, best of 3: 62.4 μs per loop
10000 loops, best of 3: 181 μs per loop
10000 loops, best of 3: 82.9 μs per loop
So, how can I improve the performance of pd.to_datetime when reading dates from a csv file?
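One approach worth trying (a sketch, not benchmarked against the real file) follows directly from the timings above: read the column as plain strings, then convert it in a single vectorized pd.to_datetime call with an explicit format string, so pandas does not have to infer the format for each value. Note the sample data separates fractional seconds with a colon, so the format uses `:%f` rather than `.%f`:

```python
import io
import pandas as pd

# Inline stand-in for the CSV file described in the question
csv_data = (
    "timestmp, p\n"
    "2014/12/31 00:31:01:9200, 0.7\n"
    "2014/12/31 00:31:12:1700, 1.9\n"
)

# Read timestamps as plain strings first (skipinitialspace handles
# the space after each comma in the sample data)
df = pd.read_csv(io.StringIO(csv_data), skipinitialspace=True)

# Convert the whole column in one vectorized call with an explicit
# format, avoiding per-value format inference
df["timestmp"] = pd.to_datetime(df["timestmp"], format="%Y/%m/%d %H:%M:%S:%f")

print(df["timestmp"].dtype)
```

If many timestamps repeat, another common trick is to parse only the unique strings once and map the results back, e.g. `df["timestmp"].map(pd.to_datetime(df["timestmp"].unique(), format=...)` built into a dict), though for mostly-unique timestamps the explicit format alone is usually the main win.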