I need to process a huge amount of CSV files where the time stamp is always a string representing the unix timestamp in milliseconds. I could not find a method yet to modify these columns efficiently.
This is what I came up with, however this of course duplicates only the column and I have to somehow put it back to the original dataset. I'm sure it can be done when creating the DataFrame
?
import sys
if sys.version_info[0] < 3:
from StringIO import StringIO
else:
from io import StringIO
import pandas as pd
data = 'RUN,UNIXTIME,VALUE
1,1447160702320,10
2,1447160702364,20
3,1447160722364,42'
df = pd.read_csv(StringIO(data))
convert = lambda x: datetime.datetime.fromtimestamp(x / 1e3)
converted_df = df['UNIXTIME'].apply(convert)
This will pick the column 'UNIXTIME' and change it from
0 1447160702320
1 1447160702364
2 1447160722364
Name: UNIXTIME, dtype: int64
into this
0 2015-11-10 14:05:02.320
1 2015-11-10 14:05:02.364
2 2015-11-10 14:05:22.364
Name: UNIXTIME, dtype: datetime64[ns]
However, I would like to use something like pd.apply()
to get the whole dataset returned with the converted column or as I already wrote, simply create datetimes when generating the DataFrame from CSV.
See Question&Answers more detail:
os 与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…