python - datetime dtypes in pandas read_csv

Question

posted Oct 17, 2021 in Technique[技术] by 深蓝 (71.8m points)

I'm reading in a csv file with multiple datetime columns. I'd need to set the data types upon reading in the file, but datetimes appear to be a problem. For instance:

headers = ['col1', 'col2', 'col3', 'col4']
dtypes = ['datetime', 'datetime', 'str', 'float']
pd.read_csv(file, sep='', header=None, names=headers, dtype=dtypes)

Running this raises an error:

TypeError: data type "datetime" not understood

Converting columns after the fact via pandas.to_datetime() isn't an option, because I can't know in advance which columns will be datetime objects. That information can change and comes from whatever informs my dtypes list.

Alternatively, I've tried loading the csv file with numpy.genfromtxt, setting the dtypes in that function, and then converting to a pandas.DataFrame, but it garbles the data. Any help is greatly appreciated!

1 Reply

深蓝 · Answer 1 · 2021-10-16T22:10:12+0000

Why it does not work

There is no datetime dtype to be set for read_csv as csv files can only contain strings, integers and floats.

Setting a dtype to datetime will make pandas interpret the datetime as an object, meaning you will end up with a string.

Pandas way of solving this

The pandas.read_csv() function has a keyword argument called parse_dates.

With it you can convert strings, floats or integers into datetimes on the fly, using the default date_parser (dateutil.parser.parser).

import pandas as pd

headers = ['col1', 'col2', 'col3', 'col4']
# Read the date columns as plain strings; parse_dates converts them afterwards.
dtypes = {'col1': 'str', 'col2': 'str', 'col3': 'str', 'col4': 'float'}
parse_dates = ['col1', 'col2']
df = pd.read_csv(file, sep='', header=None, names=headers, dtype=dtypes, parse_dates=parse_dates)

This makes pandas read col1 and col2 as strings, which they most likely are ("2016-05-05" etc.). After each string has been read, the date_parser for that column acts on it and the column holds whatever that function returns.
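Since the question says the list of datetime columns is not known in advance, both arguments can be derived from the same dtypes list. The following is a minimal sketch of that idea, not a definitive implementation: the 'datetime' marker is taken from the question, while the tab-separated sample data and the use of io.StringIO in place of a real file are assumptions made only to keep the example self-contained.

```python
import io

import pandas as pd

headers = ['col1', 'col2', 'col3', 'col4']
dtypes = ['datetime', 'datetime', 'str', 'float']

# Columns tagged 'datetime' go to parse_dates; everything else goes to dtype.
parse_dates = [h for h, t in zip(headers, dtypes) if t == 'datetime']
dtype = {h: t for h, t in zip(headers, dtypes) if t != 'datetime'}

# Hypothetical tab-separated sample standing in for the real file.
sample = io.StringIO(
    "2016-05-05\t2016-05-06 10:30:00\tfoo\t1.5\n"
    "2016-06-01\t2016-06-02 08:00:00\tbar\t2.5\n"
)

df = pd.read_csv(sample, sep='\t', header=None, names=headers,
                 dtype=dtype, parse_dates=parse_dates)
print(df.dtypes)
```

Leaving the datetime columns out of dtype lets parse_dates handle them alone, so the split stays correct whenever the dtypes list changes.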

Defining your own date parsing function:

The pandas.read_csv() function also has a keyword argument called date_parser.

Setting this to a function (for example a lambda) makes pandas use that function to parse the dates.

GOTCHA WARNING

You have to give it the function itself, not the result of calling the function, so this is correct:

date_parser = pd.datetools.to_datetime

This is incorrect:

date_parser = pd.datetools.to_datetime()

Pandas 0.22 Update

pd.datetools.to_datetime has been relocated to pd.to_datetime, so use date_parser = pd.to_datetime

Thanks @stackoverYC
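The difference between passing the function and passing the result of calling it can be checked without read_csv at all. The sketch below is only an illustration of the gotcha; the variable names are made up for the example. It deliberately avoids passing date_parser to read_csv, because pandas 2.0 and later deprecate that argument in favour of date_format.

```python
import pandas as pd

# A reference to the function: this is what a parser argument expects.
parser = pd.to_datetime
parsed = parser("2016-05-05")  # the caller supplies the value later
print(parsed)

# Calling the function with no arguments fails before anything is parsed,
# because to_datetime requires the value to convert.
error_message = None
try:
    pd.to_datetime()
except TypeError as exc:
    error_message = str(exc)
print(error_message)
```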
