I'm trying to run a simple linear regression on a pandas DataFrame using scikit-learn's LinearRegression. My data is a time series, and the DataFrame has a datetime index:
value
2007-01-01 0.771305
2007-02-01 0.256628
2008-01-01 0.670920
2008-02-01 0.098047
Doing something as simple as
from sklearn import linear_model
lr = linear_model.LinearRegression()
lr(data.index, data['value'])
didn't work:
float() argument must be a string or a number
So I tried to create a new column with the dates to try to transform it:
data['date'] = data.index
data['date'] = pd.to_datetime(data['date'])
lr(data['date'], data['value'])
but now I get:
ValueError: Input contains NaN, infinity or a value too large for dtype('float64').
So the regressor can't handle datetime. I saw a bunch of ways to convert integer data to datetime, but couldn't find a way to convert from datetime to integer, for example.
What is the proper way to do this?
PS: I'm interested in using scikit because I'm planning on doing more stuff with it later, so no statsmodels for now.
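For reference, one possible way to turn a datetime index into a numeric feature is sketched below. It is not the only option. It assumes that "days elapsed since the first observation" is an acceptable time scale, rebuilds the four sample rows shown above, and uses the illustrative names `days_since_start`, `X`, and `y`, which are not part of the original code. Note that the estimator is trained through its `fit` method and expects a 2-D feature array.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Rebuild the sample frame from the question: datetime index, one 'value' column.
data = pd.DataFrame(
    {"value": [0.771305, 0.256628, 0.670920, 0.098047]},
    index=pd.to_datetime(["2007-01-01", "2007-02-01", "2008-01-01", "2008-02-01"]),
)

# Assumption: measure time as whole days elapsed since the first timestamp.
# Subtracting a Timestamp from a DatetimeIndex gives a TimedeltaIndex,
# whose .days attribute holds plain integers.
days_since_start = (data.index - data.index[0]).days

# scikit-learn expects X with shape (n_samples, n_features), so reshape to a column.
X = days_since_start.to_numpy(dtype=float).reshape(-1, 1)
y = data["value"].to_numpy()

lr = LinearRegression()
lr.fit(X, y)  # fit() trains the model; the estimator object itself is not callable

# Predicting for a new date requires the same conversion.
new_date = pd.Timestamp("2008-03-01")
X_new = [[(new_date - data.index[0]).days]]
prediction = lr.predict(X_new)
```

With this encoding the fitted slope in `lr.coef_` is expressed in units of `value` per day. A different numeric encoding (for example ordinals from `Timestamp.toordinal`) would change the intercept but describe the same line.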