I have a dataframe with multiple columns along with a date column. The date format is 12/31/15 and I have set it as a datetime object.
I set the datetime column as the index and want to perform a regression calculation for each month of the dataframe.
I believe the methodology to do this would be to split the dataframe into multiple dataframes based on month, store into a list of dataframes, then perform regression on each dataframe in the list.
I have used groupby which successfully split the dataframe by month, but am unsure how to correctly convert each group in the groupby object into a dataframe to be able to run my regression function on it.
Does anyone know how to split a dataframe into multiple dataframes based on date, or a better approach to my problem?
Here is my code I've written so far
import pandas as pd
import numpy as np
import statsmodels.api as sm
from patsy import dmatrices
df = pd.read_csv('data.csv')
df['date'] = pd.to_datetime(df['date'], format='%Y%m%d')
df = df.set_index('date')
# Group dataframe on index by month and year
# Groupby works, but dmatrices does not
for df_group in df.groupby(pd.TimeGrouper("M")):
y,X = dmatrices('value1 ~ value2 + value3', data=df_group,
return_type='dataframe')
See Question&Answers more detail:
os 与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…