Here is an outline of doing rolling OLS with statsmodels; it should work for your data. Simply use df=pd.read_csv('estimated_pred.csv') instead of my randomly generated df:
import pandas as pd
import numpy as np
import statsmodels.api as sm
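# random data; this generating line is reconstructed (500 rows and the columns time, X, Y are inferred from the df tail shown further down)
df = pd.DataFrame(np.random.normal(size=(500, 3)), columns=['time', 'X', 'Y'])
df = df.dropna()  # drop rows containing NaNs (matters once you load your own data)
window = 5
df['a'] = np.nan   # constant
df['b1'] = np.nan  # beta1
df['b2'] = np.nan  # beta2
for i in range(window, len(df)):
    temp = df.iloc[i - window:i]  # the `window` rows immediately before row i
    # assumed specification: Y regressed on a constant, time and X
    RollOLS = sm.OLS(temp['Y'], sm.add_constant(temp[['time', 'X']])).fit()
    df.iloc[i, df.columns.get_loc('a')] = RollOLS.params['const']
    df.iloc[i, df.columns.get_loc('b1')] = RollOLS.params['time']
    df.iloc[i, df.columns.get_loc('b2')] = RollOLS.params['X']
# The following line gives you predicted values in a row, given the PRIOR row's estimated parameters
df['predicted'] = df['a'].shift(1) + df['b1'].shift(1) * df['time'] + df['b2'].shift(1) * df['X']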
I store the constant and betas, but there are a number of ways to approach predicting. You can use your fitted model object (mine is RollOLS) and its .predict() method, or multiply it out yourself, which I did in the final line. That is easier in this case because the number of variables is fixed and known, so you can do simple column math all in one go.
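To do predictions with sm as you go, though, call RollOLS.predict() on the new regressor values, with a constant column added so that they match the design the model was fitted on.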
Keep in mind that if you run the code above in sequence and only then call .predict(), the predicted values will use the model of the last window only. If you want to use a different model, you can save the models as you go, or predict values within the for loop. Note that you can also get fitted values with RollOLS.fittedvalues, so if you are smoothing data, pull and save the last fitted value for each iteration in the loop. With pandas inputs fittedvalues is a Series, so take the last element with RollOLS.fittedvalues.iloc[-1].
To help you see how to use this with your own data, here is the tail of my df after the rolling regression loop has run:
          time          X          Y           a         b1         b2
495   0.662463   0.771971   0.643008  -0.0235751   0.037875  0.0907694
496  -0.127879   1.293141   0.404959  0.00314073  0.0441054   0.113387
497  -0.006581  -0.824247   0.226653   0.0105847  0.0439867   0.118228
498   1.870858   0.920964   0.571535   0.0123463  0.0428359    0.11598
499   0.724296   0.537296  -0.411965  0.00104044   0.055003   0.118953