Extrapolating a DataFrame with a DatetimeIndex
This can be done in two steps:
- Extend the DatetimeIndex
- Extrapolate the data
Extend the Index
Overwrite df with a new DataFrame whose data is reindexed onto a new, extended index built from the original index's start, number of periods and frequency. This allows the original df to come from anywhere, as in the csv example case. The new rows are conveniently filled with NaNs!
import pandas as pd

# Fake DataFrame for example (could come from anywhere)
X1 = range(10)
X2 = [x**2 for x in X1]
# 'M' is the month-end alias; pandas 2.2 and later prefer 'ME'
df = pd.DataFrame({'x1': X1, 'x2': X2}, index=pd.date_range('20130101', periods=10, freq='M'))

# Number of months to extend
extend = 5

# Extrapolate the index first based on original index
df = pd.DataFrame(
    data=df,
    index=pd.date_range(
        start=df.index[0],
        periods=len(df.index) + extend,
        freq=df.index.freq
    )
)

# Display
print(df)
x1 x2
2013-01-31 0 0
2013-02-28 1 1
2013-03-31 2 4
2013-04-30 3 9
2013-05-31 4 16
2013-06-30 5 25
2013-07-31 6 36
2013-08-31 7 49
2013-09-30 8 64
2013-10-31 9 81
2013-11-30 NaN NaN
2013-12-31 NaN NaN
2014-01-31 NaN NaN
2014-02-28 NaN NaN
2014-03-31 NaN NaN
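One catch with the step above: an index that did not come from pd.date_range (for example, dates parsed from a csv) usually has no freq attribute set, so freq=df.index.freq would be None. The sketch below is one possible way around that, assuming the dates are regular enough for pd.infer_freq to recognise. The helper name extend_index and the small example frame are hypothetical, made up for illustration.

```python
import pandas as pd

def extend_index(df, extend):
    """Return df reindexed onto its own index plus `extend` extra periods."""
    # Use the stored frequency if there is one, otherwise try to infer it
    freq = df.index.freq
    if freq is None:
        freq = pd.infer_freq(df.index)
    if freq is None:
        raise ValueError("could not determine the index frequency")
    new_index = pd.date_range(
        start=df.index[0],
        periods=len(df.index) + extend,
        freq=freq
    )
    # Rows that only exist in the new index are filled with NaN
    return df.reindex(new_index)

# Index built from plain timestamps, so it carries no freq attribute
dates = pd.to_datetime(['2013-01-31', '2013-02-28', '2013-03-31', '2013-04-30'])
example = pd.DataFrame({'x1': [0, 1, 2, 3]}, index=dates)
extended = extend_index(example, 2)
print(extended)
```

Here df.reindex is used instead of the DataFrame constructor; both place the existing rows onto the new index and leave the added rows empty.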
Extrapolate the data
Most extrapolators will require the inputs to be numeric instead of dates. This can be done by temporarily swapping the date index for an integer one:
# Temporarily remove dates and make index numeric
di = df.index
df = df.reset_index(drop=True)
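To make the round trip concrete, here is a small sketch on a made-up three-row frame (the names example and numeric are only for illustration). It shows what the numeric x values look like after the dates are set aside, and that the saved index can be put back unchanged. Note the assumption baked into this approach: rows become equally spaced integers, so months of different lengths are treated as equal steps.

```python
import pandas as pd

dates = pd.to_datetime(['2013-01-31', '2013-02-28', '2013-03-31'])
example = pd.DataFrame({'x1': [0.0, 1.0, None]}, index=dates)

# Keep the dates for later and switch to a plain integer index
di = example.index
numeric = example.reset_index(drop=True)

# Numeric x values a curve fitter can work with: one step per row
x = numeric.index.astype(float).values
print(x)

# Restore the dates once the numeric work is done
numeric.index = di
print(numeric)
```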
See this answer for how to extrapolate the values of each column of a DataFrame with a 3rd order polynomial.
Snippet from that answer:
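# fit_df, func, guess and col_params are defined in the linked answer:
# the data with NaN rows removed, the polynomial to fit, the initial
# parameter guess, and a dict for the fitted parameters of each column
from scipy.optimize import curve_fit

# Curve fit each column
for col in fit_df.columns:
    # Get x & y
    x = fit_df.index.astype(float).values
    y = fit_df[col].values
    # Curve fit column and get curve parameters
    params = curve_fit(func, x, y, guess)
    # Store optimized parameters
    col_params[col] = params[0]

# Extrapolate each column
for col in df.columns:
    # Boolean mask of the NaN rows in the column
    missing = df[col].isnull()
    # Get the index values for those rows
    x = df.index[missing.values].astype(float).values
    # Extrapolate those points with the fitted function
    df.loc[missing, col] = func(x, *col_params[col])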
Once the columns are extrapolated, put the dates back:
# Put date index back
df.index = di
# Display
print(df)
x1 x2
2013-01-31 0 0
2013-02-28 1 1
2013-03-31 2 4
2013-04-30 3 9
2013-05-31 4 16
2013-06-30 5 25
2013-07-31 6 36
2013-08-31 7 49
2013-09-30 8 64
2013-10-31 9 81
2013-11-30 10 100
2013-12-31 11 121
2014-01-31 12 144
2014-02-28 13 169
2014-03-31 14 196
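For reference, the whole procedure can be sketched end to end in one self-contained script. This is not the linked answer's code: it swaps scipy's curve_fit for np.polyfit and np.polyval, which also fit and evaluate a 3rd order polynomial by least squares but need no initial guess. The frame is rebuilt from scratch here so the script runs on its own.

```python
import numpy as np
import pandas as pd

# Ten known rows followed by five empty ones, as in the example
dates = pd.date_range('2013-01-31', periods=15, freq=pd.offsets.MonthEnd())
df = pd.DataFrame(
    {'x1': range(10), 'x2': [x**2 for x in range(10)]},
    index=dates[:10]
).reindex(dates)

# Temporarily remove dates and make index numeric
di = df.index
df = df.reset_index(drop=True)
x = df.index.to_numpy(dtype=float)

for col in df.columns:
    # Rows that already hold data for this column
    known = df[col].notnull().to_numpy()
    # Fit a 3rd order polynomial to the known points
    coeffs = np.polyfit(x[known], df[col].to_numpy()[known], deg=3)
    # Evaluate the fitted polynomial at the missing rows
    df.loc[~known, col] = np.polyval(coeffs, x[~known])

# Put date index back
df.index = di
print(df)
```

Because x1 is linear and x2 is quadratic, a cubic fit reproduces them almost exactly, so the filled values should land very close to those shown above (up to floating-point error, and displayed as floats). Real data will rarely extrapolate this cleanly, and a polynomial can diverge quickly outside the fitted range.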