Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
257 views
in Technique[技术] by (71.8m points)

python - Make Multiple Shifted (Lagged) Columns in Pandas

I have a time-series DataFrame and I want to replicate each of my 200 features/columns as additional lagged features. So at the moment I have features at time t and want to create features at timestep t-1, t-2 and so on.

I know this is best done with df.shift() but I'm having trouble putting it altogether. I want to also rename the columns to 'feature (t-1)', 'feature (t-2)'.

My pseudo-code attempt would be something like:

lagged_values = [1,2,3,10]
for every lagged_values
    for every column, make a new feature column with df.shift(lagged_values)
    make new column have name 'original col name'+'(t-(lagged_values))'

In the end if I have 200 columns and 4 lagged timesteps I would have a new df with 1,000 features (200 each at t, t-1, t-2, t-3 and t-10).

I have found something similar but it doesn't keep the original column names (renames to var1, var2, etc) as per machine learning mastery. Unfortunately I don't understand it well enough to modify it to my problem.

def series_to_supervised(data, n_in=1, n_out=1, dropnan=True):
    """
    Frame a time series as a supervised learning dataset.
    Arguments:
        data: Sequence of observations as a list or NumPy array.
        n_in: Number of lag observations as input (X).
        n_out: Number of observations as output (y).
        dropnan: Boolean whether or not to drop rows with NaN values.
    Returns:
        Pandas DataFrame of series framed for supervised learning.
    """
    n_vars = 1 if type(data) is list else data.shape[1]
    df = DataFrame(data)
    cols, names = list(), list()
    # input sequence (t-n, ... t-1)
    for i in range(n_in, 0, -1):
        cols.append(df.shift(i))
        names += [('var%d(t-%d)' % (j+1, i)) for j in range(n_vars)]
    # forecast sequence (t, t+1, ... t+n)
    for i in range(0, n_out):
        cols.append(df.shift(-i))
        if i == 0:
            names += [('var%d(t)' % (j+1)) for j in range(n_vars)]
        else:
            names += [('var%d(t+%d)' % (j+1, i)) for j in range(n_vars)]
    # put it all together
    agg = concat(cols, axis=1)
    agg.columns = names
    # drop rows with NaN values
    if dropnan:
        agg.dropna(inplace=True)
    return agg
See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

You can create the additional columns using a dictionary comprehension and then add them to your dataframe via assign.

df = pd.DataFrame(np.random.randn(5, 2), columns=list('AB'))

lags = range(1, 3)  # Just two lags for demonstration.

>>> df.assign(**{
    f'{col} (t-{lag})': df[col].shift(lag)
    for lag in lags
    for col in df
})
          A         B   A (t-1)   A (t-2)   B (t-1)   B (t-2)
0 -0.773571  1.945746       NaN       NaN       NaN       NaN
1  1.375648  0.058043 -0.773571       NaN  1.945746       NaN
2  0.727642  1.802386  1.375648 -0.773571  0.058043  1.945746
3 -2.427135 -0.780636  0.727642  1.375648  1.802386  0.058043
4  1.542809 -0.620816 -2.427135  0.727642 -0.780636  1.802386

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...