r - Adding lagged variables to an lm model?

Posted Oct 24, 2021 in Technique by 深蓝 (71.8m points)

I'm using lm on a time series, which works quite well, and it's very fast.

Let's say my model is:

> formula <- y ~ x

I train this on a training set:

> train <- data.frame( x = seq(1,3), y = c(2,1,4) )
> model <- lm( formula, train )

... and I can make predictions for new data:

> test <- data.frame( x = seq(4,6) )
> test$y <- predict( model, newdata = test )
> test
  x        y
1 4 4.333333
2 5 5.333333
3 6 6.333333

This works super nicely, and it's really speedy.

I want to add lagged variables to the model. Now, I could do this by augmenting my original training set:

> train$y_1 <- c(0, head(train$y, -1))  # head(..., -1) avoids the 1:n-1 precedence trap
> train
  x y y_1
1 1 2   0
2 2 1   2
3 3 4   1

update the formula:

> formula <- y ~ x * y_1

... and training will work just fine:

> model <- lm( formula, train )
> # no errors here

However, 'predict' can't then be used directly, because y_1 in a test set can't be populated in one batch: each test row's y_1 is the previous row's y, which is itself a prediction.
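One possible workaround, sketched here under assumptions, is to predict one row at a time and feed each prediction back in as the next row's y_1. The toy data and the helper name `predict_recursive` are invented for illustration; this sketch also pads the first lag with NA instead of 0, so lm simply drops that row.

```r
# Toy training data (invented for illustration, not from the question).
train <- data.frame(x = 1:8, y = c(2, 1, 4, 3, 5, 7, 6, 9))

# Lag y by one step; the first row has no predecessor, so use NA.
# lm() drops rows containing NA under the default na.action (na.omit).
train$y_1 <- c(NA, head(train$y, -1))

model <- lm(y ~ x * y_1, data = train)

# Hypothetical helper: predict row by row, feeding each prediction
# back in as the lagged value for the next row.
predict_recursive <- function(model, newdata, last_y) {
  preds <- numeric(nrow(newdata))
  prev <- last_y
  for (i in seq_len(nrow(newdata))) {
    row <- newdata[i, , drop = FALSE]
    row$y_1 <- prev
    preds[i] <- predict(model, newdata = row)
    prev <- preds[i]
  }
  preds
}

test <- data.frame(x = 9:11)
test$y <- predict_recursive(model, test, last_y = tail(train$y, 1))
test
```

The loop costs one predict call per test row, so it is slower than a single batch predict, but it keeps the model itself a plain lm fit.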

Now, for lots of other regression things, there are very convenient ways to express them in the formula, such as poly(x,2) and so on, and these work directly using the unmodified training and test data.
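As a small illustration of that convenience (toy data, invented for this sketch): a poly() term is recomputed by predict on the new data, so the test set only needs the raw x column.

```r
# Toy data (invented for illustration).
train <- data.frame(x = 1:6, y = c(2, 1, 4, 3, 5, 7))

# The transformation lives in the formula, not in the data frame.
model <- lm(y ~ poly(x, 2), data = train)

# predict() re-applies poly() to the new x values on its own.
test <- data.frame(x = 7:9)
test$y <- predict(model, newdata = test)
test
```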

So, I'm wondering if there is some way of expressing lagged variables in the formula, so that predict can be used? Ideally:

formula <- y ~ x * lag(y,-1)
model <- lm( formula, train )
test$y <- predict( model, newdata = test )

... without having to augment (not sure if that's the right word) the training and test datasets, and just being able to use predict directly?
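One caveat about the wished-for formula: base R's stats::lag() does not shift the values of a plain vector. It only shifts the time attribute (tsp), so inside lm the term would still line up row by row with y itself. A quick check, using the y values from the training set above:

```r
y <- c(2, 1, 4)
lagged <- stats::lag(y, -1)

# The values themselves are unchanged ...
identical(as.numeric(lagged), y)

# ... only the time index (start, end, frequency) moves one step later.
tsp(lagged)
```

This is why a lag operator that is aware of the model frame, or an explicitly built lag column, is needed.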

1 Reply

深蓝 · Answer 1 · 2021-10-23T18:43:50+0000

Have a look at, e.g., the dynlm package, which provides lag operators. More generally, the CRAN Task Views on Econometrics and Time Series list many more packages to look at.

Here is the beginning of its examples, with one-month and twelve-month lags:

R>      library("dynlm")
R>      data("UKDriverDeaths", package = "datasets")
R>      uk <- log10(UKDriverDeaths)
R>      dfm <- dynlm(uk ~ L(uk, 1) + L(uk, 12))
R>      dfm

Time series regression with "ts" data:
Start = 1970(1), End = 1984(12)

Call:
dynlm(formula = uk ~ L(uk, 1) + L(uk, 12))

Coefficients:
(Intercept)     L(uk, 1)    L(uk, 12)  
      0.183        0.431        0.511  

R>
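For comparison, the same regression can be sketched in base R by building the lagged columns explicitly with embed(). The column names `lag1` and `lag12` are chosen here for illustration; the coefficients should agree with the dynlm output above, since both fits are ordinary least squares on the same 180 rows.

```r
data("UKDriverDeaths", package = "datasets")
uk <- log10(UKDriverDeaths)

# embed(x, 13): column 1 holds x[t], column k + 1 holds x[t - k].
e <- embed(as.numeric(uk), 13)
lagged <- data.frame(uk = e[, 1], lag1 = e[, 2], lag12 = e[, 13])

fit <- lm(uk ~ lag1 + lag12, data = lagged)
coef(fit)
```

Unlike the dynlm version, this approach needs the lag columns to be rebuilt by hand for any new data before calling predict.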
