Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
538 views
in Technique[技术] by (71.8m points)

r - Adding lagged variables to an lm model?

I'm using lm on a time series, which works quite well actually, and it's super super fast.

Let's say my model is:

> formula <- y ~ x

I train this on a training set:

> train <- data.frame( x = seq(1,3), y = c(2,1,4) )
> model <- lm( formula, train )

... and I can make predictions for new data:

> test <- data.frame( x = seq(4,6) )
> test$y <- predict( model, newdata = test )
> test
  x        y
1 4 4.333333
2 5 5.333333
3 6 6.333333

This works super nicely, and it's really speedy.

I want to add lagged variables to the model. Now, I could do this by augmenting my original training set:

> train$y_1 <- c(0,train$y[1:nrow(train)-1])
> train
  x y y_1
1 1 2   0
2 2 1   2
3 3 4   1

update the formula:

formula <- y ~ x * y_1

... and training will work just fine:

> model <- lm( formula, train )
> # no errors here

However, the problem is that there is no way of using 'predict', because there is no way of populating y_1 in a test set in a batch manner.

Now, for lots of other regression things, there are very convenient ways to express them in the formula, such as poly(x,2) and so on, and these work directly using the unmodified training and test data.

So, I'm wondering if there is some way of expressing lagged variables in the formula, so that predict can be used? Ideally:

formula <- y ~ x * lag(y,-1)
model <- lm( formula, train )
test$y <- predict( model, newdata = test )

... without having to augment (not sure if that's the right word) the training and test datasets, and just being able to use predict directly?

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

Have a look at e.g. the dynlm package which gives you lag operators. More generally the Task Views on Econometrics and Time Series will have lots more for you to look at.

Here is the beginning of its examples -- a one and twelve month lag:

R>      data("UKDriverDeaths", package = "datasets")
R>      uk <- log10(UKDriverDeaths)
R>      dfm <- dynlm(uk ~ L(uk, 1) + L(uk, 12))
R>      dfm

Time series regression with "ts" data:
Start = 1970(1), End = 1984(12)

Call:
dynlm(formula = uk ~ L(uk, 1) + L(uk, 12))

Coefficients:
(Intercept)     L(uk, 1)    L(uk, 12)  
      0.183        0.431        0.511  

R> 

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...