r - Looping over combinations of regression model terms

Question

posted Oct 24, 2021 in Technique by 深蓝 (71.8m points)

I'm running a regression in the form

reg=lm(y ~ x1+x2+x3+z1,data=mydata)

In place of the last term, z1, I want to loop through a set of different variables, z1 through z10, running one regression for each with that variable as the last term. For example, in the second run I want to use

reg=lm(y ~ x1+x2+x3+z2,data=mydata)

and in the third run:

reg=lm(y ~ x1+x2+x3+z3,data=mydata)

How can I automate this by looping through the list of z-variables?

1 Reply

深蓝 · Answer 1 · 2021-10-23T18:23:32+0000

While what Sam has provided works and is a good solution, I would personally prefer to go about it slightly differently. His answer has already been accepted, so I'm just posting this for the sake of completeness.

dat1 <- data.frame(y = rpois(100, 5),
                   x1 = runif(100),
                   x2 = runif(100),
                   x3 = runif(100),
                   z1 = runif(100),
                   z2 = runif(100))

lapply(colnames(dat1)[5:6],
       function(x, d) lm(as.formula(paste("y ~ x1 + x2 + x3", x, sep = " + ")), data = d),
       d = dat1)

Rather than looping over the actual columns of the data frame, the lapply() call above loops only over a character vector of column names and builds each formula from those names. This provides a small speed improvement, as less data is copied between iterations.
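For the ten variables in the question, the same name-based loop might look like the sketch below. It is only an illustration: mydata is simulated here as a stand-in for the real data, the seed and the names z_vars and models are my own, and reformulate() from the stats package is swapped in for the paste()/as.formula() pair used above.

```r
set.seed(1)  # assumed seed, only so the simulated data is reproducible

# Simulated stand-in for mydata: y, x1..x3 and ten candidate terms z1..z10
n <- 100
mydata <- data.frame(y = rpois(n, 5),
                     x1 = runif(n),
                     x2 = runif(n),
                     x3 = runif(n))
for (j in 1:10) mydata[[paste0("z", j)]] <- runif(n)

# Names of the terms that take turns as the last predictor
z_vars <- paste0("z", 1:10)

# Fit one model per name; reformulate() builds y ~ x1 + x2 + x3 + z_k
models <- lapply(z_vars, function(z) {
  f <- reformulate(c("x1", "x2", "x3", z), response = "y")
  lm(f, data = mydata)
})
names(models) <- z_vars  # so models[["z2"]] is the second run

# Inspect the formula actually used for the second run
formula(models[["z2"]])
```

Naming the list by variable makes it easy to pull out a particular run later, for example summary(models[["z3"]]).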

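library(microbenchmark)

# Time the name-based version shown above
microbenchmark({
  lapply(colnames(dat1)[5:6],
         function(x, d) lm(as.formula(paste("y ~ x1 + x2 + x3", x, sep = " + ")), data = d),
         d = dat1)
})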
# Unit: milliseconds
# expr
# {lapply(colnames(dat1)[5:6], function(x, d) lm(as.formula(paste("y ~ x1 + x2 + x3", x, sep = "+")), data = d), d = dat1)}
#       min       lq     mean   median       uq      max neval
#  4.014237 4.148117 4.323387 4.220189 4.281995 5.898811   100

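# Time the other answer, which loops over the columns themselves
microbenchmark({
  lapply(dat1[, 5:6], function(x) lm(dat1$y ~ dat1$x1 + dat1$x2 + dat1$x3 + x))
})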
# Unit: milliseconds
# expr
# {lapply(dat1[, 5:6], function(x) lm(dat1$y ~ dat1$x1 + dat1$x2 + dat1$x3 + x))}
#       min       lq     mean   median       uq    max neval
#  4.391494 4.505056 5.186972 4.598301 4.698818 51.573   100

The difference is small for this toy example (a median of about 4.2 ms versus 4.6 ms), but it will likely become more pronounced as the number of observations and predictors increases.
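Once the models are fitted, you will probably want to compare the varying term across runs. A minimal sketch of one way to do that, assuming the same toy data as above (the seed and the names z_vars, fits, z_rows and z_table are hypothetical, chosen just for this example):

```r
set.seed(42)  # assumed seed, only for reproducibility of the toy data

dat1 <- data.frame(y = rpois(100, 5),
                   x1 = runif(100),
                   x2 = runif(100),
                   x3 = runif(100),
                   z1 = runif(100),
                   z2 = runif(100))

z_vars <- c("z1", "z2")

# Same name-based loop as in the answer, with the result kept and named
fits <- lapply(z_vars,
               function(x, d) lm(as.formula(paste("y ~ x1 + x2 + x3", x, sep = " + ")), data = d),
               d = dat1)
names(fits) <- z_vars

# Pull the coefficient-table row for the varying term out of each fit
z_rows <- lapply(z_vars, function(z) {
  summary(fits[[z]])$coefficients[z, , drop = FALSE]
})

# Stack the rows: one line per z-variable with estimate, SE, t and p value
z_table <- do.call(rbind, z_rows)
z_table
```

Because each model was fitted from a formula that uses the column names, the rows of the coefficient table are labelled z1, z2 and so on, which is what makes this lookup by name possible.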
