Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
303 views
in Technique[技术] by (71.8m points)

r - Why does lm run out of memory while matrix multiplication works fine for coefficients?

I am trying to do fixed effects linear regression with R. My data looks like

dte   yr   id   v1   v2
  .    .    .    .    .
  .    .    .    .    .
  .    .    .    .    .

I then decided to simply do this by making yr a factor and use lm:

lm(v1 ~ factor(yr) + v2 - 1, data = df)

However, this seems to run out of memory. I have 20 levels in my factor and df is 14 million rows which takes about 2GB to store, I am running this on a machine with 22 GB dedicated to this process.

I then decided to try things the old fashioned way: create dummy variables for each of my years t1 to t20 by doing:

df$t1 <- 1*(df$yr==1)
df$t2 <- 1*(df$yr==2)
df$t3 <- 1*(df$yr==3)
...

and simply compute:

solve(crossprod(x), crossprod(x,y))

This runs without a problem and produces the answer almost right away.

I am specifically curious what is it about lm that makes it run out of memory when I can compute the coefficients just fine? Thanks.

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

In addition to what idris said, it's also worth pointing out that lm() does not solve for the parameters using the normal equations like you illustrated in your question, but rather uses QR decomposition, which is less efficient but tends to produce more numerically accurate solutions.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...