Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share

plm - How to correctly take out zero observations in panel data in R

I'm running into problems with plm regressions on my panel dataset. I need to drop one year from the data and also drop every observation where a particular variable is zero. Below is a reproducible example built on a dataset from the AER package.


library(AER)
library(plm)

data("Grunfeld", package = "AER")
View(Grunfeld)
#Here I randomly set some observations of the third variable (capital) to zero, to mimic my dataset
set.seed(1)  # fix the random draws so the example gives the same zeros on every run
for (i in 1:220) {
  x <- rnorm(10, 0, 1)
  if (mean(x) >= 0) {
    Grunfeld$capital[i] <- 0
  }
}
View(Grunfeld)


panel <- Grunfeld

#First Method
#This is how I was originally manipulating my data and running my regression 

panel <- Grunfeld

dd <-pdata.frame(panel, index = c('firm', 'year'))

dd <- dd[dd$year!=1935, ]

dd <- dd[dd$capital !=0, ]

ols_model_2 <- plm(log(value) ~ (capital), data=dd)
summary(ols_model_2)
#However, I couldn't plot the variables of this dataset, because they weren't plain vectors. So I tried another way:

#Second Method

panel <- panel[panel$year!= 1935, ]

panel <- panel[panel$capital != 0,]

ols_model <- plm(log(value) ~ log(capital), data=panel, index = c('firm','year'))
summary(ols_model)

#But this gave extremely different results for the ols regression!

As I understand it, both approaches should have produced the same OLS output. Now I'm afraid my entire analysis is wrong, because I did everything the first way. Could anyone explain to me what is happening? Thanks in advance!


1 Reply


You are running two different models, so there is no reason to expect the same results.

Your first model is:

ols_model_2 <- plm(log(value) ~ (capital), data=dd)

While the second is:

ols_model <- plm(log(value) ~ log(capital), data=panel, index = c('firm','year'))

As the model summaries show, both are "Oneway (individual) effect Within Model". In the first call you don't specify the index, because dd is already a pdata.frame object that carries it. In the second you do specify the index, because panel is a plain data.frame. This makes no difference to the estimates.

The difference is the regressor: the first model uses capital in levels, while the second uses log(capital).
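
A minimal sketch to check this, under some assumptions: it uses the copy of Grunfeld that ships with plm (so only one package is needed), zeros out a random half of capital with a fixed seed, and fits the same formula, log(value) ~ log(capital), through both data-preparation routes. The object names (zero_rows, fit_pdata, fit_plain, same_coef) are made up for illustration.

```r
library(plm)

# Assumption: Grunfeld as shipped with plm (columns firm, year, inv, value, capital)
data("Grunfeld", package = "plm")

# Set capital to zero in a random half of the rows, with a fixed seed
set.seed(42)
zero_rows <- sample(nrow(Grunfeld), size = nrow(Grunfeld) %/% 2)
Grunfeld$capital[zero_rows] <- 0

# Route 1: convert to a pdata.frame first, then filter
dd <- pdata.frame(Grunfeld, index = c("firm", "year"))
dd <- dd[dd$year != 1935, ]
dd <- dd[dd$capital != 0, ]

# Route 2: filter the plain data.frame, pass the index to plm()
panel <- Grunfeld[Grunfeld$year != 1935 & Grunfeld$capital != 0, ]

# Same formula in both calls
fit_pdata <- plm(log(value) ~ log(capital), data = dd)
fit_plain <- plm(log(value) ~ log(capital), data = panel,
                 index = c("firm", "year"))

# Compare the estimated coefficients of the two fits
same_coef <- isTRUE(all.equal(coef(fit_pdata), coef(fit_plain)))
print(same_coef)
```

If the two routes really are equivalent, same_coef should come out TRUE, and any remaining gap between the original two models comes from capital versus log(capital).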

As a side note, dropping zero observations is often very problematic. If you do that, make sure you also try alternative ways of dealing with zeros, and see how much your results change. You can get started here: https://stats.stackexchange.com/questions/1444/how-should-i-transform-non-negative-data-including-zeros
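
One way to run that kind of sensitivity check is sketched below. The two alternatives shown, log1p(capital) (that is, log(1 + capital)) and the inverse hyperbolic sine asinh(capital), are common choices for non-negative data with zeros, not something the answer prescribes. The data setup (plm's copy of Grunfeld, a random half of capital set to zero) and the names idx, fit_drop, fit_log1p, fit_asinh and comparison are assumptions for illustration.

```r
library(plm)

# Assumption: Grunfeld as shipped with plm, with artificial zeros in capital
data("Grunfeld", package = "plm")
set.seed(42)
zero_rows <- sample(nrow(Grunfeld), size = nrow(Grunfeld) %/% 2)
Grunfeld$capital[zero_rows] <- 0

idx <- c("firm", "year")

# (a) drop the zero rows, then take logs
fit_drop <- plm(log(value) ~ log(capital),
                data = Grunfeld[Grunfeld$capital != 0, ], index = idx)

# (b) keep every row and use log(1 + capital)
fit_log1p <- plm(log(value) ~ log1p(capital), data = Grunfeld, index = idx)

# (c) keep every row and use the inverse hyperbolic sine
fit_asinh <- plm(log(value) ~ asinh(capital), data = Grunfeld, index = idx)

# Put slope estimates and sample sizes side by side
comparison <- data.frame(
  treatment = c("drop zeros + log", "log1p", "asinh"),
  slope     = c(coef(fit_drop)[[1]], coef(fit_log1p)[[1]], coef(fit_asinh)[[1]]),
  n_obs     = c(nobs(fit_drop), nobs(fit_log1p), nobs(fit_asinh))
)
print(comparison)
```

The slopes are not directly comparable in magnitude across transformations, but large changes in sign or significance would suggest that the conclusions depend on how the zeros are handled.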

