Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

r - Create new dummy variable columns from categorical variable

I have several data sets with 75,000 observations and a type variable that can take a value from 0 to 4. I want to add five new dummy variables to each data set, one for each type. The best way I could come up with to do this is as follows:

# For the 'binom' data set create dummy variables for all types in all data sets
binom.dummy.list<-list()
for(i in 0:4){
    binom.dummy.list[[i+1]]<-sapply(binom$type,function(t) ifelse(t==i,1,0))
}

# Add and merge data
binom.dummy.df<-as.data.frame(do.call("cbind",binom.dummy.list))
binom.dummy.df<-transform(binom.dummy.df,id=1:nrow(binom))
binom<-merge(binom,binom.dummy.df,by="id")

While this works, it is incredibly slow (the merge function has even crashed a few times). Is there a more efficient way to do this? Perhaps this functionality is part of a package that I am not familiar with?


1 Reply

R has a "sub-language" for translating formulas into a design matrix, and in the spirit of the language you can take advantage of it. It's fast and concise. Example: you have a numeric predictor x, a categorical predictor catVar (stored as a factor), and a response y.

> binom <- data.frame(y=runif(1e5), x=runif(1e5), catVar=as.factor(sample(0:4,1e5,TRUE)))
> head(binom)
          y          x catVar
1 0.5051653 0.34888390      2
2 0.4868774 0.85005067      2
3 0.3324482 0.58467798      2
4 0.2966733 0.05510749      3
5 0.5695851 0.96237936      1
6 0.8358417 0.06367418      2

Then a single call to model.matrix builds the dummy columns:

> A <- model.matrix(y ~ x + catVar,binom) 
> head(A)
  (Intercept)          x catVar1 catVar2 catVar3 catVar4
1           1 0.34888390       0       1       0       0
2           1 0.85005067       0       1       0       0
3           1 0.58467798       0       1       0       0
4           1 0.05510749       0       0       1       0
5           1 0.96237936       1       0       0       0
6           1 0.06367418       0       1       0       0

Done.
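
One thing to note about the output above: it has four catVar columns rather than five, because under R's default treatment contrasts the first level (0) serves as the baseline and is absorbed into the intercept. If all five indicator columns are wanted, as the question asks, one option is to remove the intercept from the formula and attach the result with cbind instead of merge. A minimal sketch, assuming a small stand-in data frame with a `type` column as in the question (the seed, the size, and the `type0`..`type4` column names are illustrative choices, not from the original):

```r
# Hypothetical stand-in for one of the question's data sets
set.seed(42)  # seed only so the sketch is reproducible
binom <- data.frame(type = sample(0:4, 1000, TRUE))

# "- 1" drops the intercept, so every factor level gets its own 0/1 column.
# Fixing the levels explicitly guarantees five columns even if a level is absent.
dummies <- model.matrix(~ factor(type, levels = 0:4) - 1, data = binom)
colnames(dummies) <- paste0("type", 0:4)  # illustrative column names

# Bind the columns side by side; no id column or merge is required
binom <- cbind(binom, dummies)
head(binom)
```

Because cbind only places columns next to each other, this relies on the dummy rows being in the same order as the data. model.matrix keeps that order, but by default it drops rows with missing values, so the approach assumes `type` has no NA entries.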

