Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
449 views
in Technique[技术] by (71.8m points)

r - How to preProcess features when some of them are factors?

My question is related to this one regarding categorical data (factors in R terms) when using the Caret package. I understand from the linked post that if you use the "formula interface", some features can be factors and the training will work fine. My question is how can I scale the data with the preProcess() function? If I try and do it on a data frame with some columns as factors, I get this error message:

Error in preProcess.default(etitanic, method = c("center", "scale")) : 
  all columns of x must be numeric

See here some sample code:

library(earth)
data(etitanic)

a <- preProcess(etitanic, method=c("center", "scale"))
b <- predict(etitanic, a)

Thank you.

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

It is really the same issue as the post you link to. preProcess works only on numeric data and you have:

> str(etitanic)
'data.frame':   1046 obs. of  6 variables:
 $ pclass  : Factor w/ 3 levels "1st","2nd","3rd": 1 1 1 1 1 1 1 1 1 1 ...
 $ survived: int  1 1 0 0 0 1 1 0 1 0 ...
 $ sex     : Factor w/ 2 levels "female","male": 1 2 1 2 1 2 1 2 1 2 ...
 $ age     : num  29 0.917 2 30 25 ...
 $ sibsp   : int  0 1 1 1 1 0 1 0 2 0 ...
 $ parch   : int  0 2 2 2 2 0 0 0 0 0 ...

You can't center and scale pclass or sex as-is so they need to be converted to dummy variables. You can use model.matrix or caret's dummyVars to do this:

 > new <- model.matrix(survived ~ . - 1, data = etitanic)
 > colnames(new)
 [1] "pclass1st" "pclass2nd" "pclass3rd" "sexmale"   "age"      
 [6] "sibsp"     "parch"  

The -1 gets rid of the intercept. Now you can run preProcess on this object.

btw making preProcess ignore non-numeric data is on my "to do" list but it might cause errors for people not paying attention.

Max


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...