Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
550 views
in Technique[技术] by (71.8m points)

r - Predicting Probabilities for GBM with caret library

A similar question was asked however the link in the answer points to random forest example, it doesn't seem to work in my case.

Here is an example what I'm trying to do:

gbmGrid <-  expand.grid(interaction.depth = c(5, 9),
                    n.trees = (1:3)*200,
                    shrinkage = c(0.05, 0.1))

fitControl <- trainControl(
                       method = "cv",
                       number = 3,
                       classProbs = TRUE)

gbmFit <- train(strong~.-Id-PlayerName, data = train[1:10000,],
             method = "gbm",
             trControl = fitControl,
             verbose = TRUE,
             tuneGrid = gbmGrid)
gbmFit

Everything goes fine, I get the best parameters. Now if I do the prediction:

predictStrong = predict(gbmFit, newdata=train[11000:50000,])

I get a binary vector of predictions, which is good:

[1] 0 1 0 0 1 0 0 0 0 0 0 0 1 1 0 0 1 1 1 0 0 0 1 ...

However when I try to get probabilities, I get an error:

predictStrong = predict(gbmFit, newdata=train[11000:50000,], type="prob")

Error in `[.data.frame`(out, , obsLevels, drop = FALSE) : 
undefined columns selected

Where seems to be the problem?

Additional info:

traceback()
5: stop("undefined columns selected")
4: `[.data.frame`(out, , obsLevels, drop = FALSE)
3: out[, obsLevels, drop = FALSE]
2: predict.train(gbmFit, newdata = train[11000:50000, ], type = "prob")
1: predict(gbmFit, newdata = train[11000:50000, ], type = "prob")

Versions:

R version 3.1.0 (2014-04-10) -- "Spring Dance"
Copyright (C) 2014 The R Foundation for Statistical Computing
Platform: x86_64-unknown-linux-gnu (64-bit)

caret version: 6.0-29

EDIT: I've seen this topic as well and I don't get an error about variable names, although I have couple of variable names with underscores, which I assume it's valid, as I use make.names and get the same names as the original.

colnames(train) == make.names(colnames(train))
[1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

When class probabilities are requested, train puts them into a data frame with a column for each class. If the factor levels are not valid variable names, they are automatically changed (e.g. "0" becomes "X0"). train issues a warning in this case that goes something like "At least one of the class levels are not valid R variables names. This may cause errors if class probabilities are generated."


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...