r - Predicting Probabilities for GBM with caret library

Question

Welcome To Ask or Share your Answers For Others

r - Predicting Probabilities for GBM with caret library

posted Oct 24, 2021 in Technique[技术] by 深蓝 (71.8m points)

r - Predicting Probabilities for GBM with caret library

A similar question was asked however the link in the answer points to random forest example, it doesn't seem to work in my case.

Here is an example what I'm trying to do:

gbmGrid <-  expand.grid(interaction.depth = c(5, 9),
                    n.trees = (1:3)*200,
                    shrinkage = c(0.05, 0.1))

fitControl <- trainControl(
                       method = "cv",
                       number = 3,
                       classProbs = TRUE)

gbmFit <- train(strong~.-Id-PlayerName, data = train[1:10000,],
             method = "gbm",
             trControl = fitControl,
             verbose = TRUE,
             tuneGrid = gbmGrid)
gbmFit

Everything goes fine, I get the best parameters. Now if I do the prediction:

predictStrong = predict(gbmFit, newdata=train[11000:50000,])

I get a binary vector of predictions, which is good:

[1] 0 1 0 0 1 0 0 0 0 0 0 0 1 1 0 0 1 1 1 0 0 0 1 ...

However when I try to get probabilities, I get an error:

predictStrong = predict(gbmFit, newdata=train[11000:50000,], type="prob")

Error in `[.data.frame`(out, , obsLevels, drop = FALSE) : 
undefined columns selected

Where seems to be the problem?

Additional info:

traceback()
5: stop("undefined columns selected")
4: `[.data.frame`(out, , obsLevels, drop = FALSE)
3: out[, obsLevels, drop = FALSE]
2: predict.train(gbmFit, newdata = train[11000:50000, ], type = "prob")
1: predict(gbmFit, newdata = train[11000:50000, ], type = "prob")

Versions:

R version 3.1.0 (2014-04-10) -- "Spring Dance"
Copyright (C) 2014 The R Foundation for Statistical Computing
Platform: x86_64-unknown-linux-gnu (64-bit)

caret version: 6.0-29

EDIT: I've seen this topic as well and I don't get an error about variable names, although I have couple of variable names with underscores, which I assume it's valid, as I use make.names and get the same names as the original.

colnames(train) == make.names(colnames(train))
[1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙；凝视深渊过久,深渊将回以凝视…

1 Reply

深蓝 · Answer 1 · 2021-10-23T17:56:02+0000

When class probabilities are requested, train puts them into a data frame with a column for each class. If the factor levels are not valid variable names, they are automatically changed (e.g. "0" becomes "X0"). train issues a warning in this case that goes something like "At least one of the class levels are not valid R variables names. This may cause errors if class probabilities are generated."

Categories

r - Predicting Probabilities for GBM with caret library

r - Predicting Probabilities for GBM with caret library

Please log in or register to add a comment.

Please log in or register to reply this article.

1 Reply

Please log in or register to add a comment.

Just Browsing Browsing

Most popular tags