r random forest error - type of predictors in new data do not match

posted Oct 17, 2021 in Technique by 深蓝 (71.8m points)

I am trying to use the quantile regression forest function in R (quantregForest), which is built on the randomForest package. I am getting a type mismatch error and I can't figure out why.

I train the model by using

qrf <- quantregForest(x = xtrain, y = ytrain)

which works without a problem, but when I try to predict on new data with

quant.newdata <- predict(qrf, newdata= xtest)

it gives the following error:

Error in predict.quantregForest(qrf, newdata = xtest) : 
Type of predictors in new data do not match types of the training data.

My training and test data come from separate files (hence separate data frames) but have the same format. I have checked the classes of the predictors with

sapply(xtrain, class)
sapply(xtest, class)

Here is the output:

> sapply(xtrain, class)
    pred1     pred2     pred3     pred4     pred5     pred6     pred7     pred8
 "factor" "integer" "integer" "integer"  "factor"  "factor" "integer"  "factor"
    pred9    pred10    pred11    pred12
 "factor"  "factor"  "factor"  "factor"

> sapply(xtest, class)
    pred1     pred2     pred3     pred4     pred5     pred6     pred7     pred8
 "factor" "integer" "integer" "integer"  "factor"  "factor" "integer"  "factor"
    pred9    pred10    pred11    pred12
 "factor"  "factor"  "factor"  "factor"

They are exactly the same. I also checked for NA values: neither xtrain nor xtest contains any. Am I missing something trivial here?

Update I: running the prediction on the training data also fails, this time with a different message:

> quant.newdata <- predict(qrf, newdata = xtrain)
Error in predict.quantregForest(qrf, newdata = xtrain) : 
names of predictor variables do not match

Update II: I combined my training and test sets into one file so that rows 1 to 101 are the training data and the rest are the test data. I then modified the example provided with quantregForest as follows:

data <- read.table("toy.txt", header = TRUE)
n <- nrow(data)
indextrain <- 1:101
xtrain <- data[indextrain, 3:14]
xtest <- data[-indextrain, 3:14]
ytrain <- data[indextrain, 15]
ytest <- data[-indextrain, 15]

qrf <- quantregForest(x=xtrain, y=ytrain)
quant.newdata <- predict(qrf, newdata= xtest)

And it works! I'd appreciate it if anyone could explain why it works this way but not the other way.

1 Reply

深蓝 · Answer 1 · 2021-10-17T02:45:53+0000

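I had the same problem. You can use a small trick to equalize the classes of the training and test sets: bind the first row of the training set to the test set and then delete it. This probably works because rbind() gives the factor columns of the test set the same levels as those of the training set; predictors read from separate files can share a class yet still differ in their factor levels. For your example it should look like this: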

    xtest <- rbind(xtrain[1, ] , xtest)
    xtest <- xtest[-1,]
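
A more explicit alternative to the rbind trick is sketched below, assuming the error comes from factor columns whose levels differ between the two data frames (as far as I know, randomForest compares the number of categories per predictor, not just the class). The toy data frames and the helper `align_levels` are hypothetical and only illustrate the idea with base R; they are not part of quantregForest.

```r
# Toy data standing in for the two separately loaded files (hypothetical values).
xtrain <- data.frame(pred1 = factor(c("a", "b", "c", "a")),
                     pred2 = c(1L, 2L, 3L, 4L))
xtest  <- data.frame(pred1 = factor(c("a", "b")),  # level "c" never appears here
                     pred2 = c(5L, 6L))

# The classes agree, so sapply(..., class) cannot reveal the difference.
same_classes <- identical(sapply(xtrain, class), sapply(xtest, class))

# The number of factor levels per column does differ.
train_nlevels <- sapply(xtrain, nlevels)
test_nlevels  <- sapply(xtest, nlevels)

# Hypothetical helper: re-code every factor column of newdata
# with the levels of the matching column in the reference data.
align_levels <- function(newdata, reference) {
  for (col in names(reference)) {
    if (is.factor(reference[[col]])) {
      newdata[[col]] <- factor(newdata[[col]],
                               levels = levels(reference[[col]]))
    }
  }
  newdata
}

xtest_aligned <- align_levels(xtest, xtrain)
```

After this, `xtest_aligned` can be passed as `newdata`. One caveat of this approach: any value in the test set that never occurred in the training set becomes NA, so it is worth checking for NA values again afterwards.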
