r - Search for corresponding node in a regression tree using rpart

posted Oct 24, 2021 in Technique by 深蓝 (71.8m points)

I'm pretty new to R and I'm stuck with a pretty dumb problem.

I'm calibrating a regression tree using the rpart package in order to do some classification and some forecasting.

Thanks to R the calibration part is easy to do and easy to control.

#the package rpart is needed
library(rpart)

# Loading of a big data file used for calibration
my_data <- read.csv("my_file.csv", sep=",", header=TRUE)

# Regression tree calibration
tree <- rpart(Ratio ~ Attribute1 + Attribute2 + Attribute3 + 
                      Attribute4 + Attribute5, 
                      method="anova", data=my_data, 
                      control=rpart.control(minsplit=100, cp=0.0001))

After calibrating a big decision tree, I want to find, for each row of some new data, the cluster (leaf) it falls into, and thus its forecasted value.
The predict function seems to be perfect for this.

# read validation data
validationData <- read.csv("my_sample.csv", sep=",", header=TRUE)

# compute the forecasted ratio for each new row
# (the argument is type=, not class=; type="prob" applies only to classification trees)
predictions <- predict(tree, newdata=validationData, type="vector")

# dump them in a file
write.table(predictions, file="dump.txt")

However, predict only gives me the forecasted ratio of my new elements, and I can't find a way to get the decision tree leaf each new element belongs to.

I think it should be pretty easy to get since the predict method must have found that leaf in order to return the ratio.

There are several values that can be given to the predict method through the type= argument, but for a regression tree they all seem to return the same thing (the value of the target attribute of the decision tree).

Does anyone know how to get the corresponding node in the decision tree?

Analyzing that node with the path.rpart function would help me understand the results.
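For reference, here is a minimal sketch of what path.rpart returns once node numbers are known. It uses the built-in mtcars data as a stand-in for the real file, and the names fit, leaf_ids and paths are made up for the illustration.

```r
library(rpart)

# Illustrative tree on a built-in data set (not the data from the question)
fit <- rpart(mpg ~ wt + hp + disp, data = mtcars, method = "anova",
             control = rpart.control(minsplit = 10, cp = 0.01))

# Node numbers are the row names of fit$frame; leaves have var == "<leaf>"
leaf_ids <- as.numeric(rownames(fit$frame)[fit$frame$var == "<leaf>"])

# path.rpart() returns one character vector of split rules per requested node;
# print.it = FALSE suppresses the console listing
paths <- path.rpart(fit, nodes = leaf_ids, print.it = FALSE)

# Split rules leading from the root to the first leaf
paths[[1]]
```

The returned list is named by node number, so a node found for a new observation can be looked up with paths[[as.character(node)]].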

1 Reply

深蓝 · Answer 1 · 2021-10-23T19:38:21+0000

Benjamin's answer unfortunately doesn't work: type="vector" still returns the predicted values.

My solution is pretty kludgy, but I don't think there's a better way. The trick is to replace the predicted y values stored in the tree's frame (tree$frame$yval) with the corresponding node numbers.

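# work on a copy so the original tree still returns real predictions
tree2 <- tree
# the row names of the frame are the node numbers
tree2$frame$yval <- as.numeric(rownames(tree2$frame))
nodes <- predict(tree2, newdata=validationData)

Now nodes holds node numbers as opposed to predicted y values.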
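To make the trick easy to try, here is a self-contained sketch on the built-in mtcars data. The data set, the formula and the names fit, fit_nodes, new_rows and leaf are stand-ins for illustration, not the data or names from the question.

```r
library(rpart)

# Small regression tree on a built-in data set (stand-in for my_data)
fit <- rpart(mpg ~ wt + hp + disp, data = mtcars, method = "anova",
             control = rpart.control(minsplit = 10, cp = 0.01))

# Copy the tree and overwrite the fitted values with the node numbers
fit_nodes <- fit
fit_nodes$frame$yval <- as.numeric(rownames(fit_nodes$frame))

# "Predicting" with the copy now yields the leaf node number of each row
new_rows <- mtcars[1:5, ]
leaf <- predict(fit_nodes, newdata = new_rows)

# The real predictions still come from the untouched tree
result <- data.frame(leaf = leaf, predicted = predict(fit, newdata = new_rows))
print(result)

# Sanity check on the training data: the leaves found this way should
# match the ones rpart recorded in fit$where while fitting
train_leaf <- as.numeric(rownames(fit$frame))[fit$where]
all(predict(fit_nodes, newdata = mtcars) == train_leaf)
```

Keeping the original tree unchanged means the node number and the forecasted value can be written side by side for each new row.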

(One note: the above worked in my case where tree was a regression tree, not a classification tree. In the case of a classification tree, you probably need to omit as.numeric or replace it with as.factor.)
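For a classification tree, one route that might work is to keep as.numeric and ask predict for type="vector" explicitly, since the default output for a classification tree is class probabilities. This is a sketch under that assumption, using the kyphosis data shipped with rpart; it is not what was tested above, and the names cfit, cfit_nodes and leaf are illustrative.

```r
library(rpart)

# Classification tree on the kyphosis data that ships with rpart
cfit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis,
              method = "class")

# Same replacement as for the regression tree
cfit_nodes <- cfit
cfit_nodes$frame$yval <- as.numeric(rownames(cfit_nodes$frame))

# type = "vector" reads frame$yval directly instead of the class probabilities
leaf <- predict(cfit_nodes, newdata = kyphosis, type = "vector")

# Number of observations per leaf
table(leaf)
```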
