Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
2.3k views
in Technique[技术] by (71.8m points)

python - ValueError: Classification metrics can't handle a mix of multilabel-indicator and continuous-multioutput targets sklearn

I use the random forest classifier algorithm to predict the belonging of my samples to different classes (5 different classes). However, after having made the prediction I cannot evaluate my model precisely because of the different classes. I saw in another post that it was necessary to use np.argmax(y_pred, axis=1) but I didn't really understand the usefulness and how to use this element nor even if it is required in my case. Can you please help me?

import numpy as np
import pandas as pd
from sklearn import metrics
from keras.utils import to_categorical
import sklearn as sk
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report, confusion_matrix, accuracy_score

X = pd.read_csv('/Users/lottie/desktop/1.csv', header=None)
Y = pd.read_csv('/Users/lottie/desktop/2.csv', header=None)

X.drop([0,0], inplace=True)
Y.drop([0,0], inplace=True)
del X[0]
del Y[0]

Y_encoded = list()
for i in Y.loc[0:,1] :
    if i == 'BRCA' : Y_encoded.append(0)
    if i == 'KIRC' : Y_encoded.append(1)
    if i == 'COAD' : Y_encoded.append(2)
    if i == 'LUAD' : Y_encoded.append(3)
    if i == 'PRAD' : Y_encoded.append(4)
Y_bis = to_categorical(Y_encoded)


X_train, X_test, y_train, y_test = train_test_split(X, Y_bis, test_size=0.30, random_state=42)

regressor = RandomForestRegressor(n_estimators=20, random_state=0)
regressor.fit(X_train, y_train)
y_pred = regressor.predict(X_test)


print(confusion_matrix(y_test,y_pred))
print(classification_report(y_test,y_pred))
print(accuracy_score(y_test, y_pred))

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

You are using RandomForestRegressor. This model is for continuous variables (such as the price of a house), and your output is not continuous if you have classes.

If you have classes you have to use RandomForestClassifier. Obviously, you have to encode your output as number. One number for each different class. Then, when you predict, you will obtain the number of the class.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...