I am using the random forest classifier algorithm to predict which of 5 classes each of my samples belongs to. However, after making the predictions I cannot evaluate my model properly, because there are several classes. I saw in another post that I need to use np.argmax(y_pred, axis=1), but I don't really understand what it does, how to use it, or whether it is even required in my case. Can you please help me?
import numpy as np
import pandas as pd
from sklearn import metrics
from keras.utils import to_categorical
import sklearn as sk
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report, confusion_matrix, accuracy_score
X = pd.read_csv('/Users/lottie/desktop/1.csv', header=None)
Y = pd.read_csv('/Users/lottie/desktop/2.csv', header=None)
X.drop([0,0], inplace=True)
Y.drop([0,0], inplace=True)
del X[0]
del Y[0]
Y_encoded = list()
for i in Y.loc[0:,1] :
    if i == 'BRCA' : Y_encoded.append(0)
    if i == 'KIRC' : Y_encoded.append(1)
    if i == 'COAD' : Y_encoded.append(2)
    if i == 'LUAD' : Y_encoded.append(3)
    if i == 'PRAD' : Y_encoded.append(4)
Y_bis = to_categorical(Y_encoded)
X_train, X_test, y_train, y_test = train_test_split(X, Y_bis, test_size=0.30, random_state=42)
regressor = RandomForestRegressor(n_estimators=20, random_state=0)
regressor.fit(X_train, y_train)
y_pred = regressor.predict(X_test)
print(confusion_matrix(y_test,y_pred))
print(classification_report(y_test,y_pred))
print(accuracy_score(y_test, y_pred))
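For context, here is a minimal sketch of what np.argmax(..., axis=1) does on arrays shaped like y_test and y_pred above. The arrays y_test_demo and y_pred_demo are made-up values for illustration only, not output from the data or model in this post.

```python
import numpy as np

# Hypothetical one-hot labels for 4 samples and 5 classes (illustrative values only).
y_test_demo = np.array([
    [1, 0, 0, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 0, 0, 1],
    [0, 1, 0, 0, 0],
])

# Hypothetical per-class scores, one row per sample, shaped like the
# output of a model trained on one-hot targets.
y_pred_demo = np.array([
    [0.90, 0.05, 0.05, 0.00, 0.00],
    [0.10, 0.20, 0.60, 0.10, 0.00],
    [0.00, 0.00, 0.30, 0.50, 0.20],
    [0.15, 0.70, 0.05, 0.05, 0.05],
])

# For each row, argmax along axis=1 returns the column index of the
# largest value, which turns the row back into one integer class label.
true_labels = np.argmax(y_test_demo, axis=1)
pred_labels = np.argmax(y_pred_demo, axis=1)

# Fraction of samples where the predicted label matches the true label.
accuracy = float((true_labels == pred_labels).mean())

print(true_labels)  # [0 2 4 1]
print(pred_labels)  # [0 2 3 1]
print(accuracy)     # 0.75
```

One-dimensional integer label arrays like true_labels and pred_labels are the form that confusion_matrix, classification_report and accuracy_score accept for multiclass targets. If a classifier such as RandomForestClassifier were fitted on the integer labels in Y_encoded directly, predict would already return integer labels and the argmax step would probably not be needed. That is a suggestion, not something tested on this data.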