I am working on a fraud analytics project and I need some help with boosting. Previously, I used SAS Enterprise Miner to learn more about boosting/ensemble techniques and I learned that boosting can help to improve the performance of a model.
Currently, my group have completed the following models on Python: Naive Bayes, Random Forest, and Neural Network We want to use XGBoost to make the F1-score better. I am not sure if this is possible since I only come across tutorials on how to do XGBoost or Naive Bayes on its own.
I am looking for a tutorial where they will show you how to create a Naive Bayes model and then use boosting. After that, we can compare the metrics with and without boosting to see if it improved. I am relatively new to machine learning so I could be wrong about this concept.
I thought of replacing the values in the XGBoost but not sure which one to change or if it can even work this way.
Naive Bayes
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X_sm,y_sm, test_size = 0.2, random_state=0)
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import confusion_matrix, confusion_matrix, accuracy_score, f1_score, precision_score, recall_score
nb = GaussianNB()
nb.fit(X_train, y_train)
nb_pred = nb.predict(X_test)
XGBoost
from sklearn.model_selection import train_test_split
import xgboost as xgb
from xgboost import XGBClassifier
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X_sm,y_sm, test_size = 0.2, random_state=0)
model = XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=0.9, gamma=0,
learning_rate=0.1, max_delta_step=0, max_depth=10,
min_child_weight=1, missing=None, n_estimators=500, n_jobs=-1,
nthread=None, objective='binary:logistic', random_state=0,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, seed=None,
silent=None, subsample=0.9, verbosity=0)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
predictions = [round(value) for value in y_pred]
See Question&Answers more detail:
os