Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
238 views
in Technique[技术] by (71.8m points)

keras - GridsearchCV loss doesn't equal model.fit() loss values

I am confused as to which metric GridsearchCV is using in its parameter search. My understanding is that my model object feeds it a metric and this is what is used to determine the "best_params". But this doesn't appear to be the case. I thought that score=None is the default and as a result the first metric given in the metrics option of model.compile() was used. So in my case the the scoring function used should be the mean_squred_error. My explanation for this issue is described next.

Here is what I am doing. I simulated some regression data using sklearn with 10 features on 100,000 observations. I am playing around with keras because I typically used pytorch in the past and never really dabbled with keras until now. I am noticing a discrepancy in the loss function output from my GridsearchCV call vs the model.fit() call after I have my optimal set of parameters. Now I know I can just refit=True and not re-fit the model again, but I am trying to get a feel for the output of the keras and sklearn GridsearchCV functions.

To be explicit about the discrepancy here is what I am seeing. I simulated some data using sklearn as follows:

# Setting some data basics
N = 10000
feats = 10

# generate regression dataset
X, y = make_regression(n_samples=N, n_features=feats, n_informative=2, noise=3)

# training data and testing data #
X_train = X[:int(N * 0.8)]
y_train = y[:int(N * 0.8)]
X_test = X[int(N * 0.8):]
y_test = y[int(N * 0.8):]

I have created a "create_model" function that is looking to tune which activation function I am using (again this is a simple example for a proof of concept).

def create_model(activation_fn):
    # create model
    model = Sequential()
    model.add(Dense(30, input_dim=feats, activation=activation_fn,
                 kernel_initializer='normal'))
    model.add(Dropout(0.2))
    model.add(Dense(10, activation=activation_fn))
    model.add(Dropout(0.2))
    model.add(Dense(1, activation='linear'))
    # Compile model
    model.compile(loss='mean_squared_error',
                  optimizer='adam',
                  metrics=['mean_squared_error','mae'])
    return model

Performing the grid search I get the following output

model = KerasRegressor(build_fn=create_model, epochs=50, batch_size=200, verbose=0)
activations = ['linear','relu']
param_grid = dict(activation_fn = activations)
grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=1, cv=3)
grid_result = grid.fit(X_train, y_train, verbose=1)
print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))
Best: -21.163454 using {'activation_fn': 'linear'}

Ok, so the best metric is the mean squared error of 21.16 (I understand they flip the sign to create a maximization problem). So, when I fit the model using the activation_fn = 'linear' the MSE I get is totally different.

best_model = create_model('linear')
history = best_model.fit(X_train, y_train, epochs=50, batch_size=200, verbose=1)
.....
.....
Epoch 49/50
8000/8000 [==============================] - 0s 48us/step - loss: 344.1636 - mean_squared_error: 344.1636 - mean_absolute_error: 12.2109
Epoch 50/50
8000/8000 [==============================] - 0s 48us/step - loss: 326.4524 - mean_squared_error: 326.4524 - mean_absolute_error: 11.9250
history.history['mean_squared_error']
Out[723]: 
[10053.778002929688,
 9826.66806640625,
  ......
  ......
 344.16363830566405,
 326.45237121582034]

The difference is in 326.45 vs. 21.16. Any insight as to what I am misunderstanding would be greatly appreciated. I would be more comfortable if they were within a reasonable neighborhood of each other, given one is the error from one fold vs the entire training data set. But 21 is nowhere near 326. Thanks!

The entire code is seen here.

import pandas as pd
import numpy as np
from keras import Sequential
from keras.layers import Dense, Dropout, Activation, Flatten
from keras.layers import Convolution2D, MaxPooling2D
from keras.utils import np_utils
from sklearn.model_selection import GridSearchCV
from keras.wrappers.scikit_learn import KerasClassifier, KerasRegressor
from keras.constraints import maxnorm
from sklearn import preprocessing 
from sklearn.preprocessing import scale
from sklearn.datasets import make_regression
from matplotlib import pyplot as plt

# Setting some data basics
N = 10000
feats = 10

# generate regression dataset
X, y = make_regression(n_samples=N, n_features=feats, n_informative=2, noise=3)

# training data and testing data #
X_train = X[:int(N * 0.8)]
y_train = y[:int(N * 0.8)]
X_test = X[int(N * 0.8):]
y_test = y[int(N * 0.8):]

def create_model(activation_fn):
    # create model
    model = Sequential()
    model.add(Dense(30, input_dim=feats, activation=activation_fn,
                 kernel_initializer='normal'))
    model.add(Dropout(0.2))
    model.add(Dense(10, activation=activation_fn))
    model.add(Dropout(0.2))
    model.add(Dense(1, activation='linear'))
    # Compile model
    model.compile(loss='mean_squared_error',
                  optimizer='adam',
                  metrics=['mean_squared_error','mae'])
    return model

# fix random seed for reproducibility
seed = 7
np.random.seed(seed)

# create model
model = KerasRegressor(build_fn=create_model, epochs=50, batch_size=200, verbose=0)

# define the grid search parameters
activations = ['linear','relu']
param_grid = dict(activation_fn = activations)
grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=1, cv=3)
grid_result = grid.fit(X_train, y_train, verbose=1)

best_model = create_model('linear')
history = best_model.fit(X_train, y_train, epochs=50, batch_size=200, verbose=1)

history.history.keys()
plt.plot(history.history['mean_absolute_error'])

# summarize results
grid_result.cv_results_
print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

The large loss reported in your output (326.45237121582034) is the training loss. If you need a metric to be compared with the grid_result.best_score_ (in the GridSearchCV) and the MSE (in the best_model.fit), you have to request the validation loss (cf. code below).

Now to the question: why is the validation loss lower than the training loss? In your case it is essentially because of dropout (which is applied during training but not during validation/test) - that is why the difference between training and validation losses disappears when you remove dropout. You can find a detailed explanation here of the possible reasons for a lower validation loss.

In short, the performance (MSE) of your model is given by the grid_result.best_score_ (21.163454 in your example).

import numpy as np
from keras import Sequential
from keras.layers import Dense, Dropout
from sklearn.model_selection import GridSearchCV
from keras.wrappers.scikit_learn import KerasRegressor
from sklearn.datasets import make_regression
import tensorflow as tf

# fix random seed for reproducibility
seed = 7
np.random.seed(seed)
tf.random.set_seed(42)

# Setting some data basics
N = 10000
feats = 10

# generate regression dataset
X, y = make_regression(n_samples=N, n_features=feats, n_informative=2, noise=3)

# training data and testing data #
X_train = X[:int(N * 0.8)]
y_train = y[:int(N * 0.8)]
X_test = X[int(N * 0.8):]
y_test = y[int(N * 0.8):]

def create_model(activation_fn):
    # create model
    model = Sequential()
    model.add(Dense(30, input_dim=feats, activation=activation_fn,
                 kernel_initializer='normal'))
    model.add(Dropout(0.2))
    model.add(Dense(10, activation=activation_fn))
    model.add(Dropout(0.2))
    model.add(Dense(1, activation='linear'))
    # Compile model
    model.compile(loss='mean_squared_error',
                  optimizer='adam',
                  metrics=['mean_squared_error','mae'])
    return model

# create model
model = KerasRegressor(build_fn=create_model, epochs=50, batch_size=200, verbose=0)

# define the grid search parameters
activations = ['linear','relu']
param_grid = dict(activation_fn = activations)
grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=1, cv=3)
grid_result = grid.fit(X_train, y_train, verbose=1, validation_data=(X_test, y_test))

best_model = create_model('linear')
history = best_model.fit(X_train, y_train, epochs=50, batch_size=200, verbose=1, validation_data=(X_test, y_test))

history.history.keys()
# plt.plot(history.history['mae'])

# summarize results
print(grid_result.cv_results_)
print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...