Best guess without reading all the code in detail: it looks like you applied a sigmoid activation. If you train with no activation (activation='linear'), you should get the visualization you are looking for. You may have to train longer to get convergence (assuming it can converge without an activation). If you want to keep the sigmoid, then you need to map your linear neuron through that activation (hence it won't look like a plane anymore).
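For example, a minimal sketch of what I mean (assuming a Keras Sequential setup roughly like yours with 3 input features; I've swapped the loss to mse here, since binary_crossentropy expects outputs in (0, 1), so treat this as a sketch rather than a drop-in replacement):

```python
from tensorflow import keras

# Single dense neuron with no activation: the fitted function is literally
# the plane a*x + b*y + c*z + d, which is easy to visualize.
model = keras.Sequential([
    keras.Input(shape=(3,)),
    keras.layers.Dense(1, activation='linear')
])
model.compile(optimizer='adam', loss='mse')
# model.fit(X, y, epochs=..., batch_size=...)  # X: (n, 3) features, y: (n,) targets
```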
EDIT:
My understanding of NNs: a dense layer from 3 inputs to 1 output with a sigmoid activation is an attempt to optimize the variables a, b, c, d in the equation
f(x,y,z) = 1/(1 + e^(-D(x,y,z))), where D(x,y,z) = ax + by + cz + d
so that the binary_crossentropy (what you picked) is minimized. I will use B for the per-sample log-loss term. Our loss equation would look something like:
L = ∑ B(y,Y)
where y is the value we want to predict (a 0 or 1 in this case), Y is the value output by the equation above, and the sum runs over all the data (or over batches in a NN). Hence, this can be written as
L = ∑ B(y,f(x,y,z))
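Spelled out in code, the prediction and the summed loss look roughly like this (a NumPy sketch of the equations above, with names of my own choosing; note that Keras' binary_crossentropy averages over the batch instead of summing, which only rescales L):

```python
import numpy as np

def predict(points, a, b, c, d):
    """f(x, y, z) = 1 / (1 + exp(-D)), with D = a*x + b*y + c*z + d."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    D = a * x + b * y + c * z + d
    return 1.0 / (1.0 + np.exp(-D))

def loss(y_true, Y, eps=1e-7):
    """L = sum over the data of B(y, Y) = -[y*log(Y) + (1 - y)*log(1 - Y)]."""
    Y = np.clip(Y, eps, 1 - eps)  # avoid log(0)
    return -np.sum(y_true * np.log(Y) + (1 - y_true) * np.log(1 - Y))
```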
Finding the minimum of L over the variables a, b, c, d can probably be calculated directly by taking partial derivatives and solving the resulting system of equations (this is why a NN should never be used with such a small set of variables (like 4): they can be solved explicitly, so there is no point in training). Whether you solve directly or use stochastic gradient descent to slowly move a, b, c, d towards a minimum, either way we end up with the optimized a, b, c, d.
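To make the gradient-descent route concrete, here is a toy sketch (not what Keras does internally, just the same idea; it uses the fact that for cross-entropy composed with a sigmoid, the derivative of L with respect to D is simply Y - y):

```python
import numpy as np

def fit_plain_gd(X, y, lr=0.1, epochs=5000):
    """Gradient descent on a, b, c, d for the sigmoid + cross-entropy loss above.
    X: (n, 3) inputs, y: (n,) labels in {0, 1}. Illustrative only."""
    w = np.zeros(X.shape[1])                    # a, b, c
    d = 0.0                                     # bias term
    for _ in range(epochs):
        Y = 1.0 / (1.0 + np.exp(-(X @ w + d)))  # forward pass through the sigmoid
        err = Y - y                             # dL/dD per sample
        w -= lr * (X.T @ err) / len(y)          # update a, b, c
        d -= lr * err.mean()                    # update d
    return w, d
```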
a, b, c, d have been tuned specifically to produce values that, when plugged into the sigmoid equation, yield predicted categories that give us the minimum loss when tested against the loss equation.
I stand corrected though. In this case, because we specifically have a sigmoid, setting up and solving the boundary equation does appear to always produce a plane (I did not know that). I don't think this would work with any other activation or with any NN that has more than one layer.
1/2 = 1/(1 + e^(-D(x,y,z)))
1 + e^(-D(x,y,z)) = 2
e^(-D(x,y,z)) = 1
D(x,y,z) = 0
ax + by + cz + d = 0
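In practice that means you can read the boundary plane straight off the trained layer, along these lines (assuming `model` is your fitted single-neuron Keras model from before):

```python
# For a Dense(1) layer on 3 inputs, weights has shape (3, 1) and bias has shape (1,).
weights, bias = model.layers[0].get_weights()
a, b, c = weights.ravel()
d = bias[0]
print(f"decision boundary: {a:.3f}*x + {b:.3f}*y + {c:.3f}*z + {d:.3f} = 0")
```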
So, I downloaded your data and ran your code. I don't get convergence at all; I tried various batch_sizes, loss functions, and activation functions. Nothing. Based on the picture, it seems plausible that nearly every random initialization of the weights will favor moving away from the cluster rather than trying to find the center of it.
You probably need to transform your data first (normalizing all the axes might do the trick), or manually initialize your weights to something near the center, so that training converges. Long story short, your a, b, c, d are not optimal. You could also explicitly solve the partial derivatives above and find the optimal a, b, c, d instead of trying to get a single neuron to converge. There are also explicit methods for calculating an optimal separating plane for binary data (an extension of linear regression).
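As a sketch of both suggestions at once (the synthetic X and y below are stand-ins for your data, and the scikit-learn calls are my own choice, not from your code):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

# Stand-in data; replace with your own (n, 3) features X and 0/1 labels y.
X = np.random.randn(200, 3)
y = (X[:, 0] + X[:, 1] - X[:, 2] > 0).astype(int)

# 1) Normalize every axis so the single neuron (or any optimizer) has an easier time.
X_scaled = StandardScaler().fit_transform(X)

# 2) Or skip the neural net entirely: logistic regression is the same
#    sigmoid-plus-plane model, fitted directly.
clf = LogisticRegression().fit(X_scaled, y)
a, b, c = clf.coef_[0]
d = clf.intercept_[0]
```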