We can do that easily in tf.keras using its Functional API. Here we will walk through how to build a multi-output model with different output types (classification and regression) using the Functional API.
According to your last diagram, you need one input and three outputs of different types. To demonstrate, we will use MNIST, a handwritten digit dataset. It is normally a 10-class classification problem. From it, we will additionally create a 2-class classifier (whether a digit is even or odd) and a regression output (predicting the square of a digit, i.e. for an image of 9 it should give approximately 81).
Data Set
import numpy as np
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
(xtrain, ytrain), (_, _) = keras.datasets.mnist.load_data()
# add an explicit channel axis so the samples match the model input (28, 28, 1)
xtrain = xtrain[..., None]
# 10 class classifier
y_out_a = keras.utils.to_categorical(ytrain, num_classes=10)
# 2 class classifier: class 1 = even, class 0 = odd
y_out_b = keras.utils.to_categorical((ytrain % 2 == 0).astype(int), num_classes=2)
# regression, predict square of an input digit image
y_out_c = tf.square(tf.cast(ytrain, tf.float32))
So, our training pairs will be xtrain and [y_out_a, y_out_b, y_out_c], same as your last diagram.
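As a quick sanity check (nothing new here, just printing the shapes of the arrays prepared above):
print(xtrain.shape)                                # (60000, 28, 28, 1)
print(y_out_a.shape, y_out_b.shape, y_out_c.shape) # (60000, 10) (60000, 2) (60000,)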
Model Building
Let's build the model accordingly using the Functional API of tf.keras. See the model definition below. The MNIST samples are 28 x 28 grayscale images, so the input is set that way. I'm guessing your data set is probably RGB, so change the input dimension accordingly (e.g. shape=(height, width, 3)).
input = keras.Input(shape=(28, 28, 1), name="original_img")
x = layers.Conv2D(16, 3, activation="relu")(input)
x = layers.Conv2D(32, 3, activation="relu")(x)
x = layers.MaxPooling2D(3)(x)
x = layers.Conv2D(32, 3, activation="relu")(x)
x = layers.Conv2D(16, 3, activation="relu")(x)
x = layers.GlobalMaxPooling2D()(x)
out_a = layers.Dense(10, activation='softmax', name='10cls')(x)
out_b = layers.Dense(2, activation='softmax', name='2cls')(x)
out_c = layers.Dense(1, activation='linear', name='1rg')(x)
encoder = keras.Model(inputs=input, outputs=[out_a, out_b, out_c], name="encoder")
# Let's plot the model graph
keras.utils.plot_model(encoder)
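If plot_model complains about a missing pydot/graphviz installation, a plain-text view of the same structure works too:
# text alternative to plot_model
encoder.summary()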
One thing to note: while defining out_a, out_b, and out_c, we set their name argument, which is very important. Their names are set to '10cls', '2cls', and '1rg' respectively. You can also see this in the plotted diagram (the last three branches).
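If you want to verify those names programmatically (a quick check; in TF 2.x the Model object exposes them directly):
# the output names we set via the `name` argument
print(encoder.output_names)  # ['10cls', '2cls', '1rg']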
Compile and Run
Now we can see why that name argument is important. To run the model, we first need to compile it with the proper loss function, metrics, and optimizer. For a classification and a regression problem the optimizer can be the same, but the loss function and metrics must differ. In our model, which has multi-type outputs (two classifications and one regression), we need to set the proper loss and metrics for each of these types. Please see below how it's done.
encoder.compile(
    loss={
        "10cls": tf.keras.losses.CategoricalCrossentropy(),
        "2cls": tf.keras.losses.CategoricalCrossentropy(),
        "1rg": tf.keras.losses.MeanSquaredError()
    },
    metrics={
        "10cls": 'accuracy',
        "2cls": 'accuracy',
        "1rg": 'mse'
    },
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.001)
)
See how each output of the above model is referenced here by its name, and how the proper loss and metric are assigned to it. Now, time to train the model.
encoder.fit(xtrain, [y_out_a, y_out_b, y_out_c], epochs=30, verbose=2)
Epoch 1/30
1875/1875 - 6s - loss: 117.7318 - 10cls_loss: 3.2642 - 2cls_loss: 0.9040 - 1rg_loss: 113.5637 - 10cls_accuracy: 0.6057 - 2cls_accuracy: 0.8671 - 1rg_mse: 113.5637
Epoch 2/30
1875/1875 - 5s - loss: 62.1696 - 10cls_loss: 0.5151 - 2cls_loss: 0.2437 - 1rg_loss: 61.4109 - 10cls_accuracy: 0.8845 - 2cls_accuracy: 0.9480 - 1rg_mse: 61.4109
Epoch 3/30
1875/1875 - 5s - loss: 50.3159 - 10cls_loss: 0.2804 - 2cls_loss: 0.1371 - 1rg_loss: 49.8985 - 10cls_accuracy: 0.9295 - 2cls_accuracy: 0.9641 - 1rg_mse: 49.8985
...
...
Epoch 28/30
1875/1875 - 5s - loss: 15.5841 - 10cls_loss: 0.1066 - 2cls_loss: 0.0891 - 1rg_loss: 15.3884 - 10cls_accuracy: 0.9726 - 2cls_accuracy: 0.9715 - 1rg_mse: 15.3884
Epoch 29/30
1875/1875 - 5s - loss: 15.2199 - 10cls_loss: 0.1058 - 2cls_loss: 0.0859 - 1rg_loss: 15.0281 - 10cls_accuracy: 0.9736 - 2cls_accuracy: 0.9727 - 1rg_mse: 15.0281
Epoch 30/30
1875/1875 - 5s - loss: 15.2178 - 10cls_loss: 0.1136 - 2cls_loss: 0.0854 - 1rg_loss: 15.0188 - 10cls_accuracy: 0.9722 - 2cls_accuracy: 0.9736 - 1rg_mse: 15.0188
<tensorflow.python.keras.callbacks.History at 0x7ff42c18e110>
That's how each output of the last layer is optimized by its corresponding loss function. FYI, one essential parameter of .compile that you might need is loss_weights - to weight the loss contributions of the different model outputs. See my other answer here on this.
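Notice in the logs above that 1rg_loss dominates the total loss. A minimal sketch of re-balancing it with loss_weights (the weight values here are illustrative, not tuned):
# same compile call as before, but with loss_weights to re-balance outputs;
# the 0.01 on the regression head is an illustrative value, not a tuned one
encoder.compile(
    loss={
        "10cls": tf.keras.losses.CategoricalCrossentropy(),
        "2cls": tf.keras.losses.CategoricalCrossentropy(),
        "1rg": tf.keras.losses.MeanSquaredError()
    },
    loss_weights={"10cls": 1.0, "2cls": 1.0, "1rg": 0.01},
    metrics={"10cls": 'accuracy', "2cls": 'accuracy', "1rg": 'mse'},
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.001)
)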
Prediction / Inference
Let's see some output. We hope this model will predict 3 things: (1) what the digit is, (2) whether it is even or odd, and (3) its square.
import matplotlib.pyplot as plt
# drop the channel axis for plotting
plt.imshow(xtrain[0].squeeze(), cmap='gray')
If we want to quickly check the output layers of our model:
encoder.output
[<KerasTensor: shape=(None, 10) dtype=float32 (created by layer '10cls')>,
<KerasTensor: shape=(None, 2) dtype=float32 (created by layer '2cls')>,
<KerasTensor: shape=(None, 1) dtype=float32 (created by layer '1rg')>]
Now pass xtrain[0] (which we know is 5) to the model for prediction.
# we expand for a batch dimension: (1, 28, 28, 1)
pred10, pred2, pred1 = encoder.predict(tf.expand_dims(xtrain[0], 0))
# regression: square of the input digit image (expected ~ 25)
pred1
array([[22.098022]], dtype=float32)
# even or odd: class 1 = even, class 0 = odd; 5 is surely odd
pred2.argmax()
0
# which number, surely 5
pred10.argmax()
5
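To tie the three raw outputs together, here is a small helper (hypothetical, not part of the original model) that decodes one prediction into a readable summary:
# hypothetical helper: decode the three raw outputs for a single sample
def decode_prediction(pred10, pred2, pred1):
    digit = int(pred10.argmax())
    parity = "even" if pred2.argmax() == 1 else "odd"  # class 1 = even
    square = float(pred1[0, 0])
    return f"digit={digit} ({parity}), square~{square:.1f}"

print(decode_prediction(pred10, pred2, pred1))
# digit=5 (odd), square~22.1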
Update
Based on your comment, we can extend the above model to take multi-input too. We need to change a few things. To demonstrate, we will feed the xtrain and xtest samples of the MNIST data set to the model as a multi-input.
(xtrain, ytrain), (xtest, _) = keras.datasets.mnist.load_data()
# both inputs must have the same number of samples; xtest has 10000
xtrain = xtrain[:10000]
ytrain = ytrain[:10000]
# add an explicit channel axis: (10000, 28, 28, 1)
xtrain = xtrain[..., None]
xtest = xtest[..., None]
y_out_a = keras.utils.to_categorical(ytrain, num_classes=10)
y_out_b = keras.utils.to_categorical((ytrain % 2 == 0).astype(int), num_classes=2)
y_out_c = tf.square(tf.cast(ytrain, tf.float32))
print(xtrain.shape, xtest.shape)
print(y_out_a.shape, y_out_b.shape, y_out_c.shape)
# (10000, 28, 28, 1) (10000, 28, 28, 1)
# (10000, 10) (10000, 2) (10000,)
Next, we need to modify some parts of the above model to take multi-input. If you plot it again, you will see the new graph.
input0 = keras.Input(shape=(28, 28, 1), name="img1")
input1 = keras.Input(shape=(28, 28, 1), name="img2")
concate_input = layers.Concatenate()([input0, input1])
x = layers.Conv2D(16, 3, activation="relu")(concate_input)
# same stack as before
x = layers.Conv2D(32, 3, activation="relu")(x)
x = layers.MaxPooling2D(3)(x)
x = layers.Conv2D(32, 3, activation="relu")(x)
x = layers.Conv2D(16, 3, activation="relu")(x)
x = layers.GlobalMaxPooling2D()(x)
out_a = layers.Dense(10, activation='softmax', name='10cls')(x)
out_b = layers.Dense(2, activation='softmax', name='2cls')(x)
out_c = layers.Dense(1, activation='linear', name='1rg')(x)
# multi-input, multi-output
encoder = keras.Model(inputs=[input0, input1],
                      outputs=[out_a, out_b, out_c], name="encoder")
Now, after compiling it as before, we can train the model as follows:
# multi-input, multi-output
encoder.fit([xtrain, xtest], [y_out_a, y_out_b, y_out_c],
epochs=30, batch_size = 256, verbose=2)
Epoch 1/30
40/40 - 1s - loss: 66.9731 - 10cls_loss: 0.9619 - 2cls_loss: 0.4412 - 1rg_loss: 65.5699 - 10cls_accuracy: 0.7627 - 2cls_accuracy: 0.8815 - 1rg_mse: 65.5699
Epoch 2/30
40/40 - 0s - loss: 60.5408 - 10cls_loss: 0.8959 - 2cls_loss: 0.3850 - 1rg_loss: 59.2598 - 10cls_accuracy: 0.7794 - 2cls_accuracy: 0.8928 - 1rg_mse: 59.2598
Epoch 3/30
40/40 - 0s - loss: 57.3067 - 10cls_loss: 0.8586 - 2cls_loss: 0.3669 - 1rg_loss: 56.0813 - 10cls_accuracy: 0.7856 - 2cls_accuracy: 0.8951 - 1rg_mse: 56.0813
...
...
Epoch 28/30
40/40 - 0s - loss: 29.1198 - 10cls_loss: 0.4775 - 2cls_loss: 0.2573 - 1rg_loss: 28.3849 - 10cls_accuracy: 0.8616 - 2cls_accuracy: 0.9131 - 1rg_mse: 28.3849
Epoch 29/30
40/40 - 0s - loss: 27.5318 - 10cls_loss: 0.4696 - 2cls_loss: 0.2518 - 1rg_loss: 26.8104 - 10cls_accuracy: 0.8645 - 2cls_accuracy: 0.9142 - 1rg_mse: 26.8104
Epoch 30/30
40/40 - 0s - loss: 27.1581 - 10cls_loss: 0.4620 - 2cls_loss: 0.2446 - 1rg_loss: 26.4515 - 10cls_accuracy: 0.8664 - 2cls_accuracy: 0.9158 - 1rg_mse: 26.4515
Now, we can test the multi-input model and get multi-output from it.
# here we pass the same sample to both inputs, just for a quick check
pred10, pred2, pred1 = encoder.predict(
[
tf.expand_dims(xtrain[0], 0),
tf.expand_dims(xtrain[0], 0)
]
)
# regression part
pred1
array([[25.13295]], dtype=float32)
# even or odd
pred2.argmax()
0
# what digit
pred10.argmax()
5
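As a side note, if your real data does not fit in memory, the same multi-input / multi-output structure can be fed through tf.data by keying on the input and output names (a sketch, assuming the arrays and compiled model above):
# a sketch: dicts keyed by the Input names and the output-layer names
ds = tf.data.Dataset.from_tensor_slices((
    {"img1": xtrain, "img2": xtest},
    {"10cls": y_out_a, "2cls": y_out_b, "1rg": y_out_c},
)).batch(256)
encoder.fit(ds, epochs=30, verbose=2)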