I was able to deploy an NLP model using BERT embeddings by following this example (TF 1.14.0 on CPU with tensorflow-model-server):
https://mc.ai/how-to-ship-machine-learning-models-into-production-with-tensorflow-serving-and-kubernetes/
The model description is pretty clean:
!saved_model_cli show --dir {'tf_bert_model/1'} --all
MetaGraphDef with tag-set: 'serve' contains the following SignatureDefs:
signature_def['serving_default']:
The given SavedModel SignatureDef contains the following input(s):
inputs['Input-Segment:0'] tensor_info:
dtype: DT_FLOAT
shape: (-1, 64)
name: Input-Segment:0
inputs['Input-Token:0'] tensor_info:
dtype: DT_FLOAT
shape: (-1, 64)
name: Input-Token:0
The given SavedModel SignatureDef contains the following output(s):
outputs['dense/Softmax:0'] tensor_info:
dtype: DT_FLOAT
shape: (-1, 2)
name: dense/Softmax:0
Method name is: tensorflow/serving/predict
The input data for the served model is formatted as a list of dictionaries:
data
'{"instances": [{"Input-Token:0": [101, 101, 1962, 7770, 1069, 102, 102, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], "Input-Segment:0": [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]}]}'
r = requests.post("http://127.0.0.1:8501/v1/models/tf_bert_model:predict",
json=data)
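For reference, here is a minimal sketch of how a row-format ("instances") body like the one above could be assembled. The helper name `build_predict_payload` and the padding logic are assumptions for illustration, not part of the original notebook; the key names and the length of 64 come from the signature shown earlier.

```python
import json

SEQ_LEN = 64  # from shape (-1, 64) in the TF 1.14 signature above


def build_predict_payload(token_ids, segment_ids, seq_len=SEQ_LEN):
    """Return a row-format TF Serving REST body holding one example.

    Hypothetical helper: truncates or zero-pads both inputs to seq_len.
    """
    def pad(values, fill):
        values = list(values)[:seq_len]
        return values + [fill] * (seq_len - len(values))

    instance = {
        "Input-Token:0": pad(token_ids, 0),
        "Input-Segment:0": pad(segment_ids, 0.0),
    }
    return {"instances": [instance]}


payload = build_predict_payload([101, 101, 1962, 7770, 1069, 102, 102], [0.0] * 7)
body = json.dumps(payload)  # serialized once, ready to send as the request body

# With the requests library, either of these sends the same JSON:
#   requests.post(url, data=body)      # body is already a JSON string
#   requests.post(url, json=payload)   # requests serializes the dict itself
# Passing an already-serialized string through json= would encode it a second time.
```

One thing worth checking in any such request is whether the value handed to `json=` is a dict or an already-serialized string, since the two produce different request bodies.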
I'm now trying to deploy a BERT model using TF 2.1 and the HuggingFace transformers library on GPU, but the deployed model returns either a 400 error or a 200 error, and I don't know how to debug it. I suspect a data input formatting issue.
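One way to get more information than the status code is to look at the response body: the TensorFlow Serving REST API reports failures as a JSON object with an `error` field, and row-format successes as an object with `predictions`. The sketch below is only an illustration; `describe_response` is a hypothetical helper, and it takes the status code and raw text so it can be used with any HTTP client.

```python
import json


def describe_response(status_code, text):
    """Summarize a TF Serving REST reply from its status code and raw body."""
    try:
        body = json.loads(text)
    except ValueError:
        # Not JSON at all (e.g. a proxy error page); show the start of the body.
        return f"{status_code}: non-JSON body: {text[:200]}"
    if isinstance(body, dict) and "error" in body:
        return f"{status_code}: error: {body['error']}"
    if isinstance(body, dict) and "predictions" in body:
        return f"{status_code}: {len(body['predictions'])} prediction(s)"
    return f"{status_code}: unexpected body: {text[:200]}"


# Usage with a requests response object r:
#   print(describe_response(r.status_code, r.text))
```

Printing the `error` text usually narrows things down faster than the status code alone, since it tends to name the input or the part of the JSON that the server rejected.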
My model description is messier:
2020-03-20 14:47:03.465762: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libnvinfer.so.6'; dlerror: libnvinfer.so.6: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib64-nvidia
2020-03-20 14:47:03.465883: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libnvinfer_plugin.so.6'; dlerror: libnvinfer_plugin.so.6: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib64-nvidia
2020-03-20 14:47:03.465900: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:30] Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
MetaGraphDef with tag-set: 'serve' contains the following SignatureDefs:
signature_def['__saved_model_init_op']:
The given SavedModel SignatureDef contains the following input(s):
The given SavedModel SignatureDef contains the following output(s):
outputs['__saved_model_init_op'] tensor_info:
dtype: DT_INVALID
shape: unknown_rank
name: NoOp
Method name is:
signature_def['serving_default']:
The given SavedModel SignatureDef contains the following input(s):
inputs['attention_mask'] tensor_info:
dtype: DT_INT32
shape: (-1, 128)
name: serving_default_attention_mask:0
inputs['input_ids'] tensor_info:
dtype: DT_INT32
shape: (-1, 128)
name: serving_default_input_ids:0
inputs['labels'] tensor_info:
dtype: DT_INT32
shape: (-1, 1)
name: serving_default_labels:0
inputs['token_type_ids'] tensor_info:
dtype: DT_INT32
shape: (-1, 128)
name: serving_default_token_type_ids:0
The given SavedModel SignatureDef contains the following output(s):
outputs['output_1'] tensor_info:
dtype: DT_FLOAT
shape: (-1, 2)
name: StatefulPartitionedCall:0
Method name is: tensorflow/serving/predict
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/ops/resource_variable_ops.py:1786: calling BaseResourceVariable.__init__ (from tensorflow.python.ops.resource_variable_ops) with constraint is deprecated and will be removed in a future version.
Instructions for updating:
If using Keras pass *_constraint arguments to layers.
Defined Functions:
Function Name: '__call__'
Option #1
Callable with:
Argument #1
DType: dict
Value: {'input_ids': TensorSpec(shape=(None, 128), dtype=tf.int32, name='inputs/input_ids'), 'token_type_ids': TensorSpec(shape=(None, 128), dtype=tf.int32, name='inputs/token_type_ids'), 'attention_mask': TensorSpec(shape=(None, 128), dtype=tf.int32, name='inputs/attention_mask'), 'labels': TensorSpec(shape=(None, 1), dtype=tf.int32, name='inputs/labels')}
Named Argument #1
DType: str
Value: ['t', 'r', 'a', 'i', 'n', 'i', 'n', 'g']
Option #2
Callable with:
Argument #1
DType: dict
Value: {'input_ids': TensorSpec(shape=(None, 128), dtype=tf.int32, name='input_ids'), 'token_type_ids': TensorSpec(shape=(None, 128), dtype=tf.int32, name='token_type_ids'), 'attention_mask': TensorSpec(shape=(None, 128), dtype=tf.int32, name='attention_mask'), 'labels': TensorSpec(shape=(None, 1), dtype=tf.int32, name='labels')}
Named Argument #1
DType: str
Value: ['t', 'r', 'a', 'i', 'n', 'i', 'n', 'g']
Option #3
Callable with:
Argument #1
DType: dict
Value: {'input_ids': TensorSpec(shape=(None, 128), dtype=tf.int32, name='input_ids'), 'token_type_ids': TensorSpec(shape=(None, 128), dtype=tf.int32, name='token_type_ids'), 'attention_mask': TensorSpec(shape=(None, 128), dtype=tf.int32, name='attention_mask'), 'labels': TensorSpec(shape=(None, 1), dtype=tf.int32, name='labels')}
Named Argument #1
DType: str
Value: ['t', 'r', 'a', 'i', 'n', 'i', 'n', 'g']
Option #4
Callable with:
Argument #1
DType: dict
Value: {'labels': TensorSpec(shape=(None, 1), dtype=tf.int32, name='inputs/labels'), 'input_ids': TensorSpec(shape=(None, 128), dtype=tf.int32, name='inputs/input_ids'), 'token_type_ids': TensorSpec(shape=(None, 128), dtype=tf.int32, name='inputs/token_type_ids'), 'attention_mask': TensorSpec(shape=(None, 128), dtype=tf.int32, name='inputs/attention_mask')}
Named Argument #1
DType: str
Value: ['t', 'r', 'a', 'i', 'n', 'i', 'n', 'g']
Function Name: '_default_save_signature'
Option #1
Callable with:
Argument #1
DType: dict
Value: {'input_ids': TensorSpec(shape=(None, 128), dtype=tf.int32, name='input_ids'), 'token_type_ids': TensorSpec(shape=(None, 128), dtype=tf.int32, name='token_type_ids'), 'attention_mask': TensorSpec(shape=(None, 128), dtype=tf.int32, name='attention_mask'), 'labels': TensorSpec(shape=(None, 1), dtype=tf.int32, name='labels')}
Function Name: 'call_and_return_all_conditional_losses'
Option #1
Callable with:
Argument #1
DType: dict
Value: {'input_ids': TensorSpec(shape=(None, 128), dtype=tf.int32, name='input_ids'), 'token_type_ids': TensorSpec(shape=(None, 128), dtype=tf.int32, name='token_type_ids'), 'attention_mask': TensorSpec(shape=(None, 128), dtype=tf.int32, name='attention_mask'), 'labels': TensorSpec(shape=(None, 1), dtype=tf.int32, name='labels')}
Named Argument #1
DType: str
Value: ['t', 'r', 'a', 'i', 'n', 'i', 'n', 'g']
Option #2
Callable with:
Argument #1
DType: dict
Value: {'token_type_ids': TensorSpec(shape=(None, 128), dtype=tf.int32, name='token_type_ids'), 'attention_mask': TensorSpec(shape=(None, 128), dtype=tf.int32, name='attention_mask'), 'labels': TensorSpec(shape=(None, 1), dtype=tf.int32, name='labels'), 'input_ids': TensorSpec(shape=(None, 128), dtype=tf.int32, name='input_ids')}
Named Argument #1
DType: str
Value: ['t', 'r', 'a', 'i', 'n', 'i', 'n', 'g']
Option #3
Callable with:
Argument #1
DType: dict
Value: {'labels': TensorSpec(shape=(None, 1), dtype=tf.int32, name='inputs/labels'), 'input_ids': TensorSpec(shape=(None, 128), dtype=tf.int32, name='inputs/input_ids'), 'token_type_ids': TensorSpec(shape=(None, 128), dtype=tf.int32, name='inputs/token_type_ids'), 'attention_mask': TensorSpec(shape=(None, 128), dtype=tf.int32, name='inputs/attention_mask')}
Named Argument #1
DType: str
Value: ['t', 'r', 'a', 'i', 'n', 'i', 'n', 'g']
Option #4
Callable with:
Argument #1
DType: dict
Value: {'labels': TensorSpec(shape=(None, 1), dtype=tf.int32, name='inputs/labels'), 'input_ids': TensorSpec(shape=(None, 128), dtype=tf.int32, name='inputs/input_ids'), 'token_type_ids': TensorSpec(shape=(None, 128), dtype=tf.int32, name='inputs/token_type_ids'), 'attention_mask': TensorSpec(shape=(None, 128), dtype=tf.int32, name='inputs/attention_mask')}
Named Argument #1
DType: str
Value: ['t', 'r', 'a', 'i', 'n', 'i', 'n', 'g']
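The `serving_default` signature above lists four int32 inputs: `attention_mask`, `input_ids` and `token_type_ids` with 128 values per example, and `labels` with 1. A rough sketch of a client-side check of one instance against those names and lengths follows; `check_instance` and `EXPECTED_LENGTHS` are hypothetical names, and the token ids in the example are placeholders. Whether a mismatch of this kind is what causes the 400 is not established here.

```python
# Input names and per-example lengths copied from the serving_default signature.
EXPECTED_LENGTHS = {
    "attention_mask": 128,
    "input_ids": 128,
    "token_type_ids": 128,
    "labels": 1,
}


def check_instance(instance, expected=EXPECTED_LENGTHS):
    """Return a list of mismatches between one instance dict and the signature."""
    problems = []
    for name in sorted(set(expected) - set(instance)):
        problems.append(f"missing input: {name}")
    for name in sorted(set(instance) - set(expected)):
        problems.append(f"unexpected input: {name}")
    for name, length in expected.items():
        if name not in instance:
            continue
        values = instance[name]
        if len(values) != length:
            problems.append(f"{name}: length {len(values)}, expected {length}")
        # DT_INT32 inputs: floats such as 0.0 would not match the declared dtype.
        if not all(isinstance(v, int) and not isinstance(v, bool) for v in values):
            problems.append(f"{name}: contains non-integer values")
    return problems


example = {
    "attention_mask": [1] * 46 + [0] * 82,
    "input_ids": [101] + [0] * 127,  # placeholder ids, not real tokens
    "token_type_ids": [0] * 128,
}
print(check_instance(example))  # reports that 'labels' is missing
```

Note that `labels` appears in the exported signature, so an instance built only from tokenizer output would not supply every input the signature declares.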
I formatted my input data as a list of dictionaries as well:
data = {"instances": test_deploy_inputs2}
data
{'instances': [{'attention_mask': [1,
1,
1,
1,
1,
1,
1,
1,
1,
1,
1,
1,
1,
1,
1,
1,
1,
1,
1,
1,
1,
1,
1,
1,
1,
1,
1,
1,
1,
1,
1,
1,
1,
1,
1,
1,
1,
1,
1,
1,
1,
1,
1,
1,
1,
1,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0],
'input_ids': [101,
1999,
5688,
1010,
12328,
5845,
2007,
5423,
3593,
28991,
19362,
4588,
4244,
4820,
12553,
12987,
10737,
2008,
23150,
14719,
1011,
20802,
3662,
2896,
3798,
1997,
17953,
14536,
2509,
1998,
6335,
1011,
1015,
29720,
1998,
2020,
11914,
5123,
2013,
6388,
2135,
10572,
27441,
7315,
1012,
102,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0