tensorflow2.0 - Debugging TensorFlow serving on BERT model

Question

Welcome To Ask or Share your Answers For Others

tensorflow2.0 - Debugging TensorFlow serving on BERT model

posted Jan 31, 2022 in Technique[技术] by 深蓝 (71.8m points)

tensorflow2.0 - Debugging TensorFlow serving on BERT model

I was able to deploy a NLP model using BERT embedding following this example (using TF 1.14.0 on CPU and tensorflow-model-server): https://mc.ai/how-to-ship-machine-learning-models-into-production-with-tensorflow-serving-and-kubernetes/

The model description is pretty clean:

!saved_model_cli show --dir {'tf_bert_model/1'} --all

MetaGraphDef with tag-set: 'serve' contains the following SignatureDefs:

signature_def['serving_default']:
  The given SavedModel SignatureDef contains the following input(s):
    inputs['Input-Segment:0'] tensor_info:
        dtype: DT_FLOAT
        shape: (-1, 64)
        name: Input-Segment:0
    inputs['Input-Token:0'] tensor_info:
        dtype: DT_FLOAT
        shape: (-1, 64)
        name: Input-Token:0
  The given SavedModel SignatureDef contains the following output(s):
    outputs['dense/Softmax:0'] tensor_info:
        dtype: DT_FLOAT
        shape: (-1, 2)
        name: dense/Softmax:0
  Method name is: tensorflow/serving/predict

And the data input formatting for the served model is a list of dictionaries:

data
'{"instances": [{"Input-Token:0": [101, 101, 1962, 7770, 1069, 102, 102, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], "Input-Segment:0": [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]}]}'

r = requests.post("http://127.0.0.1:8501/v1/models/tf_bert_model:predict",
 json=data)

I'm now trying to deploy a BERT model using TF2.1, HuggingFace transformer library and on GPU but the deployed model is returning either a 400 error or a 200 error and I don't know how to debug it. I suspect that it may be a data input formatting issue.

My model description is messier:

2020-03-20 14:47:03.465762: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libnvinfer.so.6'; dlerror: libnvinfer.so.6: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib64-nvidia
2020-03-20 14:47:03.465883: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libnvinfer_plugin.so.6'; dlerror: libnvinfer_plugin.so.6: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib64-nvidia
2020-03-20 14:47:03.465900: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:30] Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.

MetaGraphDef with tag-set: 'serve' contains the following SignatureDefs:

signature_def['__saved_model_init_op']:
  The given SavedModel SignatureDef contains the following input(s):
  The given SavedModel SignatureDef contains the following output(s):
    outputs['__saved_model_init_op'] tensor_info:
        dtype: DT_INVALID
        shape: unknown_rank
        name: NoOp
  Method name is: 

signature_def['serving_default']:
  The given SavedModel SignatureDef contains the following input(s):
    inputs['attention_mask'] tensor_info:
        dtype: DT_INT32
        shape: (-1, 128)
        name: serving_default_attention_mask:0
    inputs['input_ids'] tensor_info:
        dtype: DT_INT32
        shape: (-1, 128)
        name: serving_default_input_ids:0
    inputs['labels'] tensor_info:
        dtype: DT_INT32
        shape: (-1, 1)
        name: serving_default_labels:0
    inputs['token_type_ids'] tensor_info:
        dtype: DT_INT32
        shape: (-1, 128)
        name: serving_default_token_type_ids:0
  The given SavedModel SignatureDef contains the following output(s):
    outputs['output_1'] tensor_info:
        dtype: DT_FLOAT
        shape: (-1, 2)
        name: StatefulPartitionedCall:0
  Method name is: tensorflow/serving/predict
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/ops/resource_variable_ops.py:1786: calling BaseResourceVariable.__init__ (from tensorflow.python.ops.resource_variable_ops) with constraint is deprecated and will be removed in a future version.
Instructions for updating:
If using Keras pass *_constraint arguments to layers.

Defined Functions:
  Function Name: '__call__'
    Option #1
      Callable with:
        Argument #1
          DType: dict
          Value: {'input_ids': TensorSpec(shape=(None, 128), dtype=tf.int32, name='inputs/input_ids'), 'token_type_ids': TensorSpec(shape=(None, 128), dtype=tf.int32, name='inputs/token_type_ids'), 'attention_mask': TensorSpec(shape=(None, 128), dtype=tf.int32, name='inputs/attention_mask'), 'labels': TensorSpec(shape=(None, 1), dtype=tf.int32, name='inputs/labels')}
        Named Argument #1
          DType: str
          Value: ['t', 'r', 'a', 'i', 'n', 'i', 'n', 'g']
    Option #2
      Callable with:
        Argument #1
          DType: dict
          Value: {'input_ids': TensorSpec(shape=(None, 128), dtype=tf.int32, name='input_ids'), 'token_type_ids': TensorSpec(shape=(None, 128), dtype=tf.int32, name='token_type_ids'), 'attention_mask': TensorSpec(shape=(None, 128), dtype=tf.int32, name='attention_mask'), 'labels': TensorSpec(shape=(None, 1), dtype=tf.int32, name='labels')}
        Named Argument #1
          DType: str
          Value: ['t', 'r', 'a', 'i', 'n', 'i', 'n', 'g']
    Option #3
      Callable with:
        Argument #1
          DType: dict
          Value: {'input_ids': TensorSpec(shape=(None, 128), dtype=tf.int32, name='input_ids'), 'token_type_ids': TensorSpec(shape=(None, 128), dtype=tf.int32, name='token_type_ids'), 'attention_mask': TensorSpec(shape=(None, 128), dtype=tf.int32, name='attention_mask'), 'labels': TensorSpec(shape=(None, 1), dtype=tf.int32, name='labels')}
        Named Argument #1
          DType: str
          Value: ['t', 'r', 'a', 'i', 'n', 'i', 'n', 'g']
    Option #4
      Callable with:
        Argument #1
          DType: dict
          Value: {'labels': TensorSpec(shape=(None, 1), dtype=tf.int32, name='inputs/labels'), 'input_ids': TensorSpec(shape=(None, 128), dtype=tf.int32, name='inputs/input_ids'), 'token_type_ids': TensorSpec(shape=(None, 128), dtype=tf.int32, name='inputs/token_type_ids'), 'attention_mask': TensorSpec(shape=(None, 128), dtype=tf.int32, name='inputs/attention_mask')}
        Named Argument #1
          DType: str
          Value: ['t', 'r', 'a', 'i', 'n', 'i', 'n', 'g']

  Function Name: '_default_save_signature'
    Option #1
      Callable with:
        Argument #1
          DType: dict
          Value: {'input_ids': TensorSpec(shape=(None, 128), dtype=tf.int32, name='input_ids'), 'token_type_ids': TensorSpec(shape=(None, 128), dtype=tf.int32, name='token_type_ids'), 'attention_mask': TensorSpec(shape=(None, 128), dtype=tf.int32, name='attention_mask'), 'labels': TensorSpec(shape=(None, 1), dtype=tf.int32, name='labels')}

  Function Name: 'call_and_return_all_conditional_losses'
    Option #1
      Callable with:
        Argument #1
          DType: dict
          Value: {'input_ids': TensorSpec(shape=(None, 128), dtype=tf.int32, name='input_ids'), 'token_type_ids': TensorSpec(shape=(None, 128), dtype=tf.int32, name='token_type_ids'), 'attention_mask': TensorSpec(shape=(None, 128), dtype=tf.int32, name='attention_mask'), 'labels': TensorSpec(shape=(None, 1), dtype=tf.int32, name='labels')}
        Named Argument #1
          DType: str
          Value: ['t', 'r', 'a', 'i', 'n', 'i', 'n', 'g']
    Option #2
      Callable with:
        Argument #1
          DType: dict
          Value: {'token_type_ids': TensorSpec(shape=(None, 128), dtype=tf.int32, name='token_type_ids'), 'attention_mask': TensorSpec(shape=(None, 128), dtype=tf.int32, name='attention_mask'), 'labels': TensorSpec(shape=(None, 1), dtype=tf.int32, name='labels'), 'input_ids': TensorSpec(shape=(None, 128), dtype=tf.int32, name='input_ids')}
        Named Argument #1
          DType: str
          Value: ['t', 'r', 'a', 'i', 'n', 'i', 'n', 'g']
    Option #3
      Callable with:
        Argument #1
          DType: dict
          Value: {'labels': TensorSpec(shape=(None, 1), dtype=tf.int32, name='inputs/labels'), 'input_ids': TensorSpec(shape=(None, 128), dtype=tf.int32, name='inputs/input_ids'), 'token_type_ids': TensorSpec(shape=(None, 128), dtype=tf.int32, name='inputs/token_type_ids'), 'attention_mask': TensorSpec(shape=(None, 128), dtype=tf.int32, name='inputs/attention_mask')}
        Named Argument #1
          DType: str
          Value: ['t', 'r', 'a', 'i', 'n', 'i', 'n', 'g']
    Option #4
      Callable with:
        Argument #1
          DType: dict
          Value: {'labels': TensorSpec(shape=(None, 1), dtype=tf.int32, name='inputs/labels'), 'input_ids': TensorSpec(shape=(None, 128), dtype=tf.int32, name='inputs/input_ids'), 'token_type_ids': TensorSpec(shape=(None, 128), dtype=tf.int32, name='inputs/token_type_ids'), 'attention_mask': TensorSpec(shape=(None, 128), dtype=tf.int32, name='inputs/attention_mask')}
        Named Argument #1
          DType: str
          Value: ['t', 'r', 'a', 'i', 'n', 'i', 'n', 'g']

I formatted my data input as a list of dictionaries as well:

data = {"instances": test_deploy_inputs2}
data
{'instances': [{'attention_mask': [1,
    1,
    1,
    1,
    1,
    1,
    1,
    1,
    1,
    1,
    1,
    1,
    1,
    1,
    1,
    1,
    1,
    1,
    1,
    1,
    1,
    1,
    1,
    1,
    1,
    1,
    1,
    1,
    1,
    1,
    1,
    1,
    1,
    1,
    1,
    1,
    1,
    1,
    1,
    1,
    1,
    1,
    1,
    1,
    1,
    1,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0],
   'input_ids': [101,
    1999,
    5688,
    1010,
    12328,
    5845,
    2007,
    5423,
    3593,
    28991,
    19362,
    4588,
    4244,
    4820,
    12553,
    12987,
    10737,
    2008,
    23150,
    14719,
    1011,
    20802,
    3662,
    2896,
    3798,
    1997,
    17953,
    14536,
    2509,
    1998,
    6335,
    1011,
    1015,
    29720,
    1998,
    2020,
    11914,
    5123,
    2013,
    6388,
    2135,
    10572,
    27441,
    7315,
    1012,
    102,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0,
    0

与恶龙缠斗过久,自身亦成为恶龙；凝视深渊过久,深渊将回以凝视…

1 Reply

深蓝 · Answer 1 · 2022-01-31T07:16:11+0000

My bad! A Response [200] doesn't mean that it's not working, you can see the results with

predictions = json.loads(json_response.text)['predictions']
predictions

Categories

tensorflow2.0 - Debugging TensorFlow serving on BERT model

tensorflow2.0 - Debugging TensorFlow serving on BERT model

Please log in or register to add a comment.

Please log in or register to reply this article.

1 Reply

Please log in or register to add a comment.

Just Browsing Browsing

Most popular tags