I followed the new TensorFlow 2 Object Detection API documentation to train a Faster R-CNN detector using transfer learning on a Google Cloud Platform TPU. After training completed, I downloaded the result to my workstation and exported the model using the TensorFlow 2 exporter ('object_detection/exporter_main_v2.py'). I followed the official instructions and set up the environment locally (macOS Catalina, TensorFlow 2.2, Python 3.6, etc.).
However, the non-max suppression (NMS) part of the inference pipeline does not seem to be working: there are cases where bounding boxes of different classes overlap almost completely. I debugged the code to confirm that the Object Detection API implementation of NMS (the batch_multiclass_non_max_suppression method in object_detection/core/post_processing.py) is called in the inference pipeline for the Faster R-CNN model. It is called twice, as expected for the Faster R-CNN architecture at inference.
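To make the symptom concrete, here is a minimal pure-Python sketch of greedy NMS in two modes. It is not the Object Detection API code: the `nms` and `iou` helpers, the boxes, the scores, and the 0.6 IoU threshold are all made up for illustration. It assumes that a "multiclass" NMS suppresses boxes only against other boxes of the same class, as the method name suggests. Under that assumption, two almost identical boxes with different class labels would both survive.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (ymin, xmin, ymax, xmax), normalized


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_h = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_w = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_h * inter_w
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: Sequence[Box], scores: Sequence[float], classes: Sequence[int],
        iou_threshold: float = 0.6, class_agnostic: bool = False) -> List[int]:
    """Greedy NMS (illustrative helper); returns kept indices, best score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    kept: List[int] = []
    for i in order:
        # Per-class mode compares only against kept boxes of the same class;
        # class-agnostic mode compares against every kept box.
        suppressed = any(
            (class_agnostic or classes[k] == classes[i])
            and iou(boxes[i], boxes[k]) > iou_threshold
            for k in kept
        )
        if not suppressed:
            kept.append(i)
    return kept


# Three nearly identical boxes: two of class 1, one of class 2 (made-up data).
boxes = [(0.10, 0.10, 0.50, 0.50), (0.11, 0.10, 0.50, 0.51), (0.12, 0.11, 0.51, 0.50)]
scores = [0.90, 0.80, 0.70]
classes = [1, 2, 1]

per_class = nms(boxes, scores, classes)                      # keeps one box per class
agnostic = nms(boxes, scores, classes, class_agnostic=True)  # keeps only the best box
print(per_class, agnostic)
```

In the per-class run the class-2 box is kept even though it overlaps the top class-1 box almost completely, which looks like what I am seeing. In the class-agnostic run only the highest-scoring box remains.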
The instructions I used for GCP's AI Platform TPU training are the ones on the official Object Detection API page: link. I changed the training parameters to use a TPU runtime version and Python version that are supported on GCP, since the ones in the official example are not. Instead I used:
gcloud ai-platform jobs submit training `whoami`_object_detection_`date +%m_%d_%Y_%H_%M_%S` \
    --job-dir=gs://${MODEL_DIR} \
    --package-path ./object_detection \
    --module-name object_detection.model_main_tf2 \
    --runtime-version 2.2 \
    --python-version 3.7 \
    --scale-tier BASIC_TPU \
    --region us-central1 \
    -- \
    --use_tpu true \
    --model_dir=gs://${MODEL_DIR} \
    --pipeline_config_path=gs://${PIPELINE_CONFIG_PATH}
The dataset I used for training was the Pets example from the official Object Detection API page: link. However, I exported it using the TensorFlow 2 Object Detection API tools for consistency.
The pre-trained network I used was the Faster R-CNN ResNet101 V1 1024x1024 model, trained on TPU.
The configuration file I used was faster_rcnn_resnet101_v1_1024x1024_coco17_tpu-8.config for TPU training.
I changed the number of classes to 37.
I also changed the batch size to batch_size: 32, because training on GCP with a TPU v2 was crashing.
The fine_tune_checkpoint_type was changed to fine_tune_checkpoint_type: "detection" and the only data augmentation I used was random_horizontal_flip.
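The config edits listed above can be sketched as plain text substitutions. This is only an illustration: the `set_field` helper is hypothetical, the excerpt is a made-up fragment rather than the real faster_rcnn_resnet101_v1_1024x1024_coco17_tpu-8.config, and its starting values are placeholders. Only the three target values (37, 32, "detection") come from my setup.

```python
import re

# Hypothetical excerpt of a pipeline config in protobuf text format.
# The starting values here are placeholders, not copied from the real file.
config_text = """
model {
  faster_rcnn {
    num_classes: 90
  }
}
train_config {
  batch_size: 64
  fine_tune_checkpoint_type: "classification"
}
"""


def set_field(text: str, name: str, value: str) -> str:
    """Replace the value of the first `name: value` entry (illustrative helper)."""
    pattern = r"(\b%s:\s*)\S+" % re.escape(name)
    # A function replacement avoids backslash/group escapes in `value`.
    new_text, count = re.subn(pattern, lambda m: m.group(1) + value, text, count=1)
    if count == 0:
        raise KeyError("field not found: %s" % name)
    return new_text


updated = config_text
updated = set_field(updated, "num_classes", "37")
updated = set_field(updated, "batch_size", "32")
updated = set_field(updated, "fine_tune_checkpoint_type", '"detection"')
print(updated)
```

Because only the first occurrence of each field is replaced, a field that appears in several sections (for example a batch_size under both the training and evaluation settings) would need more targeted handling than this sketch provides.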
The official TensorFlow 2 Detection Model Zoo reports results for TPU-trained architectures other than SSD.
However, the official Object Detection TPU compatibility guide states that only SSD is currently supported, and that non-max suppression is not.
Why is NMS not working?
question from:
https://stackoverflow.com/questions/65850309/tensor-flow-2-object-detection-api2-batch-non-max-suppression-in-trained-faster