TransPose is a human pose estimation model based on a CNN feature extractor, a Transformer Encoder, and a prediction head. Given an image, the attention layers built into the Transformer can efficiently capture long-range spatial relationships between keypoints and reveal which dependencies the predicted keypoint locations rely on most.

We choose two types of CNNs as backbone candidates: ResNet and HRNet. The derived convolutional blocks are ResNet-Small, HRNet-Small-W32, and HRNet-Small-W48.
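A minimal sketch of this pipeline is shown below. The class and argument names are illustrative placeholders rather than the repository's actual modules, and details such as positional embeddings are omitted:

```python
import torch.nn as nn


class TransPoseSketch(nn.Module):
    """Illustrative pipeline: CNN backbone -> Transformer encoder -> heatmap head."""

    def __init__(self, backbone, backbone_channels, d_model=256,
                 num_layers=4, num_heads=8, num_joints=17):
        super().__init__()
        self.backbone = backbone                                  # e.g. a truncated ResNet/HRNet
        self.proj = nn.Conv2d(backbone_channels, d_model, kernel_size=1)
        layer = nn.TransformerEncoderLayer(d_model, num_heads, dim_feedforward=1024)
        self.encoder = nn.TransformerEncoder(layer, num_layers)
        self.head = nn.Conv2d(d_model, num_joints, kernel_size=1)

    def forward(self, x):
        feat = self.proj(self.backbone(x))                        # (B, d_model, H, W)
        b, c, h, w = feat.shape
        tokens = feat.flatten(2).permute(2, 0, 1)                 # (H*W, B, d_model) token sequence
        tokens = self.encoder(tokens)                             # self-attention over all spatial positions
        feat = tokens.permute(1, 2, 0).reshape(b, c, h, w)
        return self.head(feat)                                    # (B, num_joints, H, W) keypoint heatmaps
```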
Results on COCO val2017, with a person detector having human AP of 56.4 on COCO val2017
| Model | Input size | FPS* | GFLOPs | AP | AP .5 | AP .75 | AP (M) | AP (L) | AR | AR .5 | AR .75 | AR (M) | AR (L) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| TransPose-R-A3 | 256x192 | 141 | 8.0 | 0.717 | 0.889 | 0.788 | 0.680 | 0.786 | 0.771 | 0.930 | 0.836 | 0.727 | 0.835 |
| TransPose-R-A4 | 256x192 | 138 | 8.9 | 0.726 | 0.891 | 0.799 | 0.688 | 0.798 | 0.780 | 0.931 | 0.845 | 0.735 | 0.844 |
| TransPose-H-S | 256x192 | 45 | 10.2 | 0.742 | 0.896 | 0.808 | 0.706 | 0.810 | 0.795 | 0.935 | 0.855 | 0.752 | 0.856 |
| TransPose-H-A4 | 256x192 | 41 | 17.5 | 0.753 | 0.900 | 0.818 | 0.717 | 0.821 | 0.803 | 0.939 | 0.861 | 0.761 | 0.865 |
| TransPose-H-A6 | 256x192 | 38 | 21.8 | 0.758 | 0.901 | 0.821 | 0.719 | 0.828 | 0.808 | 0.939 | 0.864 | 0.764 | 0.872 |
Note:

- FPS*: we computed the average FPS over 100 samples from the COCO val set (batch size = 1) on a single NVIDIA 2080Ti GPU; the measured FPS may fluctuate between runs.
- We trained the models on different hardware platforms: 1 x RTX 2080Ti GPU (TP-R-A4), 4 x Titan XP GPUs (TP-H-S, TP-H-A4), and 4 x Tesla P40 GPUs (TP-H-A6).
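For reference, a timing of this kind can be reproduced with a small loop like the sketch below, assuming a loaded `model` and a list of preprocessed image tensors; the authors' exact benchmarking script may differ:

```python
import time

import torch


@torch.no_grad()
def average_fps(model, images, device="cuda"):
    """Average single-image (batch size = 1) inference FPS over a list of image tensors."""
    model.eval().to(device)
    for img in images[:5]:                        # warm-up passes, excluded from timing
        model(img.unsqueeze(0).to(device))
    torch.cuda.synchronize()
    start = time.time()
    for img in images:
        model(img.unsqueeze(0).to(device))
    torch.cuda.synchronize()
    return len(images) / (time.time() - start)
```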
Results on COCO test-dev2017, with a person detector having human AP of 60.9 on COCO test-dev2017
Given an input image, a pretrained TransPose model, and the predicted keypoint locations, we can visualize the spatial dependencies of the predicted locations by thresholding the attention scores, for example:

- TransPose-R-A4 with threshold=0.00
- TransPose-R-A4 with threshold=0.01
- TransPose-H-A4 with threshold=0.00
- TransPose-H-A4 with threshold=0.00075
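A minimal sketch of such a visualization is given below. It assumes an attention map `attn` of shape `(h*w, h*w)` taken from one encoder layer; the function and argument names are illustrative, not the repository's actual visualization API:

```python
import cv2
import matplotlib.pyplot as plt
import numpy as np


def show_dependency(image, attn, query_xy, feat_hw, threshold=0.00075):
    """Overlay how much attention one predicted keypoint pays to every spatial position.

    image:    H x W x 3 input image (numpy array)
    attn:     (h*w, h*w) attention map from an encoder layer
    query_xy: (x, y) location of the predicted keypoint on the h x w feature grid
    feat_hw:  (h, w) spatial size of the feature map fed to the encoder
    """
    h, w = feat_hw
    x, y = query_xy
    scores = attn[y * w + x].reshape(h, w)                    # attention row of the query position
    scores = np.where(scores >= threshold, scores, 0.0)       # drop dependencies below the threshold
    heat = cv2.resize(scores.astype(np.float32), (image.shape[1], image.shape[0]))
    plt.imshow(image)
    plt.imshow(heat, cmap="jet", alpha=0.5)                   # dependency-area overlay
    plt.axis("off")
    plt.show()
```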
Getting started
Installation
Clone this repository; we'll refer to the directory that you cloned as ${POSE_ROOT}.
We follow the steps of HRNet to prepare the COCO train/val/test images and annotations. The detected person results can be downloaded from OneDrive or GoogleDrive. Please download or link them to ${POSE_ROOT}/data/coco/ so that the directory looks like this:
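A typical layout, following HRNet's data preparation, is sketched below; the exact file names (in particular those under person_detection_results) come from HRNet's released detection results and may differ in your setup:

```
${POSE_ROOT}/data/coco/
├── annotations/
│   ├── person_keypoints_train2017.json
│   └── person_keypoints_val2017.json
├── person_detection_results/
│   ├── COCO_val2017_detections_AP_H_56_person.json
│   └── COCO_test-dev2017_detections_AP_H_609_person.json
└── images/
    ├── train2017/
    └── val2017/
```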