frostinassiky/gtad: The official implementation of G-TAD: Sub-Graph Localization ...


Open-source software name:

frostinassiky/gtad

Open-source software URL:

https://github.com/frostinassiky/gtad

Open-source programming language:

Python 86.2%

Open-source software introduction:

G-TAD


This repo holds the code for the paper "G-TAD: Sub-Graph Localization for Temporal Action Detection", accepted at CVPR 2020.

G-TAD Overview

Update

15 Dec 2020: The configuration for the HACS Segments dataset is in the hacs branch. With the official I3D pretrained features, G-TAD reaches 27.481 average mAP without tuning the model architecture.

24 Nov 2020: To celebrate my 2nd anniversary with Sally, I released the code for ActivityNet. :P Please check out the anet branch for details. Features: Google Drive, md5sum: 0ce54748883c4ce1cf6600f5ad04421b.

30 Mar 2020: THUMOS14 features are available! Google Drive, OneDrive

15 Apr 2020: THUMOS14 code is published! I updated the post-processing code, so the experimental results are slightly better than those in the original paper!

29 Apr 2020: We updated our code based on @Phoenix1327's comment. The experimental results are slightly better. Please see the details in this issue.

Overview

Temporal action detection is a fundamental yet challenging task in video understanding. Video context is a critical cue for detecting actions effectively, but existing works mainly focus on temporal context while neglecting semantic context and other important context properties. In this work, we propose a graph convolutional network (GCN) model that adaptively incorporates multi-level semantic context into video features and casts temporal action detection as a sub-graph localization problem. Specifically, we formulate video snippets as graph nodes, snippet-snippet correlations as edges, and actions associated with context as target sub-graphs. With graph convolution as the basic operation, we design a GCN block called GCNeXt, which learns the features of each node by aggregating its context and dynamically updates the edges in the graph. To localize each sub-graph, we also design an SGAlign layer that embeds each sub-graph into Euclidean space. Extensive experiments show that G-TAD finds effective video context without extra supervision and achieves state-of-the-art performance on two detection benchmarks. On ActivityNet-1.3, we obtain an average mAP of 34.09%; on THUMOS14, we obtain 40.16% mAP@0.5, beating all other one-stage methods.
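To make the GCNeXt idea concrete, here is a minimal pure-Python sketch of one GCNeXt-like layer. It is not the repository's implementation: the real block uses learned (grouped) convolutions, whereas this toy version assumes weight-free aggregation. Temporal edges link each snippet to its neighbours in time; semantic edges link each snippet to its k nearest neighbours in feature space and are recomputed from the current features, which is what "dynamically updates the edges" means here. Every name below (`temporal_edges`, `knn_edges`, `max_relative`, `gcnext_like_layer`) is hypothetical.

```python
from typing import List

Vector = List[float]


def temporal_edges(n: int) -> List[List[int]]:
    """Temporal edges: each snippet is linked to its immediate predecessor and successor."""
    return [[j for j in (i - 1, i + 1) if 0 <= j < n] for i in range(n)]


def knn_edges(x: List[Vector], k: int) -> List[List[int]]:
    """Semantic edges: the k nearest snippets in feature space (self excluded).

    Recomputed from the current features, so the graph can change from layer to layer.
    """
    edges = []
    for i, xi in enumerate(x):
        dists = sorted(
            (sum((a - b) ** 2 for a, b in zip(xi, xj)), j)
            for j, xj in enumerate(x)
            if j != i
        )
        edges.append([j for _, j in dists[:k]])
    return edges


def max_relative(x: List[Vector], edges: List[List[int]]) -> List[Vector]:
    """Weight-free EdgeConv-style aggregation: element-wise max of (x_j - x_i) over neighbours."""
    out = []
    for i, nbrs in enumerate(edges):
        if not nbrs:
            out.append([0.0] * len(x[i]))
            continue
        out.append([max(x[j][c] - x[i][c] for j in nbrs) for c in range(len(x[i]))])
    return out


def gcnext_like_layer(x: List[Vector], k: int = 2) -> List[Vector]:
    """Fuse the temporal and semantic streams with a residual connection, then apply ReLU."""
    t = max_relative(x, temporal_edges(len(x)))
    s = max_relative(x, knn_edges(x, k))
    return [
        [max(0.0, xv + tv + sv) for xv, tv, sv in zip(xr, tr, sr)]
        for xr, tr, sr in zip(x, t, s)
    ]


snippets = [[0.0], [1.0], [5.0], [6.0]]  # four 1-D "snippet features"
print(knn_edges(snippets, k=1))          # [[1], [0], [3], [2]]
print(gcnext_like_layer(snippets, k=1))  # [[2.0], [4.0], [7.0], [4.0]]
```

Note how snippets 2 and 3 are semantic neighbours of each other even though snippet 1 sits between 1 and 2 in time; this is the kind of non-local context the semantic stream is meant to capture.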

Details, Video, arXiv.

Dependencies

  • Python == 3.7
  • PyTorch == 1.1.0 or 1.3.0
  • CUDA == 10.0.130
  • CUDNN == 7.5.1_0
  • GCC >= 4.9
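A quick way to compare a local setup against the Python and PyTorch versions listed above is sketched below. The helper names are hypothetical, and the sketch only covers Python and PyTorch; CUDA, cuDNN and GCC are not checked.

```python
from typing import List, Optional, Tuple

LISTED_PYTHON = (3, 7)
LISTED_TORCH = ("1.1.0", "1.3.0")


def check_versions(python_version: Tuple[int, int], torch_version: Optional[str]) -> List[str]:
    """Return human-readable mismatches against the listed versions (empty list = all match)."""
    problems = []
    if tuple(python_version) != LISTED_PYTHON:
        problems.append("Python %d.%d found; 3.7 is listed" % tuple(python_version[:2]))
    if torch_version is None:
        problems.append("PyTorch is not installed")
    elif str(torch_version).split("+")[0] not in LISTED_TORCH:
        # strip local build tags such as "+cu100" before comparing
        problems.append(f"PyTorch {torch_version} found; 1.1.0 or 1.3.0 is listed")
    return problems


def installed_torch_version() -> Optional[str]:
    """Return torch.__version__ as a string, or None if PyTorch cannot be imported."""
    try:
        import torch
    except ImportError:
        return None
    return str(torch.__version__)
```

For example, `check_versions(tuple(sys.version_info[:2]), installed_torch_version())` returns an empty list inside an environment that matches the list above.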

Installation

Based on the idea of RoI Alignment from Mask R-CNN, we developed the SGAlign layer in our implementation. You have to compile a short CUDA extension to run Algorithm 1 in our paper.

  1. Create conda environment
    conda env create -f env.yml
    source activate gtad
  2. Install Align1D2.2.0
    cd gtad_lib
    python setup.py install
  3. Test Align1D2.2.0
    python align.py
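The Installation notes say SGAlign builds on RoI Alignment from Mask R-CNN. The core 1D version of that idea is sketched below in pure Python: an interval [start, end] over the snippet axis is split into a fixed number of bins, and each bin is read by linear interpolation at its centre, so intervals of any length map to a fixed-size feature. This is an assumption-laden illustration (one sample per bin, clamping at the boundaries), not the compiled Align1D CUDA code, and the function names are hypothetical.

```python
import math
from typing import List, Sequence

Vector = List[float]


def interpolate(features: Sequence[Vector], t: float) -> Vector:
    """Linearly interpolate a snippet-feature sequence at a continuous position t."""
    last = len(features) - 1
    t = min(max(t, 0.0), float(last))  # clamp to the valid range
    lo = int(math.floor(t))
    hi = min(lo + 1, last)
    w = t - lo
    return [(1.0 - w) * a + w * b for a, b in zip(features[lo], features[hi])]


def align_1d(features: Sequence[Vector], start: float, end: float, num_bins: int) -> List[Vector]:
    """RoIAlign-style 1D pooling with one sample per bin (at the bin centre)."""
    width = (end - start) / num_bins
    return [interpolate(features, start + (i + 0.5) * width) for i in range(num_bins)]


feats = [[0.0], [10.0], [20.0], [30.0]]
print(align_1d(feats, 0.0, 3.0, num_bins=3))       # [[5.0], [15.0], [25.0]]
print(len(align_1d(feats, 0.7, 1.9, num_bins=3)))  # 3 bins regardless of interval length
```

Interpolating at fractional positions, instead of rounding to the nearest snippet, is what distinguishes RoIAlign from RoIPool and avoids quantising the proposal boundaries.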

Data setup

To reproduce the results in THUMOS14 without further changes:

  1. Download the data from Google Drive or OneDrive.

  2. Place it into a folder named TSN_pretrain_avepool_allfrms_hdf5 inside data/thumos_feature.

Alternatively, if the script accepts a --feature_path argument, you can pass the folder containing the HDF5 files through it.
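The sketch below shows one way a script could resolve the feature folder: use --feature_path when given, otherwise fall back to the default data/thumos_feature/TSN_pretrain_avepool_allfrms_hdf5 location from the steps above. The function names and the .h5/.hdf5 extensions are assumptions for illustration; check the repository's own argument parsing before relying on the flag.

```python
import argparse
from pathlib import Path
from typing import List, Optional, Sequence

# Default location described in the data-setup steps.
DEFAULT_FEATURE_PATH = Path("data/thumos_feature/TSN_pretrain_avepool_allfrms_hdf5")


def parse_feature_path(argv: Optional[Sequence[str]] = None) -> Path:
    """Return --feature_path if given, else the default folder (hypothetical CLI)."""
    parser = argparse.ArgumentParser(description="Locate THUMOS14 feature files")
    parser.add_argument("--feature_path", type=Path, default=DEFAULT_FEATURE_PATH,
                        help="folder containing the HDF5 feature files")
    return parser.parse_args(argv).feature_path


def list_hdf5_files(folder: Path) -> List[Path]:
    """List HDF5 files in folder; the .h5/.hdf5 extensions are an assumption."""
    if not folder.is_dir():
        raise FileNotFoundError(f"feature folder not found: {folder}")
    return sorted(p for p in folder.iterdir() if p.suffix in {".h5", ".hdf5"})
```

For example, `list_hdf5_files(parse_feature_path())` lists the feature files the sketch would load, and raises `FileNotFoundError` if the folder was not placed where expected.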

Code Architecture

gtad                        # this repo
├── data                    # feature and label
├── evaluation              # evaluation code from official API
├── gtad_lib                # gtad library
└── ...

Train and evaluation

After downloading the dataset and setting up the environment, you can run the following scripts.

python gtad_train.py
python gtad_inference.py
python gtad_postprocessing.py

or

bash gtad_thumos.sh | tee log.txt
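If you prefer to drive the three stages from Python, a minimal sketch is below. It runs gtad_train.py, gtad_inference.py and gtad_postprocessing.py in order and stops at the first failure. It assumes each script runs without extra arguments; we have not checked whether gtad_thumos.sh passes additional options, and the helper names are hypothetical.

```python
import subprocess
import sys
from typing import Callable, List, Sequence

# The three stages listed above, in the order they are run.
STAGES = ["gtad_train.py", "gtad_inference.py", "gtad_postprocessing.py"]


def run_with_python(script: str) -> int:
    """Run one stage with the current interpreter and return its exit code."""
    return subprocess.run([sys.executable, script]).returncode


def run_pipeline(stages: Sequence[str] = STAGES,
                 runner: Callable[[str], int] = run_with_python) -> List[str]:
    """Run the stages in order, stopping at the first non-zero exit code."""
    completed = []
    for script in stages:
        code = runner(script)
        if code != 0:
            raise RuntimeError(f"{script} exited with code {code}; later stages skipped")
        completed.append(script)
    return completed
```

The `runner` parameter exists so the ordering logic can be exercised without launching training, e.g. `run_pipeline(runner=lambda s: 0)`.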

If everything goes well, you can get the following result:

mAP at tIoU 0.3 is 0.5731204387052588
mAP at tIoU 0.4 is 0.5129888769308306
mAP at tIoU 0.5 is 0.43043083034478025
mAP at tIoU 0.6 is 0.32653130678508374
mAP at tIoU 0.7 is 0.22806267480976325
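Since the output can be captured in log.txt, the per-threshold numbers can be parsed and summarised with a short script. The sketch below extracts every "mAP at tIoU X is Y" line and reports mAP@0.5 together with the unweighted mean over the thresholds present, a common way to summarise THUMOS14 results. The helper names are hypothetical, and the regex assumes plain decimal numbers (no scientific notation).

```python
import re
from statistics import mean
from typing import Dict

# Matches lines such as "mAP at tIoU 0.5 is 0.43043083034478025".
MAP_LINE = re.compile(r"mAP at tIoU ([0-9.]+) is ([0-9.]+)")


def parse_map_log(text: str) -> Dict[float, float]:
    """Map each tIoU threshold to its mAP, as printed by the evaluation step."""
    return {float(tiou): float(value) for tiou, value in MAP_LINE.findall(text)}


def average_map(results: Dict[float, float]) -> float:
    """Unweighted mean over the thresholds present in the log."""
    return mean(results.values())


log = """mAP at tIoU 0.3 is 0.5731204387052588
mAP at tIoU 0.4 is 0.5129888769308306
mAP at tIoU 0.5 is 0.43043083034478025
mAP at tIoU 0.6 is 0.32653130678508374
mAP at tIoU 0.7 is 0.22806267480976325"""

results = parse_map_log(log)
print(f"mAP@0.5 = {results[0.5]:.4f}, average over 0.3:0.7 = {average_map(results):.4f}")
# mAP@0.5 = 0.4304, average over 0.3:0.7 = 0.4142
```

To summarise a real run, pass the contents of log.txt to `parse_map_log` instead of the inline string.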

Bibtex

CVPR Version.

@InProceedings{xu2020gtad,
author = {Xu, Mengmeng and Zhao, Chen and Rojas, David S. and Thabet, Ali and Ghanem, Bernard},
title = {G-TAD: Sub-Graph Localization for Temporal Action Detection},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2020}
}

Reference

These are very helpful and promising implementations for the temporal action localization task. My implementation borrows ideas from them.

  • BSN: Boundary Sensitive Network for Temporal Action Proposal Generation. Paper, Code

  • BMN: Boundary-Matching Network for Temporal Action Proposal Generation. Paper, Code (PaddlePaddle), Code (PyTorch)

  • Graph Convolutional Networks for Temporal Action Localization. Paper, Code

Contact

mengmeng.xu[at]kaust.edu.sa



