OpenSource Name: wei-tim/YOWO
OpenSource URL: https://github.com/wei-tim/YOWO
OpenSource Language: Python 100.0%

You Only Watch Once (YOWO)

PyTorch implementation of the article "You Only Watch Once: A Unified CNN Architecture for Real-Time Spatiotemporal Action Localization". The repository contains code for real-time spatiotemporal action localization with PyTorch on the AVA, UCF101-24 and J-HMDB-21 datasets. The updated paper can be accessed via YOWO_updated.pdf.

AVA dataset visualizations!
UCF101-24 and J-HMDB-21 dataset visualizations!
In this work, we present YOWO (You Only Watch Once), a unified CNN architecture for real-time spatiotemporal action localization in video streams. YOWO is a single-stage framework: the input is a clip consisting of several successive frames of a video, and the output is the bounding box positions together with the corresponding class labels for the current frame. Afterwards, with a specific strategy, these detections can be linked together to generate action tubes over the whole video. Since we do not separate the human detection and action classification procedures, the whole network can be optimized by a joint loss in an end-to-end framework. We have carried out a series of comparative evaluations on two challenging, representative datasets, UCF101-24 and J-HMDB-21. Our approach outperforms the other state-of-the-art results while retaining real-time capability, providing 34 frames per second on 16-frame input clips and 62 frames per second on 8-frame input clips.
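To make the clip-in, boxes-out interface concrete, here is a minimal shape-level sketch. The tensor layout follows the description above (a clip of successive frames in, box coordinates and class scores for the current frame out); the YOWO constructor and the commented calls are illustrative assumptions, not the repository's exact API.

```python
import torch

# Illustrative shapes only: one clip of 16 successive RGB frames at 224x224.
clip = torch.randn(1, 3, 16, 224, 224)  # [batch, channels, frames, height, width]

# Hypothetical forward pass. Inside YOWO, a 3D-CNN branch extracts
# spatiotemporal features from the whole clip, a 2D-CNN branch extracts
# spatial features from the key (current) frame, the two are fused, and a
# YOLO-style head regresses boxes and class scores for that key frame.
# model = YOWO(cfg)     # constructor name/signature assumed
# out = model(clip)     # grid of anchor predictions for the current frame
```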
Installation

git clone https://github.com/wei-tim/YOWO.git
cd YOWO

Datasets

Use the instructions here for the preparation of the AVA dataset. Modify the paths in ucf24.data and jhmdb21.data under the cfg directory accordingly. Download the dataset annotations from here.
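Before launching a training run it can help to sanity-check the paths just edited. The snippet below is a small convenience sketch that assumes the .data files use a Darknet-style key = value layout (as in pytorch_yolo2, which this code builds on); the actual keys in ucf24.data and jhmdb21.data may differ.

```python
from pathlib import Path

def check_data_cfg(cfg_path):
    """Report filesystem paths listed in a Darknet-style .data file that do not exist."""
    cfg_file = Path(cfg_path)
    if not cfg_file.exists():
        print(f"missing config: {cfg_path}")
        return
    for line in cfg_file.read_text().splitlines():
        if "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        # Only values that look like filesystem paths are checked.
        if "/" in value and not Path(value).exists():
            print(f"{cfg_path}: {key} -> missing path: {value}")

check_data_cfg("cfg/ucf24.data")
check_data_cfg("cfg/jhmdb21.data")
```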
Download the backbone pretrained weights:

wget http://pjreddie.com/media/files/yolo.weights
NOTE: For J-HMDB-21 training, HMDB-51 finetuned pretrained models should be used (e.g. "resnext-101-kinetics-hmdb51_split1.pth").
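A quick way to verify that the downloaded backbone checkpoint is the intended one (for example the HMDB-51 finetuned ResNeXt-101 before a J-HMDB-21 run) is to inspect it with plain PyTorch. The 'state_dict' wrapper and 'module.'-prefixed parameter names assumed below are common in public 3D-ResNet/ResNeXt releases but are not guaranteed for this exact file.

```python
import torch

# Load on CPU purely for inspection; adjust the filename to the checkpoint you downloaded.
ckpt = torch.load("resnext-101-kinetics-hmdb51_split1.pth", map_location="cpu")
state = ckpt.get("state_dict", ckpt)  # some checkpoints wrap weights in a 'state_dict' entry

print(f"{len(state)} parameter tensors")
for name, tensor in list(state.items())[:5]:
    print(name, tuple(tensor.shape))  # e.g. 'module.conv1.weight' with its shape
```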
Pretrained YOWO models

Pretrained models for the UCF101-24 and J-HMDB-21 datasets can be downloaded from here. Pretrained models for the AVA dataset can be downloaded from here. All materials (annotations and pretrained models) are also available on Baiduyun Disk here, with password 95mm.

Running the code

python main.py --cfg cfg/ava.yaml
python main.py --cfg cfg/ucf24.yaml
python main.py --cfg cfg/jhmdb.yaml

Validating the model

python evaluation/Object-Detection-Metrics/pascalvoc.py --gtfolder PATH-TO-GROUNDTRUTHS-FOLDER --detfolder PATH-TO-DETECTIONS-FOLDER
python video_mAP.py --cfg cfg/ucf24.yaml
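Both evaluations report PASCAL VOC-style average precision: detections are ranked by confidence, matched to ground truth by IoU, and precision is integrated over recall. The snippet below is a self-contained, single-class illustration of that metric at an IoU threshold of 0.5; it is not the code inside evaluation/Object-Detection-Metrics or video_mAP.py.

```python
import numpy as np

def iou(a, b):
    """IoU of two boxes given as [x1, y1, x2, y2]."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / (union + 1e-9)

def average_precision(detections, ground_truths, iou_thr=0.5):
    """detections: list of (score, box); ground_truths: list of boxes (one class)."""
    detections = sorted(detections, key=lambda d: -d[0])
    matched = [False] * len(ground_truths)
    tp, fp = [], []
    for score, box in detections:
        overlaps = [iou(box, gt) for gt in ground_truths]
        best = int(np.argmax(overlaps)) if overlaps else -1
        if best >= 0 and overlaps[best] >= iou_thr and not matched[best]:
            matched[best] = True
            tp.append(1.0); fp.append(0.0)
        else:
            tp.append(0.0); fp.append(1.0)
    tp, fp = np.cumsum(tp), np.cumsum(fp)
    recall = tp / max(len(ground_truths), 1)
    precision = tp / np.maximum(tp + fp, 1e-9)
    # All-point interpolation: take the precision envelope, integrate over recall.
    mrec = np.concatenate(([0.0], recall, [1.0]))
    mpre = np.concatenate(([0.0], precision, [0.0]))
    for i in range(len(mpre) - 2, -1, -1):
        mpre[i] = max(mpre[i], mpre[i + 1])
    steps = np.where(mrec[1:] != mrec[:-1])[0]
    return float(np.sum((mrec[steps + 1] - mrec[steps]) * mpre[steps + 1]))

# Toy example: two detections, one ground-truth box.
print(average_precision([(0.9, [10, 10, 50, 50]), (0.3, [60, 60, 90, 90])],
                        [[12, 12, 48, 48]]))
```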

Running on a test video

python test_video_ava.py --cfg cfg/ava.yaml
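For a custom video the same clip-based idea applies: keep a sliding buffer of the most recent frames and run the network once per new frame, drawing the detections on that frame. The sketch below shows only the OpenCV frame-buffering side; the model loading, preprocessing and box decoding are left as commented placeholders because they are specific to test_video_ava.py, and the input path and helper names are hypothetical.

```python
from collections import deque

import cv2

CLIP_LEN = 16  # number of successive frames fed to the network

cap = cv2.VideoCapture("input.mp4")   # hypothetical input path
buffer = deque(maxlen=CLIP_LEN)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    buffer.append(frame)
    if len(buffer) < CLIP_LEN:
        continue  # wait until a full clip is available
    # clip = preprocess(list(buffer))       # resize/normalize, stack to a tensor (assumed helper)
    # boxes, labels = model(clip)           # detections for the current (last) frame
    # frame = draw_detections(frame, boxes, labels)  # assumed helper
    cv2.imshow("YOWO", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break

cap.release()
cv2.destroyAllWindows()
```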
UPDATEs:

Citation

If you use this code or pre-trained models, please cite the following:

@InProceedings{kopuklu2019yowo,
title={You Only Watch Once: A Unified CNN Architecture for Real-Time Spatiotemporal Action Localization},
author={K{\"o}p{\"u}kl{\"u}, Okan and Wei, Xiangyu and Rigoll, Gerhard},
journal={arXiv preprint arXiv:1911.06644},
year={2019}
}

Acknowledgements

We thank Hang Xiao for releasing the pytorch_yolo2 codebase, on top of which we build our work.