A curated publication list on weakly-supervised temporal action localization.
This repository was built to make it easier to navigate mainstream work on weakly-supervised temporal action localization.
Please note that only papers accepted at conferences are listed here: accepted papers for reliability, and conference papers only for brevity.
The mean average precisions (mAPs) under the standard intersection over union (IoU) thresholds are reported.
For example, '@0.5' indicates the mAP score at the IoU threshold of 0.5.
AVG denotes the average mAP over the IoU thresholds from 0.1 to 0.7 (for THUMOS14), from 0.1 to 0.5 (for FineGym), or from 0.5 to 0.95 with a step size of 0.05 (for both versions of ActivityNet and for FineAction).
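As a rough illustration of how these numbers relate, the sketch below computes the temporal IoU of two segments and averages per-threshold mAP scores into AVG. It is a minimal sketch, not the official evaluation code of any benchmark: the names `temporal_iou`, `iou_thresholds`, `AVG_GRIDS`, and `avg_map` are hypothetical, and the step size of 0.1 for THUMOS14 and FineGym is an assumption (only the 0.05 step is stated above).

```python
def temporal_iou(seg_a, seg_b):
    """IoU of two temporal segments, each given as (start, end)."""
    inter = max(0.0, min(seg_a[1], seg_b[1]) - max(seg_a[0], seg_b[0]))
    union = (seg_a[1] - seg_a[0]) + (seg_b[1] - seg_b[0]) - inter
    return inter / union if union > 0 else 0.0


def iou_thresholds(start, stop, step):
    """Inclusive threshold grid; values are rounded to avoid float drift."""
    n = round((stop - start) / step)
    return [round(start + i * step, 2) for i in range(n + 1)]


# Threshold grids used for AVG. The 0.05 step comes from the text above;
# the 0.1 step for THUMOS14 and FineGym is an assumption.
AVG_GRIDS = {
    "THUMOS14": iou_thresholds(0.1, 0.7, 0.1),
    "FineGym": iou_thresholds(0.1, 0.5, 0.1),
    "ActivityNet": iou_thresholds(0.5, 0.95, 0.05),
    "FineAction": iou_thresholds(0.5, 0.95, 0.05),
}


def avg_map(map_by_threshold, benchmark):
    """Average the per-threshold mAP scores over the benchmark's grid.

    map_by_threshold maps an IoU threshold (e.g. 0.5) to the mAP at that
    threshold, i.e. the '@0.5' column of a results table.
    """
    grid = AVG_GRIDS[benchmark]
    return sum(map_by_threshold[t] for t in grid) / len(grid)
```

For instance, `temporal_iou((0, 10), (5, 15))` overlaps for 5 units out of a union of 15, giving an IoU of 1/3. A prediction like that would count as correct at the '@0.3' threshold but not at '@0.5'.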
In addition, links to implementations are attached where available, along with the framework each one uses. 'o-' and 'u-' indicate official and unofficial implementations, respectively.
[Note]
*: use of additional trimmed videos
†: use of additional information such as action count, pose, and audio
[3C-Net†] | ICCV'19 | 3C-Net: Category Count and Center Loss for Weakly-Supervised Action Localization | [pdf] | [o-pytorch]
[Nguyen et al.] | ICCV'19 | Weakly-supervised Action Localization with Background Modeling | [pdf]
[PreTrimNet†] | AAAI'20 | Multi-Instance Multi-Label Action Recognition and Localization Based on Spatio-Temporal Pre-Trimming for Untrimmed Videos | [pdf]