Multiple Temporal Pooling Mechanisms for Weakly Supervised Temporal Action Localization

Published: 25 February 2023


Recent action localization works learn in a weakly supervised manner to avoid the high cost of human labeling. Most of these works are based on the Multiple Instance Learning (MIL) framework, in which temporal pooling is an indispensable component that usually relies on the guidance of snippet-level Class Activation Sequences (CAS). However, we observe that previous works use only a simple convolutional neural network to generate the CAS, which ignores both the weakly discriminative foreground action segments and the background segments; moreover, the relationship between different actions has not been considered. To solve these problems, we propose multiple temporal pooling mechanisms (MTP) that make fuller use of the available information. Specifically, through the design of a Foreground Variance Branch, a Dual Foreground Attention Branch, and a Hybrid Attention Fine-tuning Branch, MTP leverages effective information from different aspects and generates different CASs to guide the learning of temporal pooling. In addition, different loss functions are designed to better optimize the individual branches, with the aim of effectively distinguishing actions from the background. Our method shows excellent results on the THUMOS14 and ActivityNet1.2 datasets.
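
To make the role of temporal pooling in a Multiple Instance Learning setup concrete, here is a minimal sketch of top-k mean pooling over a snippet-level CAS, a pooling scheme commonly used in weakly supervised localization. It is a generic illustration only and is not the MTP method or any of its branches. The function names, the choice of k, and the toy scores are all assumptions made for this example.

```python
import math


def topk_temporal_pooling(cas, k):
    """Average the k highest snippet scores of each class.

    cas: list of T snippets, each a list of C class activation scores.
    Returns a list of C video-level class scores.
    """
    if not cas:
        raise ValueError("cas must contain at least one snippet")
    num_classes = len(cas[0])
    k = max(1, min(k, len(cas)))  # clamp k to the valid range [1, T]
    video_scores = []
    for c in range(num_classes):
        # Sort this class's scores over time, highest first.
        column = sorted((snippet[c] for snippet in cas), reverse=True)
        video_scores.append(sum(column[:k]) / k)
    return video_scores


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical CAS: 6 snippets, 2 action classes.
cas = [
    [0.1, 0.0],
    [2.0, 0.2],
    [3.0, 0.1],
    [0.2, 0.0],
    [2.5, 0.3],
    [0.0, 0.1],
]

video_scores = topk_temporal_pooling(cas, k=3)  # one score per class
video_probs = softmax(video_scores)             # compared with the video-level label
```

Because only video-level labels are available, a classification loss would be applied to `video_probs`, and the snippet-level scores in `cas` are learned indirectly through the pooling step. This is why the quality of the CAS that guides pooling matters.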


  • (2024) Discriminative Action Snippet Propagation Network for Weakly Supervised Temporal Action Localization. ACM Transactions on Multimedia Computing, Communications, and Applications 20:6 (1-21). DOI: 10.1145/3643815. Online publication date: 8-Mar-2024
  • (2024) Snippet-to-Prototype Contrastive Consensus Network for Weakly Supervised Temporal Action Localization. IEEE Transactions on Multimedia 26 (6717-6729). DOI: 10.1109/TMM.2024.3355628. Online publication date: 2024



    Published In

    ACM Transactions on Multimedia Computing, Communications, and Applications  Volume 19, Issue 3
    May 2023
    514 pages
    Editor: Abdulmotaleb El Saddik


    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 25 February 2023
    Online AM: 13 October 2022
    Accepted: 03 October 2022
    Revised: 16 August 2022
    Received: 27 April 2022
    Published in TOMM Volume 19, Issue 3


    Author Tags

    1. Weakly supervised temporal action localization
    2. multiple instance learning
    3. temporal pooling


    Funding Sources

    • National Natural Science Foundation of China



