ProTAL: A Drag-and-Link Video Programming Framework for Temporal Action Localization

Abstract

Temporal Action Localization (TAL) aims to detect the start and end timestamps of actions in a video. However, training TAL models requires a substantial amount of manually annotated data. Data programming is an efficient method for creating training labels with a series of human-defined labeling functions. However, applying it to TAL is difficult because complex actions are hard to define over temporal sequences of video frames. In this paper, we propose ProTAL, a drag-and-link video programming framework for TAL. ProTAL enables users to define key events by dragging nodes representing body parts and objects and linking them to constrain their relations (direction, distance, etc.). These definitions are used to generate action labels for large-scale unlabelled videos. A semi-supervised method is then employed to train TAL models with such labels. We demonstrate the effectiveness of ProTAL through a usage scenario and a user study, providing insights into designing video programming frameworks.
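To make the drag-and-link idea concrete, below is a minimal sketch (not the authors' implementation) of how a "key event" defined by two linked nodes with a distance/direction constraint could be turned into weak, frame-level labels and candidate action spans. The node names, constraint fields, and per-frame keypoint format are hypothetical assumptions for illustration only.

# Hypothetical sketch: evaluate a key-event constraint (distance/direction between
# two linked nodes) over per-frame 2D positions and group firing frames into
# candidate action intervals. This is an illustration, not ProTAL's actual code.

from math import hypot
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]

def key_event_fires(frame: Dict[str, Point],
                    src: str, dst: str,
                    max_dist: float,
                    direction: Optional[str] = None) -> bool:
    """Return True if the linked nodes satisfy the distance/direction constraint."""
    if src not in frame or dst not in frame:
        return False
    (x1, y1), (x2, y2) = frame[src], frame[dst]
    if hypot(x2 - x1, y2 - y1) > max_dist:
        return False
    if direction == "above" and not (y2 < y1):   # image coordinates: smaller y is higher
        return False
    if direction == "right" and not (x2 > x1):
        return False
    return True

def weak_action_intervals(frames: List[Dict[str, Point]], **constraint) -> List[Tuple[int, int]]:
    """Group consecutive frames where the key event fires into candidate action spans."""
    spans, start = [], None
    for t, frame in enumerate(frames):
        if key_event_fires(frame, **constraint):
            start = t if start is None else start
        elif start is not None:
            spans.append((start, t - 1))
            start = None
    if start is not None:
        spans.append((start, len(frames) - 1))
    return spans

# Toy usage with synthetic coordinates: "hand close to ball" fires in frames 1-2.
frames = [
    {"hand": (0.0, 0.0), "ball": (50.0, 0.0)},
    {"hand": (0.0, 0.0), "ball": (5.0, 0.0)},
    {"hand": (0.0, 0.0), "ball": (3.0, 0.0)},
    {"hand": (0.0, 0.0), "ball": (40.0, 0.0)},
]
print(weak_action_intervals(frames, src="hand", dst="ball", max_dist=10.0))  # [(1, 2)]

Spans produced this way would serve only as noisy supervision; the paper then relies on a semi-supervised method to train the actual TAL model from such labels.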

Authors
Yuchen He
Zhejiang University, Hangzhou, Zhejiang, China
Jianbing Lv
Zhejiang University, Ningbo, Zhejiang, China
Liqi Cheng
Zhejiang University, Hangzhou, Zhejiang, China
Lingyu Meng
Zhejiang University, Hangzhou, Zhejiang, China
Dazhen Deng
Zhejiang University, Ningbo, Zhejiang, China
Yingcai Wu
Zhejiang University, Hangzhou, Zhejiang, China
DOI

10.1145/3706598.3713741

Paper URL

https://dl.acm.org/doi/10.1145/3706598.3713741

Conference: CHI 2025

The ACM CHI Conference on Human Factors in Computing Systems (https://chi2025.acm.org/)

Session: Video Making

G303
7 presentations
2025-04-29 23:10:00 – 2025-04-30 00:40:00