Seeing, Hearing, and Knowing Together: Multimodal Strategies in Deepfake Videos Detection

Abstract

As deepfake videos become increasingly difficult for people to recognise, understanding the strategies humans use is key to designing effective media literacy interventions. We conducted a study with 195 participants between the ages of 21 and 40, who judged real and deepfake videos, rated their confidence, and reported the cues they relied on across visual, audio, and knowledge strategies. Participants were more accurate with real videos than with deepfakes and showed lower expected calibration error for real content. Through association rule mining, we identified cue combinations that shaped performance. Visual appearance cues, vocal cues, and intuition often co-occurred in successful identifications, highlighting the importance of multimodal approaches in human detection. Our findings show which cues help or hinder detection and suggest directions for designing media literacy tools that guide effective cue use. Building on these insights can help people improve their identification skills and become more resilient to deceptive digital media.

Authors
Chen Chen
Nanyang Technological University, Singapore, Singapore
Dion Goh
Nanyang Technological University, Singapore, Singapore

Conference: CHI 2026

ACM CHI Conference on Human Factors in Computing Systems

Session: Liars & Deepfakes

P1 - Room 119
7 presentations
2026-04-15, 20:15–21:45