Seeing, Hearing, and Knowing Together: Multimodal Strategies in Deepfake Videos Detection

Abstract

As deepfake videos become increasingly difficult for people to recognise, understanding the strategies humans use is key to designing effective media literacy interventions. We conducted a study with 195 participants between the ages of 21 and 40, who judged real and deepfake videos, rated their confidence, and reported the cues they relied on across visual, audio, and knowledge strategies. Participants were more accurate with real videos than with deepfakes and showed lower expected calibration error for real content. Through association rule mining, we identified cue combinations that shaped performance. Visual appearance cues, vocal cues, and intuition often co-occurred in successful identifications, highlighting the importance of multimodal approaches in human detection. Our findings show which cues help or hinder detection and suggest directions for designing media literacy tools that guide effective cue use. Building on these insights can help people improve their identification skills and become more resilient to deceptive digital media.

Authors
Chen Chen
Nanyang Technological University, Singapore, Singapore
Dion Goh
Nanyang Technological University, Singapore, Singapore

Conference: CHI 2026

ACM CHI Conference on Human Factors in Computing Systems

Session: Liars & Deepfakes

P1 - Room 119
7 presentations
2026-04-15, 20:15–21:45