Uncovering Human Traits in Determining Real and Spoofed Audio: Insights from Blind and Sighted Individuals

Abstract

This paper explores how blind and sighted individuals perceive real and spoofed audio, highlighting differences and similarities between the two groups. Through two studies, we find that both groups focus on specific human traits in audio, such as accents, vocal inflections, breathing patterns, and emotions, to assess audio authenticity. We further reveal that humans, irrespective of visual ability, can still outperform current state-of-the-art machine learning models in discerning audio authenticity; however, the task proves psychologically demanding. Moreover, detection accuracy scores between blind and sighted individuals are comparable, but each group exhibits unique strengths: the sighted group excels at detecting deepfake-generated audio, while the blind group excels at detecting text-to-speech (TTS) generated audio. These findings not only deepen our understanding of machine-manipulated and neural-rendered audio but also have implications for developing countermeasures, such as perceptible watermarks and human-AI collaboration strategies for spoofing detection.

Authors
Chaeeun Han
Pennsylvania State University, University Park, Pennsylvania, United States
Prasenjit Mitra
Pennsylvania State University, University Park, Pennsylvania, United States
Syed Masum Billah
Pennsylvania State University, University Park, Pennsylvania, United States
Paper URL

https://doi.org/10.1145/3613904.3642817

Video

Conference: CHI 2024

The ACM CHI Conference on Human Factors in Computing Systems (https://chi2024.acm.org/)

Session: Trust in Social Media

313C
5 presentations
2024-05-14 01:00:00
2024-05-14 02:20:00