Visualization of Speech Prosody and Emotion in Captions: Accessibility for Deaf and Hard-of-Hearing Users

Speech is expressive in ways that caption text does not capture: emotion and emphasis are not conveyed. We interviewed eight Deaf and Hard-of-Hearing (DHH) individuals to understand whether and how the inexpressiveness of captions affects them in online meetings with hearing peers. We found that automatically captioned speech lacks affective depth, which makes it ambiguous, hard to parse, and generally dull. Interviewees regularly feel excluded, which some regard as an inherent quality of these types of meetings rather than a consequence of current caption text design. Next, we developed three novel captioning models that depict, beyond words, features from prosody, emotions, and a mix of both. In an empirical study, 16 DHH participants compared these models with conventional captions. The emotion-based model outperformed traditional captions in depicting emotions and emphasis, with only a moderate loss in legibility, suggesting its potential as a more inclusive design for captions.

Rochester Institute of Technology, Rochester, New York, United States

New Jersey Institute of Technology, Newark, New Jersey, United States

Rochester Institute of Technology, Rochester, New York, United States

https://doi.org/10.1145/3544548.3581511

The ACM CHI Conference on Human Factors in Computing Systems (https://chi2023.acm.org/)

Room Y03+Y04

6 presentations

Start: 2023-04-25 20:10:00

End: 2023-04-25 21:35:00
