This paper explores a multimodal approach for translating emotional cues present in speech, designed with Deaf and Hard-of-Hearing (DHH) individuals in mind. Prior work has applied visual cues to captions, successfully conveying whether a speaker's words carry a negative or positive tone (valence), but with mixed results for the intensity of those emotions (arousal). We propose a novel method that uses haptic feedback to communicate a speaker's arousal level through vibrations on a wrist-worn device. In a formative study with 16 DHH participants, we tested six haptic patterns and found that participants preferred single per-word vibrations at 75 Hz to encode arousal. In a follow-up study with 27 DHH participants, we paired this pattern with visual cues and measured narrative engagement with audio-visual content. Results indicate that combining haptics with visuals significantly increased engagement compared with both a conventional captioning baseline and a visuals-only affective captioning style.
https://dl.acm.org/doi/10.1145/3706598.3713304
The ACM CHI Conference on Human Factors in Computing Systems (https://chi2025.acm.org/)
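As a concrete illustration of the preferred pattern, a single 75 Hz vibration per word with arousal encoded in the pulse, the following Python sketch schedules per-word haptic pulses. This is a hypothetical reconstruction, not the paper's implementation: the `WordCue` structure, the arousal-to-amplitude mapping, and the pulse duration are illustrative assumptions.

```python
# Hypothetical sketch, not the paper's implementation: schedule one
# 75 Hz vibration pulse per word, with a normalized arousal score
# mapped to pulse amplitude. WordCue, the amplitude mapping, and the
# pulse duration are illustrative assumptions.

from dataclasses import dataclass
from typing import Dict, List

CARRIER_HZ = 75  # vibration frequency preferred in the formative study


@dataclass
class WordCue:
    word: str
    onset_s: float  # word onset on the media timeline, in seconds
    arousal: float  # normalized arousal in [0, 1] from speech analysis


def to_haptic_pulses(cues: List[WordCue], pulse_ms: int = 80) -> List[Dict]:
    """Emit one pulse per word; arousal sets the pulse amplitude."""
    pulses = []
    for cue in cues:
        amplitude = max(0.0, min(1.0, cue.arousal))  # clamp to [0, 1]
        pulses.append({
            "t_s": cue.onset_s,       # when to fire the pulse
            "freq_hz": CARRIER_HZ,    # fixed 75 Hz carrier
            "amplitude": amplitude,   # arousal-driven intensity
            "duration_ms": pulse_ms,  # assumed pulse length
        })
    return pulses


if __name__ == "__main__":
    demo = [WordCue("never", 0.20, 0.9), WordCue("mind", 0.55, 0.4)]
    for pulse in to_haptic_pulses(demo):
        print(pulse)
```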