Captions rarely convey emotional nuances in speech, leaving Deaf and Hard-of-Hearing (DHH) viewers without access to tonal and affective information. We present a two-part mixed-methods study on how haptic feedback can communicate vocal emotion without adding visual load. In Part 1, we replicated an arousal-driven captioning approach that uses speech emotion recognition to modulate typographic weight and vibration intensity. Participants showed divergent mental models and often mapped “more vibration” to loudness rather than emotional arousal, underscoring the construct’s conceptual fuzziness. In Part 2, we evaluated five acoustic-to-haptic mappings that bypass affective inference and translate pitch, rhythm, and waveform cues directly into vibration patterns. No single pattern dominated, but participants associated options such as “pulse” or “sawtooth” with high-arousal emotions, and “pitch-normalized” signals with calmer states. We derive design guidelines emphasizing contrastive, acoustically grounded mappings and user control for integrating emotional haptics into short-form, captioned media.
ACM CHI Conference on Human Factors in Computing Systems
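To make the acoustic-to-haptic idea concrete, the sketch below illustrates one plausible mapping of the kind the abstract describes: the amplitude envelope of a waveform is translated into actuator intensity levels. This is not the paper's implementation; the frame size, intensity range, and envelope measure (frame-wise RMS energy) are illustrative assumptions.

```python
import math

def amplitude_envelope(samples, frame_size=256):
    """Frame-wise RMS energy: a crude loudness proxy for the waveform."""
    frames = [samples[i:i + frame_size] for i in range(0, len(samples), frame_size)]
    return [math.sqrt(sum(s * s for s in f) / len(f)) for f in frames]

def to_vibration(envelope, max_intensity=255):
    """Normalize the envelope into actuator intensity levels 0..max_intensity."""
    peak = max(envelope) or 1.0
    return [round(max_intensity * e / peak) for e in envelope]

# Example: a 440 Hz tone fading out over one second at an 8 kHz sample rate.
sr = 8000
tone = [math.sin(2 * math.pi * 440 * t / sr) * (1 - t / sr) for t in range(sr)]
pattern = to_vibration(amplitude_envelope(tone))  # loudest frame first, fading to near zero
```

Mappings such as the “pitch-normalized” or “sawtooth” variants studied in Part 2 would substitute other acoustic features (fundamental frequency, rhythm) or other intensity contours for the energy envelope used here.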