Virtual humans can be used to deliver persuasive arguments; yet, those with synthetic text-to-speech (TTS) have been perceived less favorably than those with recorded human speech. In this paper, we investigate standard concatenative TTS and more advanced neural TTS. We conducted a 3x2 between-subjects experiment (n=79) to evaluate the effect of a virtual human’s speech fidelity at three levels (Standard TTS, Neural TTS, and Human speech) and the listener’s gender (male or female) on perceptions and persuasion. We found that the virtual human was perceived as significantly less trustworthy by both genders, if they used neural TTS compared to human speech, while male listeners (but not females) also perceived standard TTS as less trustworthy than human speech. Our findings indicate that neural TTS may not be an effective choice for persuasive virtual humans and that gender of the listener plays a role in how virtual humans are perceived.
https://dl.acm.org/doi/abs/10.1145/3491102.3517564
The ACM CHI Conference on Human Factors in Computing Systems (https://chi2022.acm.org/)