Voice and Speech

Conference Name
CSCW2021
KinVoices: Using Voices of Friends and Family in Voice Interfaces
Abstract

With voice user interfaces (VUIs) becoming ubiquitous and speech synthesis technology maturing, it is possible to synthesise voices that resemble our friends and relatives (which we collectively call 'kin') and use them in VUIs. However, designing such interfaces and investigating how the familiarity of kin voices affects user perceptions remain under-explored. Our surveys and interviews with 25 users revealed that VUIs using kin voices were perceived as more engaging, persuasive, and safer, yet eerier, than VUIs using common virtual assistant voices. We then developed a technology probe, KinVoice, an Alexa-based VUI that was deployed in 3 households over 2 weeks. Users set reminders using KinVoice, which, in turn, gave the reminders in synthesised kin voices. This allowed us to explore users' needs, uncover the challenges involved, and inspire new applications. We discuss design guidelines for integrating familiar kin voices into VUIs, applications that benefit from their usage, and implications for balancing voice realism and usability with security and diversification.

Authors
Sam W. T. Chan
Auckland Bioengineering Institute, The University of Auckland, Auckland, New Zealand
Tamil Selvan Gunasekaran
The University of Auckland, Auckland, New Zealand
Yun Suen Pai
Auckland Bioengineering Institute, The University of Auckland, Auckland, New Zealand
Haimo Zhang
Auckland Bioengineering Institute, The University of Auckland, Auckland, New Zealand
Suranga Nanayakkara
Auckland Bioengineering Institute, The University of Auckland, Auckland, New Zealand
Paper URL

https://doi.org/10.1145/3479590

Video
Social Media through Voice: Synthesized Voice Qualities and Self-presentation
Abstract

With advances in expressive speech synthesis and conversational understanding, an ever-increasing amount of digital content, including social and personal content, can be consumed through voice. Voice has long been known to convey personal characteristics and emotional states, both of which are prominent aspects of social media. Yet, no study has investigated voice design requirements for social media platforms. We interviewed 15 active social media users about their preferences for using synthesized voices to represent their profiles. Our findings show that participants want to have control over how a voice delivers their content, such as the personality and emotion with which the voice speaks, because these prosodic variations can impact users' online persona and interfere with impression management. We report motivations behind customizing or not customizing voice characteristics in different scenarios, and uncover key challenges around usability and the potential for stereotyping. We argue that synthesized speech for social media should be evaluated not only on listening experience and voice quality but also on its expressivity, degree of customizability, and ability to adapt to contexts (e.g., social media platforms, groups, individual posts). We discuss how our contribution confirms and extends knowledge of voice technology design and online self-presentation, and offer design considerations for voice personalization related to social interactions.

Authors
Lotus Zhang
University of Washington, Seattle, Washington, United States
Lucy Jiang
University of Washington, Seattle, Washington, United States
Nicole Washington
Human Centered Design and Engineering, University of Washington, Seattle, Washington, United States
Augustina Ao Liu
University of Washington, Seattle, Washington, United States
Jingyao Shao
Human Centered Design and Engineering, University of Washington, Seattle, Washington, United States
Adam Fourney
Microsoft Research, Redmond, Washington, United States
Meredith Ringel Morris
Microsoft Research, Redmond, Washington, United States
Leah Findlater
University of Washington, Seattle, Washington, United States
Paper URL

https://doi.org/10.1145/3449235

Video
Speaking from Experience: Trans/Non-Binary Requirements for Voice-Activated AI
Abstract

Voice-Activated Artificial Intelligence (VAI) is increasingly ubiquitous, whether appearing as context-specific conversational assistants or as more personalised and generalised personal assistants such as Alexa or Siri. CSCW and other researchers have regularly studied the (positive and negative) social consequences of their design and deployment. One particular focus has been questions of gender, and the implications that the (often-feminine) gendering of VAIs has for societal norms and user experiences. Studies in this area have largely elided transgender (trans) existences; the few exceptions operate largely from an external and predetermined idea of trans and/or non-binary user needs, centered on representation. In this study, we undertook a series of qualitative interviews with trans and/or non-binary users of VAIs to explore their experiences and needs. Our results show that these needs go far beyond questions of representation, and instead have implications for how VAI designers frame gender as a concept, their approach to user privacy, the wider feature set supported, and the structures and contexts in which VAIs are designed. We provide both immediate recommendations for designers and researchers seeking to create trans-inclusive VAIs, and wider, critical proposals for how we as researchers assess technological systems and identify appropriate points of intervention.

Authors
Cami Rincon
Goldsmiths, University of London, London, United Kingdom
Os Keyes
University of Washington, Seattle, Washington, United States
Corinne Cath
University of Oxford, Oxford, United Kingdom
Paper URL

https://doi.org/10.1145/3449206

Video
Owning and Sharing: Privacy Perceptions of Smart Speaker Users
Abstract

Intelligent personal assistants (IPAs), such as Amazon Alexa and Google Assistant, are becoming ever more present in multi-user households. This raises questions of privacy and consent, particularly for those who do not directly own the device they interact with. When these devices are placed in shared spaces, every visitor and cohabitant becomes an indirect user, potentially leading to discomfort, misuse of services, or unintentional sharing of personal data. To better understand how owners and visitors perceive IPAs, we interviewed 10 in-house users (account owners and cohabitants) and 9 visitors, drawn from a sample of students and young professionals who have interacted with such devices on various occasions. We find that cohabitants in shared households with regular IPA interactions see themselves as owners of the device, even though they lack the controls available to the account owner. Further, we determine the existence of a smart speaker etiquette that doubles as trust-based boundary management. Both in-house users and visitors demonstrate similar attitudes and concerns around data use, constant monitoring by the device, and the lack of transparency around device operations. We discuss interviewees' system understanding, concerns, and protection strategies, and make recommendations to avoid tensions around shared devices.

Authors
Nicole Meng
University of Edinburgh, Edinburgh, United Kingdom
Dilara Kekulluoglu
University of Edinburgh, Edinburgh, United Kingdom
Kami Vaniea
University of Edinburgh, Edinburgh, United Kingdom
Paper URL

https://doi.org/10.1145/3449119

Video
My Bad! Repairing Intelligent Voice Assistant Errors Improves Interaction
Abstract

One key technique people use in conversation and collaboration is conversational repair. Self-repair is the recognition and attempted correction of one's own mistakes. We investigate how the self-repair of errors by intelligent voice assistants affects user interaction. In a controlled human-participant study with N = 101 participants, participants asked Amazon Alexa to perform four tasks, and we manipulated whether Alexa would "make a mistake" understanding the participant (for example, playing heavy metal in response to a request for relaxing music) and whether Alexa would perform a correction (for example, stating, "You don't seem pleased. Did I get that wrong?"). We measured the impact of self-repair on the participant's perception of the interaction in four conditions: correction (mistakes made and repair performed), undercorrection (mistakes made, no repair performed), overcorrection (no mistakes made, but repair performed), and control (no mistakes made, and no repair performed). Subsequently, we conducted free-response interviews with each participant about their interactions. This study finds that self-repair greatly improves people's assessment of an intelligent voice assistant if a mistake has been made, but can degrade assessment if no correction is needed. However, we find that the positive impact of self-repair in the wake of an error outweighs the negative impact of overcorrection. In addition, participants who recently experienced an error saw increased value in self-repair as a feature, regardless of whether they experienced a repair themselves.

Authors
Andrea Cuadra
Cornell Tech, New York, New York, United States
Shuran Li
Cornell University, Ithaca, New York, United States
Hansol Lee
Cornell University, Ithaca, New York, United States
Jason Cho
Cornell University, Ithaca, New York, United States
Wendy Ju
Cornell Tech, New York, New York, United States
Paper URL

https://doi.org/10.1145/3449101

Video
Let’s Talk It Out: A Chatbot for Effective Study Habit Behavioral Change
Abstract

Research has shown study habits and skills to be correlated with academic success, calling for a deeper comprehension of these behaviors and processes to design effective interventions for struggling students. Chatbots have recently been used as a persuasive technology to help support behavioral change, making them an intriguing design space for students' study habits and skills. This paper investigated the feasibility of using chatbots to promote behavioral change among college students majoring in Computer Science (CS). We conducted semi-structured interviews with CS peer tutors and surveyed university freshmen to understand issues with students' study habits and identify opportunities for technical intervention. Informed by these findings, we designed StudyBuddy, a chatbot prototype deployed in Slack that periodically sends tips, assesses students' study habits via surveys, helps students break down assignments, recommends academic resources, and sends reminders. We evaluated the usability of the prototype in depth with 8 students (both first-year and senior) and 5 course instructors, followed by a large-scale evaluative survey (n=117) using video of the prototype. Our research identified important design challenges, such as building trust and preserving privacy, limiting interaction costs, and providing both immediate and long-term sustainable support. We also propose design recommendations: demonstrate context awareness, personalize the experience based on user preferences, and adapt over time as students mature and grow.

Authors
Xiaoyi Tian
University of Florida, Gainesville, Florida, United States
Zak Risha
University of Pittsburgh, Pittsburgh, Pennsylvania, United States
Ishrat Ahmed
University of Pittsburgh, Pittsburgh, Pennsylvania, United States
Arun Balajiee Lekshmi Narayanan
University of Pittsburgh, Pittsburgh, Pennsylvania, United States
Jacob Biehl
University of Pittsburgh, Pittsburgh, Pennsylvania, United States
Paper URL

https://doi.org/10.1145/3449171

Video
Exploring Interactions Between Trust, Anthropomorphism, and Relationship Development in Voice Assistants
Abstract

Modern conversational agents such as Alexa and Google Assistant represent significant progress in speech recognition, natural language processing, and speech synthesis. But as these agents have grown more realistic, concerns have been raised over how their social nature might unconsciously shape our interactions with them. Through a survey of 500 voice assistant users, we explore whether users' relationships with their voice assistants can be quantified using the same metrics as social, interpersonal relationships, and whether this correlates with how much they trust their devices and the extent to which they anthropomorphise them. Using Knapp's staircase model of human relationships, we find not only that human-device interactions can be modelled in this way, but also that relationship development with voice assistants correlates with increased trust and anthropomorphism.

Authors
William Seymour
University of Oxford, Oxford, United Kingdom
Max Van Kleek
University of Oxford, Oxford, Oxfordshire, United Kingdom
Paper URL

https://doi.org/10.1145/3479515

Video
Let Me Ask You This: How Can a Voice Assistant Elicit Explicit User Feedback?
Abstract

Voice assistants offer users access to an increasing variety of personalized functionalities. The researchers and engineers who build these experiences rely on various signals from users to create the machine learning models powering them. One type of signal is explicit in situ feedback. While collecting explicit in situ user feedback via voice assistants would help improve and inspect the underlying models, from a user's perspective it can be disruptive to the overall experience, and the user might not feel compelled to respond. However, careful design can help alleviate this friction. In this paper, we explore the opportunities and the design space for voice assistant feedback elicitation. First, we present four usage categories of explicit in situ feedback for model evaluation and improvement, derived from interviews with machine learning practitioners. Then, using realistic scenarios generated for each category and based on examples from the interviews, we conducted an online study to evaluate multiple voice assistant designs. Our results reveal that when the voice assistant was framed as a learner or a collaborator, users were more willing to respond to its request for feedback and felt that the experience was less disruptive. In addition, giving users instructions on how to initiate feedback themselves can reduce the perceived disruption to the experience compared to asking users for feedback directly in the form of a question. Based on our findings, we discuss the implications and potential future directions for designing voice assistants to elicit user feedback for personalized voice experiences.

Authors
Ziang Xiao
University of Illinois at Urbana-Champaign, Urbana, Illinois, United States
Sarah Mennicken
Spotify, San Francisco, California, United States
Bernd Huber
Spotify, Boston, Massachusetts, United States
Adam Shonkoff
Spotify, Boston, Massachusetts, United States
Jennifer Thom
Spotify, Boston, Massachusetts, United States
Paper URL

https://doi.org/10.1145/3479532

Video