Voice & speech interaction

Paper session

Conference name
CHI 2020
"Hi! I am the Crowd Tasker" Crowdsourcing through Digital Voice Assistants
Abstract

Inspired by the increasing prevalence of digital voice assistants, we demonstrate the feasibility of using voice interfaces to deploy and complete crowd tasks. We have developed Crowd Tasker, a novel system that delivers crowd tasks through a digital voice assistant. In a lab study, we validate our proof-of-concept and show that crowd task performance through a voice assistant is comparable to that of a web interface for voice-compatible and voice-based crowd tasks for native English speakers. We also report on a field study where participants used our system in their homes. We find that crowdsourcing through voice can provide greater flexibility to crowd workers by allowing them to work in brief sessions, enabling multi-tasking, and reducing the time and effort required to initiate tasks. We conclude by proposing a set of design guidelines for the creation of crowd tasks for voice and the development of future voice-based crowdsourcing systems.

Keywords
Crowdsourcing
Smart Speakers
Digital Voice Assistants
Voice User Interface
Authors
Danula Hettiachchi
University of Melbourne, Melbourne, VIC, Australia
Zhanna Sarsenbayeva
University of Melbourne, Melbourne, VIC, Australia
Fraser Allison
University of Melbourne, Parkville, Australia
Niels van Berkel
University College London, London, United Kingdom
Tilman Dingler
University of Melbourne, Melbourne, VIC, Australia
Gabriele Marini
University of Melbourne, Melbourne, VIC, Australia
Vassilis Kostakos
University of Melbourne, Melbourne, VIC, Australia
Jorge Goncalves
University of Melbourne, Melbourne, VIC, Australia
DOI

10.1145/3313831.3376320

Paper URL

https://doi.org/10.1145/3313831.3376320

Designing Voice Interfaces: Back to the (Curriculum) Basics
Abstract

Voice user interfaces (VUIs) are rapidly increasing in popularity in the consumer space. This has led to a concurrent explosion of available applications for such devices, with many industries rushing to offer voice interactions for their products. This pressure is then transferred to interface designers; however, a large majority of designers have only been trained to handle the usability challenges specific to Graphical User Interfaces (GUIs). Since VUIs differ significantly in design and usability from GUIs, in this paper we investigate the extent to which current educational resources prepare designers to handle the specific challenges of VUI design. For this, we conducted a preliminary scoping scan and syllabi meta-review of HCI curricula at more than twenty top international HCI departments, revealing that the current offering of VUI design training within HCI education is rather limited. Based on this, we advocate for updating HCI curricula to incorporate VUI design, and for the development of VUI-specific pedagogical artifacts to be included in new curricula.

Keywords
Voice user interface
Conversational interface
Speech
VUI Design
HCI Education
HCI Curriculum
Authors
Christine Murad
University of Toronto, Toronto, ON, Canada
Cosmin Munteanu
University of Toronto Mississauga, Mississauga, ON, Canada
DOI

10.1145/3313831.3376522

Paper URL

https://doi.org/10.1145/3313831.3376522

Choice of Voices: A Large-Scale Evaluation of Text-to-Speech Voice Quality for Long-Form Content
Abstract

The advancement of text-to-speech (TTS) voices and a rise of commercial TTS platforms allow people to easily experience TTS voices across a variety of technologies, applications, and form factors. As such, we evaluated TTS voices for long-form content: not individual words or sentences, but voices that are pleasant to listen to for several minutes at a time. We introduce a method using a crowdsourcing platform and an online survey to evaluate voices based on listening experience, perception of clarity and quality, and comprehension. We evaluated 18 TTS voices, three human voices, and a text-only control condition. We found that TTS voices are close to rivaling human voices, yet no single voice outperforms the others across all evaluation dimensions. We conclude with considerations for selecting text-to-speech voices for long-form content.

Keywords
voice quality
text-to-speech
TTS
voice interface
synthesized speech
long-form
listening experience
Authors
Julia Cambre
Carnegie Mellon University, Pittsburgh, PA, USA
Jessica Colnago
Carnegie Mellon University, Pittsburgh, PA, USA
Jim Maddock
Northwestern University, Evanston, IL, USA
Janice Tsai
Mozilla Corporation, Mountain View, CA, USA
Jofish Kaye
Mozilla Corporation, Mountain View, CA, USA
DOI

10.1145/3313831.3376789

Paper URL

https://doi.org/10.1145/3313831.3376789

Developing a Personality Model for Speech-based Conversational Agents Using the Psycholexical Approach
Abstract

We present the first systematic analysis of personality dimensions developed specifically to describe the personality of speech-based conversational agents. Following the psycholexical approach from psychology, we first report on a new multi-method approach to collect potentially descriptive adjectives from 1) a free description task in an online survey (228 unique descriptors), 2) an interaction task in the lab (176 unique descriptors), and 3) a text analysis of 30,000 online reviews of conversational agents (Alexa, Google Assistant, Cortana) (383 unique descriptors). We aggregate the results into a set of 349 adjectives, which are then rated by 744 people in an online survey. A factor analysis reveals that the commonly used Big Five model for human personality does not adequately describe agent personality. As an initial step to developing a personality model, we propose alternative dimensions and discuss implications for the design of agent personalities, personality-aware personalisation, and future research.

Award
Honorable Mention
Keywords
Big 5
Conversational agents
Personality
Authors
Sarah Theres Völkel
Ludwig Maximilian University of Munich, Munich, Germany
Ramona Schödel
Ludwig Maximilian University of Munich, Munich, Germany
Daniel Buschek
University of Bayreuth, Bayreuth, Germany
Clemens Stachl
Stanford University, Palo Alto, CA, USA
Verena Winterhalter
Ludwig Maximilian University of Munich, Munich, Germany
Markus Bühner
Ludwig Maximilian University of Munich, Munich, Germany
Heinrich Hussmann
Ludwig Maximilian University of Munich, Munich, Germany
DOI

10.1145/3313831.3376210

Paper URL

https://doi.org/10.1145/3313831.3376210

An Honest Conversation: Transparently Combining Machine and Human Speech Assistance in Public Spaces
Abstract

There is widespread concern over the ways speech assistant providers currently use humans to listen to users' queries without their knowledge. We report on two iterations of the TalkBack smart speaker, which transparently combines machine and human assistance. In the first, we created a prototype to investigate whether people would choose to forward their questions to a human answerer if the machine was unable to help. A longitudinal deployment revealed that most users would do so when given the explicit choice. In the second iteration we extended the prototype to draw upon spoken answers from previous deployments, combining machine efficiency with human richness. Deployment of this second iteration shows that this corpus can help provide relevant, human-created instant responses. We distil lessons learned for those developing conversational agents or other AI-infused systems about how to appropriately enlist human-in-the-loop information services to benefit users, task workers, and system performance.

Keywords
conversational agents
speech appliances
public space interaction
emergent users
Authors
Thomas Reitmaier
Swansea University, Swansea, United Kingdom
Simon Robinson
Swansea University, Swansea, United Kingdom
Jennifer Pearson
Swansea University, Swansea, United Kingdom
Dani Kalarikalayil Raju
Studio Hasi, Mumbai, India
Matt Jones
Swansea University, Swansea, United Kingdom
DOI

10.1145/3313831.3376310

Paper URL

https://doi.org/10.1145/3313831.3376310