Computational Human-AI Conversation

[A] Paper Room 02, 2021-05-11 17:00:00~2021-05-11 19:00:00 / [B] Paper Room 02, 2021-05-12 01:00:00~2021-05-12 03:00:00 / [C] Paper Room 02, 2021-05-12 09:00:00~2021-05-12 11:00:00

Conference Name
CHI 2021
Shing: A Conversational Agent to Alert Customers of Suspected Online-payment Fraud with Empathetical Communication Skills
Abstract

Alerting customers to suspected online-payment fraud and persuading them to terminate transactions is increasingly requested with the rapid growth of digital finance worldwide. We explored the feasibility of using a conversational agent (CA) to fulfill this request. Shing, a voice-based CA, proactively initiates and repairs conversations with empathetic communication skills in order to alert customers when a suspected online-payment fraud is detected, collects important information for fraud scrutiny, and persuades customers to terminate the transaction once the fraud is confirmed. We evaluated our system by comparing it with a rule-based CA with regard to customer responses and perceptions in a real-world context, where our systems took 144,795 phone calls in total and 83,019 (57.3%) natural breakdowns occurred. Results showed that more customers stopped risky transactions after conversing with Shing. They also seemed more willing to converse with Shing for more dialogue turns and to provide transaction details. Our work presents practical implications for the design of proactive CAs.

Authors
Jingya Guo
Alibaba Group, Hangzhou, Zhejiang, China
Jiajing Guo
Cornell University, Ithaca, New York, United States
Changyuan Yang
Alibaba Group, Hangzhou, Zhejiang, China
Yanjing Wu
Alibaba Group, Hangzhou, Zhejiang, China
Wenbo Yang
Alibaba Group, Hangzhou, Zhejiang, China
Lingyun Sun
Zhejiang University, Hangzhou, China
DOI

10.1145/3411764.3445129

Paper URL

https://doi.org/10.1145/3411764.3445129

Video
Towards Mutual Theory of Mind in Human-AI Interaction: How Language Reflects What Students Perceive About a Virtual Teaching Assistant
Abstract

Building conversational agents that can conduct natural and prolonged conversations has been a major technical and design challenge, especially for community-facing conversational agents. We posit Mutual Theory of Mind as a theoretical framework to design for natural long-term human-AI interactions. From this perspective, we explore a community's perception of a question-answering conversational agent through self-reported surveys and a computational linguistic approach in the context of online education. We first examine long-term temporal changes in students' perception of Jill Watson (JW), a virtual teaching assistant deployed in an online class discussion forum. We then explore the feasibility of inferring students' perceptions of JW through linguistic features extracted from student-JW dialogues. We find that students' perception of JW's anthropomorphism and intelligence changed significantly over time. Regression analyses reveal that linguistic verbosity, readability, sentiment, diversity, and adaptability reflect student perception of JW. We discuss implications for building adaptive community-facing conversational agents as long-term companions and designing towards Mutual Theory of Mind in human-AI interaction.
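
The abstract does not spell out how the linguistic features were computed or which regression model was used, so the following is only a minimal sketch of the pipeline it describes: derive simple per-student linguistic features from dialogue text and regress them on a self-reported perception score. The feature definitions (word count for verbosity, type-token ratio for diversity, a toy word-list sentiment score) and the example data are illustrative stand-ins, not the paper's measures.

```python
# Illustrative sketch only: simple stand-in linguistic features regressed on a
# self-reported perception rating (e.g., perceived anthropomorphism of JW).
import numpy as np
from sklearn.linear_model import LinearRegression

POSITIVE = {"thanks", "great", "helpful", "good"}
NEGATIVE = {"confusing", "wrong", "useless", "bad"}

def features(text):
    tokens = [t.strip(".,!?").lower() for t in text.split()]
    verbosity = len(tokens)                                   # word count
    diversity = len(set(tokens)) / max(len(tokens), 1)        # type-token ratio
    sentiment = sum((t in POSITIVE) - (t in NEGATIVE) for t in tokens)  # toy lexicon score
    return [verbosity, diversity, sentiment]

# Hypothetical data: one concatenated utterance per student and their survey rating.
utterances = [
    "Thanks Jill, that was a really helpful and clear explanation!",
    "when is assignment three due",
    "This answer is confusing and seems wrong to me",
    "Good bot. Thanks!",
    "Can you post the rubric for project two please",
]
ratings = [6.0, 4.0, 2.5, 5.5, 4.5]   # made-up 1-7 Likert perception scores

X = np.array([features(u) for u in utterances])
model = LinearRegression().fit(X, np.array(ratings))
print(dict(zip(["verbosity", "diversity", "sentiment"], model.coef_.round(3))))
```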

Authors
Qiaosi Wang
Georgia Institute of Technology, Atlanta, Georgia, United States
Koustuv Saha
Georgia Institute of Technology, Atlanta, Georgia, United States
Eric Gregori
Georgia Tech, Atlanta, Georgia, United States
David Joyner
Georgia Tech, Atlanta, Georgia, United States
Ashok Goel
Georgia Institute of Technology, Atlanta, Georgia, United States
DOI

10.1145/3411764.3445645

Paper URL

https://doi.org/10.1145/3411764.3445645

Video
SEMOUR: Scripted EMOtional speech repository for URdu
Abstract

Designing reliable Speech Emotion Recognition systems is a complex task that inevitably requires sufficient data for training purposes. Such extensive datasets are currently available in only a few languages, including English, German, and Italian. In this paper, we present SEMOUR, the first scripted database of emotion-tagged speech in the Urdu language, to support the design of an Urdu Speech Emotion Recognition system. Our gender-balanced dataset contains 15,040 unique instances recorded by eight professional actors performing a syntactically complex script. The dataset is phonetically balanced and reliably exhibits a varied set of emotions, as marked by the high agreement scores among human raters in our experiments. We also provide various baseline speech emotion prediction scores on the database, which could be used for applications such as personalized robot assistants, diagnosis of psychological disorders, and gathering feedback from a low-tech-enabled population. On a random test sample, our model correctly predicts an emotion with a state-of-the-art 92% accuracy.
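
The abstract reports baseline emotion prediction scores but does not describe the baseline model here, so the sketch below is a generic speech emotion recognition baseline (time-averaged MFCC features via librosa feeding a support-vector classifier), not SEMOUR's reported system. The directory layout and the filename convention used to read the emotion tag are assumptions for illustration.

```python
# Generic SER baseline sketch (not SEMOUR's reported model): average MFCCs per
# clip + an RBF-kernel SVM. Assumes wav files named "<actor>_<emotion>_<id>.wav".
import glob
import os
import numpy as np
import librosa
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

def clip_features(path, n_mfcc=40):
    y, sr = librosa.load(path, sr=None)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
    return mfcc.mean(axis=1)                      # time-averaged, fixed-length vector

paths = sorted(glob.glob("semour_wavs/*.wav"))    # hypothetical dataset location
X = np.stack([clip_features(p) for p in paths])
labels = [os.path.basename(p).split("_")[1] for p in paths]   # emotion tag from filename

X_tr, X_te, y_tr, y_te = train_test_split(X, labels, test_size=0.2,
                                          stratify=labels, random_state=0)
clf = SVC(kernel="rbf", C=10).fit(X_tr, y_tr)
print("held-out accuracy:", accuracy_score(y_te, clf.predict(X_te)))
```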

Authors
Nimra Zaheer
Information Technology University, Lahore, Punjab, Pakistan
Obaid Ullah Ahmad
Information Technology University, Lahore, Punjab, Pakistan
Ammar Ahmed
Information Technology University, Lahore, Pakistan
Muhammad Shehryar Khan
Information Technology University, Lahore, Punjab, Pakistan
Mudassir Shabbir
Information Technology University, Lahore, Punjab, Pakistan
DOI

10.1145/3411764.3445171

Paper URL

https://doi.org/10.1145/3411764.3445171

Video
Planning for Natural Language Failures with the AI Playbook
Abstract

Prototyping AI user experiences is challenging, due in part to probabilistic AI models that make it difficult to anticipate, test, and mitigate AI failures before deployment. In this work, we set out to support practitioners with early AI prototyping, with a focus on natural language (NL)-based technologies. Our interviews with 12 NL practitioners from a large technology company revealed that, in addition to challenges prototyping AI, prototyping was often not happening at all, or focused only on idealized scenarios, due to a lack of tools and tight timelines. These findings informed our design of the AI Playbook, an interactive and low-cost tool we developed to encourage proactive and systematic consideration of AI errors before deployment. Our evaluation of the AI Playbook demonstrates its potential to (1) encourage product teams to prioritize both ideal and failure scenarios, (2) standardize the articulation of AI failures from a user experience perspective, and (3) act as a boundary object between user experience designers, data scientists, and engineers.

Authors
Matthew K. Hong
University of Washington, Seattle, Washington, United States
Adam Fourney
Microsoft, Redmond, Washington, United States
Derek DeBellis
Microsoft, Redmond, Washington, United States
Saleema Amershi
Microsoft, Redmond, Washington, United States
DOI

10.1145/3411764.3445735

Paper URL

https://doi.org/10.1145/3411764.3445735

Video
Finding the Needle in a Haystack: On the Automatic Identification of Accessibility User Reviews
Abstract

In recent years, mobile accessibility has become an important trend, with the goal of allowing all users to use any app without major limitations. User reviews include insights that are useful for app evolution. However, as the number of received reviews grows, manually analyzing them is tedious and time-consuming, especially when searching for accessibility reviews. The goal of this paper is to support the automated identification of accessibility reviews, to help technology professionals prioritize their handling and thus create more inclusive apps. In particular, we design a model that takes user reviews as input and learns keyword-based features in order to make a binary decision on whether a given review is about accessibility or not. The model is evaluated using a total of 5,326 mobile app reviews. The findings show that (1) our model can accurately identify accessibility reviews, outperforming two baselines, namely a keyword-based detector and a random classifier; and (2) our model achieves an accuracy of 85% with a relatively small training dataset, and the accuracy improves as we increase the size of the training dataset.
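
The abstract describes a binary classifier over keyword-based review features but does not name the learner here, so this is a minimal sketch of one plausible realization (bag-of-words features with logistic regression); the example reviews and labels are made up for illustration and the paper's exact model may differ.

```python
# Illustrative keyword-feature binary classifier: bag-of-words features plus
# logistic regression over labeled app reviews (1 = accessibility, 0 = other).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical labeled examples.
reviews = [
    "The font is too small and there is no way to enlarge it",
    "TalkBack skips the checkout button entirely",
    "Love the new dark theme, great update",
    "App crashes every time I open a chat",
]
labels = [1, 1, 0, 0]

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2), min_df=1),
                    LogisticRegression(max_iter=1000))
clf.fit(reviews, labels)

# Classify a new, unseen review.
print(clf.predict(["the font is too small to read on my phone"]))
```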

Authors
Eman AlOmar
Rochester Institute of Technology, Rochester, New York, United States
Wajdi M. Aljedaani
University of North Texas, Denton, Texas, United States
Murtaza Tamjeed
Rochester Institute of Technology, Rochester, New York, United States
Mohamed Wiem Mkaouer
Rochester Institute of Technology, Rochester, New York, United States
Yasmine N. Elglaly
Western Washington University, Bellingham, Washington, United States
DOI

10.1145/3411764.3445281

Paper URL

https://doi.org/10.1145/3411764.3445281

Video
The Disagreement Deconvolution: Bringing Machine Learning Performance Metrics In Line With Reality
Abstract

Machine learning classifiers for human-facing tasks such as comment toxicity and misinformation often score highly on metrics such as ROC AUC but are received poorly in practice. Why this gap? Today, metrics such as ROC AUC, precision, and recall are used to measure technical performance; however, human-computer interaction observes that evaluation of human-facing systems should account for people's reactions to the system. In this paper, we introduce a transformation that more closely aligns machine learning classification metrics with the values and methods of user-facing performance measures. The disagreement deconvolution takes in any multi-annotator (e.g., crowdsourced) dataset, disentangles stable opinions from noise by estimating intra-annotator consistency, and compares each test set prediction to the individual stable opinions from each annotator. Applying the disagreement deconvolution to existing social computing datasets, we find that current metrics dramatically overstate the performance of many human-facing machine learning tasks: for example, performance on a comment toxicity task is corrected from .95 to .73 ROC AUC.
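
As a rough illustration of the scoring idea described above, the sketch below compares each prediction against every annotator's stable opinion rather than against one aggregated label. It makes a strong simplification, treating an annotator's modal label over repeated annotations of an item as their stable opinion, whereas the paper's deconvolution additionally models within-annotator noise; the annotation data here are hypothetical.

```python
# Simplified illustration of disagreement-deconvolution-style scoring (not the
# paper's exact estimator): score each prediction against every annotator's
# stable opinion instead of against a single aggregated label.
from collections import Counter, defaultdict

# Hypothetical repeated annotations: (item_id, annotator_id, label)
annotations = [
    ("c1", "a1", 1), ("c1", "a1", 1), ("c1", "a2", 0), ("c1", "a2", 0),
    ("c2", "a1", 1), ("c2", "a1", 0), ("c2", "a2", 1), ("c2", "a2", 1),
]
predictions = {"c1": 1, "c2": 1}   # classifier output per item

# Step 1: estimate each annotator's stable opinion per item (modal label here;
# the paper instead corrects for annotation noise explicitly).
votes = defaultdict(list)
for item, annotator, label in annotations:
    votes[(item, annotator)].append(label)
stable = {key: Counter(labels).most_common(1)[0][0] for key, labels in votes.items()}

# Step 2: compare the prediction to each stable opinion individually.
hits = [predictions[item] == opinion for (item, _), opinion in stable.items()]
print("deconvolved accuracy:", sum(hits) / len(hits))
```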

Authors
Mitchell L. Gordon
Stanford University, Stanford, California, United States
Kaitlyn Zhou
Stanford University, Stanford, California, United States
Kayur Patel
Apple Inc, Seattle, Washington, United States
Tatsunori Hashimoto
Stanford University, Stanford, California, United States
Michael S. Bernstein
Stanford University, Stanford, California, United States
DOI

10.1145/3411764.3445423

Paper URL

https://doi.org/10.1145/3411764.3445423

Video
Designing Effective Interview Chatbots: Automatic Chatbot Profiling and Design Suggestion Generation for Chatbot Debugging
Abstract

Recent studies show the effectiveness of interview chatbots in information elicitation. However, designing an effective interview chatbot is non-trivial. Few tools exist to help designers design, evaluate, and improve an interview chatbot iteratively. Based on a formative study and literature reviews, we propose a computational framework for quantifying the performance of interview chatbots. Incorporating the framework, we have developed iChatProfile, an assistive design tool that can automatically generate a profile of an interview chatbot with quantified performance metrics and offer design suggestions for improving the chatbot based on such metrics. To validate the effectiveness of iChatProfile, we designed and conducted a between-subject study that compared the performance of 10 interview chatbots designed with or without using iChatProfile. Based on the live chats between the 10 chatbots and 1349 users, our results show that iChatProfile helped the designers build significantly more effective interview chatbots, improving both interview quality and user experience.

Authors
Xu Han
University of Colorado Boulder, Boulder, Colorado, United States
Michelle Zhou
Juji, Inc., San Jose, California, United States
Matthew J. Turner
University of Colorado at Boulder, Boulder, Colorado, United States
Tom Yeh
University of Colorado Boulder, Boulder, Colorado, United States
DOI

10.1145/3411764.3445569

Paper URL

https://doi.org/10.1145/3411764.3445569

Video
Soliciting Stakeholders’ Fairness Notions in Child Maltreatment Predictive Systems
Abstract

Recent work in fair machine learning has proposed dozens of technical definitions of algorithmic fairness and methods for enforcing these definitions. However, we still lack an understanding of how to develop machine learning systems with fairness criteria that reflect relevant stakeholders' nuanced viewpoints in real-world contexts. To address this gap, we propose a framework for eliciting stakeholders' subjective fairness notions. Combining a user interface that allows stakeholders to examine the data and the algorithm's predictions with an interview protocol to probe stakeholders' thoughts while they are interacting with the interface, we can identify stakeholders' fairness beliefs and principles. We conduct a user study to evaluate our framework in the setting of a child maltreatment predictive system. Our evaluations show that the framework allows stakeholders to comprehensively convey their fairness viewpoints. We also discuss how our results can inform the design of predictive systems.

Authors
Hao-Fei Cheng
University of Minnesota, Minneapolis, Minnesota, United States
Logan Stapleton
University of Minnesota, Minneapolis, Minnesota, United States
Ruiqi Wang
Carnegie Mellon University, Pittsburgh, Pennsylvania, United States
Paige Bullock
Kenyon College, Gambier, Ohio, United States
Alexandra Chouldechova
Carnegie Mellon University, Pittsburgh, Pennsylvania, United States
Zhiwei Steven Wu
Carnegie Mellon University, Pittsburgh, Pennsylvania, United States
Haiyi Zhu
Carnegie Mellon University, Pittsburgh, Pennsylvania, United States
DOI

10.1145/3411764.3445308

Paper URL

https://doi.org/10.1145/3411764.3445308

Video
Crowdsourcing More Effective Initializations for Single-Target Trackers Through Automatic Re-querying
Abstract

In single-target video object tracking, an initial bounding box is drawn around a target object and propagated through a video. When this bounding box is provided by a careful human expert, it is expected to yield strong overall tracking performance that can be mimicked at scale by novice crowd workers with the help of advanced quality control methods. However, we show through an investigation of 900 crowdsourced initializations that such quality control strategies are inadequate for this task in two major ways: first, the high level of redundancy in these methods (e.g., averaging multiple responses to reduce error) is unnecessary, as 23% of crowdsourced initializations perform just as well as the gold-standard initialization. Second, even nearly perfect initializations can lead to degraded long-term performance due to the complexity of object tracking. Considering these findings, we evaluate novel approaches for automatically selecting bounding boxes to re-query, and introduce Smart Replacement, an efficient method that decides whether to use the crowdsourced replacement initialization.
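
The abstract does not define the criterion Smart Replacement uses, so the sketch below shows only the standard intersection-over-union measure typically used to judge how far a crowdsourced initialization deviates from a reference box, plus a purely hypothetical threshold rule for deciding when to re-query; it is not the paper's method.

```python
# Standard IoU between two boxes given as (x, y, w, h), plus a hypothetical
# threshold rule for flagging a crowdsourced box for re-querying. This is NOT
# the paper's Smart Replacement criterion.
def iou(a, b):
    ax1, ay1, ax2, ay2 = a[0], a[1], a[0] + a[2], a[1] + a[3]
    bx1, by1, bx2, by2 = b[0], b[1], b[0] + b[2], b[1] + b[3]
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0

def should_requery(crowd_box, reference_box, threshold=0.7):
    # Re-query when the worker's box deviates too far from a reference box.
    return iou(crowd_box, reference_box) < threshold

print(iou((10, 10, 50, 80), (12, 8, 50, 80)))             # close boxes, IoU ~0.88
print(should_requery((10, 10, 50, 80), (100, 120, 40, 40)))  # no overlap -> True
```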

Authors
Stephan J. Lemmer
University of Michigan, Ann Arbor, Michigan, United States
Jean Y. Song
KAIST, Daejeon, Korea, Republic of
Jason J. Corso
Stevens Institute for Artificial Intelligence, Hoboken, New Jersey, United States
DOI

10.1145/3411764.3445181

Paper URL

https://doi.org/10.1145/3411764.3445181

Video
A Human-AI Collaborative Approach for Clinical Decision Making on Rehabilitation Assessment
Abstract

Advances in artificial intelligence (AI) have made it increasingly applicable to supplementing experts' decision-making in the form of decision support systems for various tasks. For instance, an AI-based system can provide therapists with quantitative analysis of a patient's status to improve the practice of rehabilitation assessment. However, there is limited knowledge on the potential of these systems. In this paper, we present the development and evaluation of an interactive AI-based system that supports collaborative decision making with therapists for rehabilitation assessment. This system automatically identifies salient features of assessment to generate patient-specific analysis for therapists, and tunes itself with their feedback. In two evaluations with therapists, we found that our system led to significantly higher agreement on assessment (0.71 average F1-score) than a traditional system without analysis (0.66 average F1-score, p < 0.05). After tuning with therapists' feedback, our system significantly improved its performance from 0.8377 to 0.9116 average F1-score (p < 0.01). This work discusses the potential of a human-AI collaborative system to support more accurate decision making while humans and AI learn from each other's strengths.

Authors
Min Hun Lee
Carnegie Mellon University, Pittsburgh, Pennsylvania, United States
Daniel P. Siewiorek
Carnegie Mellon University, Pittsburgh, Pennsylvania, United States
Asim Smailagic
Carnegie Mellon University, Pittsburgh, Pennsylvania, United States
Alexandre Bernardino
Instituto Superior Tecnico, University of Lisbon, Lisbon, Lisbon, Portugal
Sergi Bermúdez i Badia
Universidade da Madeira, Funchal, Portugal
DOI

10.1145/3411764.3445472

Paper URL

https://doi.org/10.1145/3411764.3445472

Video
Directed Diversity: Leveraging Language Embedding Distances for Collective Creativity in Crowd Ideation
Abstract

Crowdsourcing can collect many diverse ideas by prompting ideators individually, but it can also generate redundant ideas. Prior methods reduce redundancy by presenting peers' ideas or peer-proposed prompts, but these require much human coordination. We introduce Directed Diversity, an automatic prompt selection approach that leverages language model embedding distances to maximize diversity. Ideators can be directed towards diverse prompts and away from prior ideas, thus improving their collective creativity. Since there are diverse metrics of diversity, we present a Diversity Prompting Evaluation Framework that consolidates metrics from several research disciplines to analyze stages along the ideation chain: prompt selection, prompt creativity, prompt-ideation mediation, and ideation creativity. Using this framework, we evaluated Directed Diversity in a simulation study and four user studies for the use case of crowdsourcing motivational messages to encourage physical activity. We show that automated diverse prompting can variously improve collective creativity across many nuanced metrics of diversity.
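
The abstract states that prompts are selected using language-model embedding distances but does not give the exact objective, so the following sketch shows one common instantiation: greedy max-min (farthest-point) selection in embedding space, away from prior ideas. The embed() placeholder, the candidate prompts, and the prior idea are all assumptions standing in for a real sentence encoder and a real idea pool.

```python
# One plausible instantiation (not necessarily the paper's objective): greedily
# pick the candidate prompt whose nearest neighbour among already-chosen prompts
# and prior ideas is farthest away in embedding space.
import numpy as np
from scipy.spatial.distance import cdist

def embed(texts):
    # Placeholder for any sentence encoder; pseudo-random vectors keyed on the
    # text are used here only so the sketch runs end to end.
    return np.stack([np.random.default_rng(abs(hash(t)) % (2 ** 32)).normal(size=32)
                     for t in texts])

def select_diverse(candidates, prior_ideas, k):
    cand_vecs = embed(candidates)
    anchors = embed(prior_ideas) if prior_ideas else np.zeros((0, cand_vecs.shape[1]))
    remaining, selected = list(range(len(candidates))), []
    for _ in range(min(k, len(candidates))):
        if anchors.shape[0] == 0:
            pick = remaining[0]                       # arbitrary seed prompt
        else:
            dists = cdist(cand_vecs[remaining], anchors, metric="cosine").min(axis=1)
            pick = remaining[int(np.argmax(dists))]   # farthest from everything chosen
        selected.append(pick)
        anchors = np.vstack([anchors, cand_vecs[pick]])
        remaining.remove(pick)
    return [candidates[i] for i in selected]

prompts = ["walk after dinner", "take the stairs today", "stretch at your desk",
           "join a weekend hike", "park farther from the entrance"]
print(select_diverse(prompts, prior_ideas=["go for a morning run"], k=3))
```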

Authors
Samuel Rhys Cox
National University of Singapore, Singapore, Singapore
Yunlong Wang
National University of Singapore, Singapore, Singapore
Ashraf Abdul
National University of Singapore, Singapore, Singapore
Christian von der Weth
National University of Singapore, Singapore, Singapore
Brian Y. Lim
National University of Singapore, Singapore, Singapore
DOI

10.1145/3411764.3445782

Paper URL

https://doi.org/10.1145/3411764.3445782

Video
Cody: An AI-Based System to Semi-Automate Coding for Qualitative Research
Abstract

Qualitative research can produce a rich understanding of a phenomenon but requires an essential and strenuous data annotation process known as coding. Coding can be repetitive and time-consuming, particularly for large datasets. Existing AI-based approaches for partially automating coding, such as supervised machine learning (ML) or explicit knowledge represented in code rules, require high technical literacy and lack transparency. Further, little is known about how researchers interact with AI-based coding assistance. We introduce Cody, an AI-based system that semi-automates coding through code rules and supervised ML. Cody supports researchers in interactively (re)defining code rules and uses ML to extend coding to unseen data. In two studies with qualitative researchers, we found that (1) code rules provide structure and transparency, (2) explanations are commonly desired but rarely used, and (3) suggestions benefit coding quality rather than coding speed, increasing intercoder reliability, calculated with Krippendorff's alpha, from 0.085 (MAXQDA) to 0.33 (Cody).
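
To make the rule-plus-ML combination concrete, here is a miniature sketch under hypothetical assumptions: a couple of made-up keyword rules seed code labels on some interview segments, and a simple text classifier then suggests codes for the segments no rule covers. It does not reproduce Cody's actual rule syntax, model, or interface.

```python
# Miniature illustration of rule-seeded coding plus ML extension (not Cody's
# actual rule syntax or model): keyword rules label some segments, and a
# classifier then suggests codes for the remaining, uncovered segments.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

code_rules = {                       # hypothetical researcher-defined rules
    "price": ["expensive", "cost", "afford"],
    "usability": ["confusing", "menu", "navigate"],
}

def apply_rules(segment):
    for code, keywords in code_rules.items():
        if any(k in segment.lower() for k in keywords):
            return code
    return None

segments = [
    "It is just too expensive for what it offers",
    "I could not afford the subscription",
    "The menu is confusing to navigate",
    "I stopped paying because of the cost",
    "Honestly the settings screen makes no sense to me",   # no rule fires
]

seeded = [(s, apply_rules(s)) for s in segments]
train = [(s, c) for s, c in seeded if c is not None]
unseen = [s for s, c in seeded if c is None]

clf = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
clf.fit([s for s, _ in train], [c for _, c in train])
for s in unseen:
    print(clf.predict([s])[0], "<-", s)   # suggested code for uncoded segments
```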

Authors
Tim Rietz
Karlsruhe Institute of Technology (KIT), Karlsruhe, Germany
Alexander Maedche
Karlsruhe Institute of Technology (KIT), Karlsruhe, Germany
DOI

10.1145/3411764.3445591

Paper URL

https://doi.org/10.1145/3411764.3445591

Video
Exploring Semi-Supervised Learning for Predicting Listener Backchannels
Abstract

Developing human-like conversational agents is a prime area in HCI research and subsumes many tasks. Predicting listener backchannels is one such actively researched task. While many studies have used different approaches for backchannel prediction, they have all depended on manual annotation of large datasets, a bottleneck that limits the scalability of development. To this end, we propose using semi-supervised techniques to automate the process of identifying backchannels, thereby easing the annotation process. To analyze the feasibility of our identification module, we compared backchannel prediction models trained on (a) manually annotated and (b) semi-supervised labels. Quantitative analysis revealed that the proposed semi-supervised approach attains 95% of the former's performance. Our user-study findings revealed that almost 60% of the participants found the backchannel responses predicted by the proposed model more natural. Finally, we also analyzed the impact of personality on the type of backchannel signals and validated our findings in the user study.
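
The abstract does not identify which semi-supervised technique is used, so the sketch below shows one generic option: a self-training (pseudo-labeling) loop that grows the labeled set from a small annotated seed. The synthetic feature vectors merely stand in for whatever listener features (e.g., prosodic or visual cues) the models actually consume.

```python
# Generic self-training (pseudo-labeling) loop, shown as one way to grow
# backchannel labels from a small annotated seed set; not the paper's
# specific semi-supervised technique.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
# Synthetic stand-ins for per-frame listener features (e.g., prosody, head pose).
X_labeled = rng.normal(size=(40, 8))
y_labeled = rng.integers(0, 2, size=40)          # 1 = backchannel, 0 = no backchannel
X_unlabeled = rng.normal(size=(500, 8))

X_train, y_train = X_labeled.copy(), y_labeled.copy()
for _ in range(5):                               # a few self-training rounds
    clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    proba = clf.predict_proba(X_unlabeled)
    confident = proba.max(axis=1) >= 0.9         # keep only confident pseudo-labels
    if not confident.any():
        break
    X_train = np.vstack([X_train, X_unlabeled[confident]])
    y_train = np.concatenate([y_train, proba[confident].argmax(axis=1)])
    X_unlabeled = X_unlabeled[~confident]

print("final training-set size:", len(y_train))
```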

Authors
Vidit Jain
Indraprastha Institute of Information Technology (IIIT), Delhi, Delhi, India
Maitree Leekha
DTU, Delhi, Delhi, India
Rajiv Ratn Shah
IIITD, Delhi, Delhi, India
Jainendra Shukla
Indraprastha Institute of Information Technology Delhi, New Delhi, Delhi, India
DOI

10.1145/3411764.3445449

Paper URL

https://doi.org/10.1145/3411764.3445449

Video
Social Sense-making with AI: Designing an Open-ended AI experience with a Blind Child
Abstract

AI technologies are often used to aid people in performing discrete tasks with well-defined goals (e.g., recognising faces in images). Emerging technologies that provide continuous, real-time information enable more open-ended AI experiences. In partnership with a blind child, we explore the challenges and opportunities of designing human-AI interaction for a system intended to support social sensemaking. Adopting a research-through-design perspective, we reflect upon working with the uncertain capabilities of AI systems in the design of this experience. We contribute: (i) a concrete example of an open-ended AI system that enabled a blind child to extend his own capabilities; (ii) an illustration of the delta between imagined and actual use, highlighting how capabilities derive from the human-AI interaction and not the AI system alone; and (iii) a discussion of design choices to craft an ongoing human-AI interaction that addresses the challenge of uncertain outputs of AI systems.

Authors
Cecily Morrison
Microsoft Research, Cambridge, United Kingdom
Edward Cutrell
Microsoft Research, Redmond, Washington, United States
Martin Grayson
Microsoft Research, Cambridge, United Kingdom
Anja Thieme
Microsoft Research, Cambridge, United Kingdom
Alex S. Taylor
City, University of London, London, United Kingdom
Geert Roumen
Microsoft Research, Cambridge, United Kingdom
Camilla Longden
Microsoft Research, Cambridge, United Kingdom
Sebastian Tschiatschek
Microsoft Research, Cambridge, United Kingdom
Rita Faia Marques
Microsoft Research, Cambridge, United Kingdom
Abigail Sellen
Microsoft Research, Cambridge, United Kingdom
DOI

10.1145/3411764.3445290

Paper URL

https://doi.org/10.1145/3411764.3445290

Video