Three of the most common approaches used in recommender systems are content-based filtering (matching users’ preferences with products’ characteristics), collaborative filtering (matching users with similar preferences), and demographic filtering (catering to users based on demographic characteristics). Do users’ intuitions lead them to trust one of these approaches over others, independent of the actual operations of these different systems? Does their faith in one type or another depend on the quality of the recommendation, rather than how the recommendation appears to have been derived? We conducted an empirical study with a prototype of a movie recommender system to find out. A 3 (Ostensible Recommender Type: Content vs. Collaborative vs. Demographic Filtering) x 2 (Recommendation Quality: Good vs. Bad) experiment (N=226) investigated how users evaluate systems and attribute responsibility for the recommendations they receive. We found that users trust systems that use collaborative filtering more, regardless of the system’s performance. They think that they themselves are responsible for good recommendations but that the system is responsible for bad recommendations (reflecting a self-serving bias). Theoretical insights, design implications and practical solutions for the cold start problem are discussed.
https://dl.acm.org/doi/abs/10.1145/3491102.3501936
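For readers unfamiliar with the three approaches named in this abstract, the toy sketch below contrasts content-based, collaborative, and demographic filtering. The data and helper functions are entirely illustrative assumptions, not the paper's prototype system.

```python
# Toy contrast of the three recommendation approaches (illustrative only).
import numpy as np

# Made-up data: rows = users, columns = movies, values = ratings (0 = unrated).
ratings = np.array([
    [5, 0, 0, 1],
    [4, 5, 1, 0],
    [1, 0, 5, 4],
])
movie_genres = np.array([  # one-hot genre features per movie: [action, drama]
    [1, 0],
    [1, 0],
    [0, 1],
    [0, 1],
])
user_ages = np.array([23, 25, 61])

def content_based(user, k=1):
    """Recommend unrated movies whose features resemble the user's liked movies."""
    liked = ratings[user] >= 4
    profile = movie_genres[liked].mean(axis=0)   # the user's taste profile
    scores = movie_genres @ profile              # similarity of each movie to that profile
    scores[ratings[user] > 0] = -np.inf          # exclude already-rated movies
    return np.argsort(scores)[::-1][:k]

def collaborative(user, k=1):
    """Recommend what the most similar user (by rating pattern) rated highly."""
    sims = (ratings @ ratings[user]).astype(float)
    sims[user] = -np.inf                         # do not match the user with themselves
    neighbor = int(np.argmax(sims))
    scores = ratings[neighbor].astype(float)
    scores[ratings[user] > 0] = -np.inf
    return np.argsort(scores)[::-1][:k]

def demographic(user, k=1):
    """Recommend what users in a similar age bracket rated highly on average."""
    peers = np.abs(user_ages - user_ages[user]) <= 10
    peers[user] = False
    scores = ratings[peers].mean(axis=0)
    scores[ratings[user] > 0] = -np.inf
    return np.argsort(scores)[::-1][:k]

print(content_based(0), collaborative(0), demographic(0))
```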
Model-agnostic explainable AI tools explain their predictions by means of “local” feature contributions. We empirically investigate two potential improvements over current approaches. The first is to always present feature contributions in terms of the contribution to the outcome that is perceived as positive by the user (“positive framing”). The second is to add “semantic labeling”, which explains the directionality of each feature contribution (“this feature leads to +5% eligibility”), reducing additional cognitive processing steps. In a user study, participants evaluated the understandability of explanations for different framing and labeling conditions for loan applications and music recommendations. We found that positive framing improves understandability even when the prediction is negative. Additionally, adding semantic labels eliminates any framing effects on understandability, with positive labels outperforming negative labels. We implemented our suggestions in the ArgueView package.
https://dl.acm.org/doi/abs/10.1145/3491102.3517650
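The two presentation choices studied here can be made concrete with a small sketch. The helper below is hypothetical (it is not ArgueView's actual API) and shows how a signed local feature contribution might be rendered with positive or negative framing, with or without a semantic label.

```python
def present_contribution(feature, contribution, frame="positive", semantic_label=True,
                         positive_outcome="eligibility", negative_outcome="rejection"):
    """Render a local feature contribution as explanation text.

    `contribution` is the feature's signed effect on the positive outcome,
    e.g. +0.05 means "+5 percentage points of eligibility".
    """
    # Framing: describe the effect relative to the outcome the user perceives
    # as positive ("eligibility") or as negative ("rejection").
    if frame == "positive":
        outcome, effect = positive_outcome, contribution
    else:
        outcome, effect = negative_outcome, -contribution
    pct = effect * 100
    if semantic_label:
        # Semantic labeling: spell out the directionality instead of a bare sign.
        direction = "increases" if effect >= 0 else "decreases"
        return f"'{feature}' {direction} your {outcome} by {abs(pct):.0f}%"
    return f"'{feature}': {pct:+.0f}% {outcome}"

print(present_contribution("income", +0.05))                        # positive frame, labeled
print(present_contribution("income", +0.05, frame="negative"))      # negative frame, labeled
print(present_contribution("debt ratio", -0.08, semantic_label=False))  # bare signed value
```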
The popularity of task-oriented chatbots is constantly growing, but smooth conversational progress with them remains profoundly challenging. In recent years, researchers have argued that chatbot systems should include guidance for users on how to converse with them. Nevertheless, empirical evidence about what to place in such guidance, and when to deliver it, has been lacking. Using a mixed-methods approach that integrates results from a between-subjects experiment and a reflection session, this paper compares the effectiveness of eight combinations of two guidance types (example-based and rule-based) at four guidance timings (service-onboarding, task-intro, after-failure, and upon-request), as measured by users’ task performance, improvement on subsequent tasks, and subjective experience. It establishes that each guidance type and timing has particular strengths and weaknesses, and thus that each type/timing combination has a unique impact on performance metrics, learning outcomes, and user experience. On that basis, it presents guidance-design recommendations for future task-oriented chatbots.
https://dl.acm.org/doi/abs/10.1145/3491102.3501941
Conversational recommender systems (CRSs) imitate human advisors to assist users in finding items through conversations and have recently gained increasing attention in domains such as media and e-commerce. As in human communication, building trust in human-agent communication is essential given its significant influence on user behavior. However, inspiring user trust in CRSs with a “one-size-fits-all” design is difficult, as individual users may have their own expectations for conversational interactions (e.g., who, user or system, takes the initiative), which are potentially related to their personal characteristics. In this study, we investigated the impacts of three personal characteristics, namely personality traits, trust propensity, and domain knowledge, on user trust in two types of text-based CRSs, i.e., user-initiative and mixed-initiative. Our between-subjects user study (N=148) revealed that users’ trust propensity and domain knowledge positively influenced their trust in CRSs, and that users with high conscientiousness tended to trust the mixed-initiative system.
https://dl.acm.org/doi/abs/10.1145/3491102.3517471
Despite impressive performance in many benchmark datasets, AI models can still make mistakes, especially among out-of-distribution examples. It remains an open question how such imperfect models can be used effectively in collaboration with humans. Prior work has focused on AI assistance that helps people make individual high-stakes decisions, which is not scalable for a large number of relatively low-stakes decisions, e.g., moderating social media comments. Instead, we propose conditional delegation as an alternative paradigm for human-AI collaboration where humans create rules to indicate trustworthy regions of a model. Using content moderation as a testbed, we develop novel interfaces to assist humans in creating conditional delegation rules and conduct a randomized experiment with two datasets to simulate in-distribution and out-of-distribution scenarios. Our study demonstrates the promise of conditional delegation in improving model performance and provides insights into design for this novel paradigm, including the effect of AI explanations.
https://dl.acm.org/doi/abs/10.1145/3491102.3501999
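The conditional-delegation idea can be illustrated with a short sketch. The rule format and routing function below are assumptions for illustration, not the interfaces developed in the paper: the human writes rules that mark regions of the input space where the model's decision is trusted, and everything else is escalated to a human moderator.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class DelegationRule:
    name: str
    applies: Callable[[str], bool]   # does this rule cover the comment?

def route(comment: str, model_predict: Callable[[str], str],
          rules: List[DelegationRule]) -> str:
    """Delegate to the model only inside human-specified trustworthy regions."""
    if any(rule.applies(comment) for rule in rules):
        return model_predict(comment)   # trusted region: accept the model's decision
    return "escalate_to_human"          # outside all rules: send to a human moderator

# Hypothetical rules: trust the model on short comments and on obvious blocklist hits.
rules = [
    DelegationRule("short comment", lambda c: len(c.split()) < 20),
    DelegationRule("contains blocklisted word", lambda c: "spamword" in c.lower()),
]

toy_model = lambda c: "remove" if "spamword" in c.lower() else "keep"
print(route("nice post!", toy_model, rules))                             # model decides: keep
print(route("a very long borderline comment " * 10, toy_model, rules))   # escalated to human
```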