Despite the common use of rule-based tools for online content moderation, human moderators still spend considerable time monitoring them to ensure they work as intended. Based on surveys and interviews with Reddit moderators who use AutoModerator, we identified the main challenges in reducing false positives and false negatives of automated rules: moderators cannot estimate the actual effect of a rule in advance and have difficulty figuring out how rules should be updated. To address these issues, we built ModSandbox, a novel virtual sandbox system that detects possible false positives and false negatives of a rule and visualizes which part of the rule is causing issues. We conducted a comparative, between-subjects study with online content moderators to evaluate the effect of ModSandbox in improving automated rules. Results show that ModSandbox helps moderators quickly find possible false positives and false negatives of automated rules and guides them in revising the rules to reduce future errors.
https://doi.org/10.1145/3544548.3581057
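
As a rough illustration of the sandboxing idea described in the abstract above, the sketch below audits a candidate keyword rule against a small set of labeled past posts and reports likely false positives and false negatives before deployment. The rule syntax, sample data, and function names are illustrative assumptions, not ModSandbox's actual implementation or AutoModerator's rule format.

```python
# Hypothetical sketch of sandbox-style rule auditing: run a candidate
# keyword/regex rule against labeled historical posts and surface likely
# false positives and false negatives before the rule goes live.
import re

def audit_rule(pattern, labeled_posts):
    """labeled_posts: iterable of (text, is_violation) pairs."""
    rule = re.compile(pattern, re.IGNORECASE)
    false_positives, false_negatives = [], []
    for text, is_violation in labeled_posts:
        matched = bool(rule.search(text))
        if matched and not is_violation:
            false_positives.append(text)   # rule fires on acceptable content
        elif not matched and is_violation:
            false_negatives.append(text)   # rule misses violating content
    return false_positives, false_negatives

sample = [
    ("buy cheap followers now!!!", True),
    ("I need to buy a new laptop, any advice?", False),
    ("selling my old bike, DM me", True),
]
fps, fns = audit_rule(r"\b(buy|selling)\b", sample)
print(f"{len(fps)} possible false positives, {len(fns)} possible false negatives")
```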
Traditionally, writing assistance systems have focused on short or even single-word suggestions. Recently, large language models like GPT-3 have made it possible to generate significantly longer natural-sounding suggestions, offering more advanced assistance opportunities. This study explores the trade-offs between sentence- and message-level suggestions for AI-mediated communication. We recruited 120 participants to act as staffers from legislators' offices who often need to respond to large volumes of constituent concerns. Participants were asked to reply to emails with different types of assistance. The results show that participants receiving message-level suggestions responded faster and were more satisfied with the experience, as they mainly edited the suggested drafts. In addition, the texts they wrote were evaluated as more helpful by others. In comparison, participants receiving sentence-level assistance retained a higher sense of agency but took longer to complete the task, as they needed to plan the flow of their responses and decide when to use suggestions. Our findings have implications for designing task-appropriate communication assistance systems.
https://doi.org/10.1145/3544548.3581351
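
To make the contrast between the two assistance conditions concrete, the sketch below shows one way message-level and sentence-level suggestions could be produced from the same language model: one call drafts an entire reply, the other proposes only the next sentence given the reply so far. The `generate` function is a hypothetical stand-in for any LLM completion call (e.g., GPT-3) and is stubbed here so the sketch runs without external services; the prompts are assumptions for illustration only.

```python
# Illustrative contrast between message-level and sentence-level assistance.

def generate(prompt):
    # Hypothetical stand-in for an LLM completion call; stubbed for this sketch.
    return "[model output for: " + prompt[:40] + "...]"

def message_level_suggestion(constituent_email):
    # One shot: propose an entire reply the staffer can then edit.
    return generate(
        f"Write a full, polite reply to this constituent email:\n{constituent_email}"
    )

def sentence_level_suggestion(constituent_email, draft_so_far):
    # Incremental: propose only the next sentence, keeping the writer in control.
    return generate(
        f"Email:\n{constituent_email}\n\nReply so far:\n{draft_so_far}\n\n"
        "Suggest only the next sentence of the reply."
    )

email = "I'm worried about the proposed cuts to the local transit budget."
print(message_level_suggestion(email))
print(sentence_level_suggestion(email, "Thank you for writing to our office."))
```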
Video conferencing solutions like Zoom, Google Meet, and Microsoft Teams are becoming increasingly popular for facilitating conversations, and recent advancements such as live captioning help people better understand each other. We believe that the addition of visuals based on the context of conversations could further improve comprehension of complex or unfamiliar concepts. To explore the potential of such capabilities, we conducted a formative study through remote interviews (N=10) and crowdsourced a dataset of over 1500 sentence-visual pairs across a wide range of contexts. These insights informed Visual Captions, a real-time system that integrates with a videoconferencing platform to enrich verbal communication. Visual Captions leverages a fine-tuned large language model to proactively suggest relevant visuals in open-vocabulary conversations. We present the findings from a lab study (N=26) and an in-the-wild case study (N=10), demonstrating how Visual Captions can help improve communication through visual augmentation in various scenarios.
https://doi.org/10.1145/3544548.3581566
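
The sketch below outlines the general pipeline the Visual Captions abstract describes: take the most recent captioned utterance, ask a model whether a visual would aid comprehension, and hand the result to the video conferencing UI. The `suggest_visual_query` function stubs out the fine-tuned model, and the structured return format (visual subject and type) is an assumption for illustration, not the system's actual interface.

```python
# Rough sketch: caption text in, proposed visual out.

def suggest_visual_query(utterance):
    # Hypothetical stand-in for the fine-tuned LLM; a real system would return
    # a structured prediction of what to show, where it comes from, and its type.
    if "tokyo" in utterance.lower():
        return {"subject": "Tokyo skyline", "visual_type": "photo"}
    return None  # the model may decide no visual is needed

def on_new_caption(utterance):
    suggestion = suggest_visual_query(utterance)
    if suggestion:
        print(f"Proposed visual: {suggestion['visual_type']} of {suggestion['subject']}")

on_new_caption("I visited Tokyo last spring and the city was amazing.")
```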
This work aims to explore how human assessments and AI predictions can be combined to identify misinformation on social media. To do so, we design a personalized AI which iteratively takes as training data a single user's assessments of content and predicts how the same user would assess other content. We conduct a user study in which participants interact with a personalized AI that learns their assessments of a feed of tweets, shows its predictions of whether a user would find other tweets (in)accurate, and evolves according to the user feedback. We study how users perceive such an AI, and whether the AI's predictions influence users' judgments. We find that this influence does exist and grows larger over time, but it is reduced when users provide reasoning for their assessments. We draw from our empirical observations to identify design implications and directions for future work.
https://doi.org/10.1145/3544548.3581219
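
A minimal sketch of the kind of personalized, iteratively updated model the abstract above describes: each time the user labels a tweet as accurate or inaccurate, the model is updated and can then predict the user's likely judgment of unseen tweets. The feature pipeline and classifier choice here are illustrative assumptions, not the study's actual model.

```python
# Incremental, per-user model updated from the user's own accuracy labels.
from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.linear_model import SGDClassifier

vectorizer = HashingVectorizer(n_features=2**16, alternate_sign=False)
model = SGDClassifier()          # supports incremental updates via partial_fit
classes = [0, 1]                 # 0 = user finds inaccurate, 1 = user finds accurate

def update_with_feedback(tweet_text, user_label):
    # Fold a single new assessment into the model.
    X = vectorizer.transform([tweet_text])
    model.partial_fit(X, [user_label], classes=classes)

def predict_user_judgment(tweet_text):
    # Predict how this user would likely assess an unseen tweet.
    return int(model.predict(vectorizer.transform([tweet_text]))[0])

update_with_feedback("Vaccines cause more harm than the diseases they prevent.", 0)
update_with_feedback("The city council approved the new budget on Tuesday.", 1)
print(predict_user_judgment("Officials confirmed the budget vote took place Tuesday."))
```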
Modern news aggregators do the hard work of organizing a large news stream, creating collections for a given news story with tens of source options. This paper shows that navigating large source collections for a news story can be challenging without further guidance. We design three interfaces -- the Annotated Article, the Recomposed Article, and the Question Grid -- aimed at accompanying news readers in discovering coverage diversity while they read. A first usability study with 10 journalism experts confirms that all three interfaces reveal coverage diversity and identifies each interface's potential use cases and audiences. In a second usability study, we developed a reading exercise with 95 novice news readers to measure exposure to coverage diversity. Results show that Annotated Article users are able to answer questions 34% more completely than with two existing interfaces while finding the interface equally easy to use.
Longform spoken dialog delivers rich streams of informative content through podcasts, interviews, debates, and meetings. While production of this medium has grown tremendously, spoken dialog remains challenging to consume: listening is slower than reading, and speech is harder to skim or navigate than text. Recent systems leveraging automatic speech recognition (ASR) and automatic summarization allow users to better browse speech data and forage for information of interest. However, these systems take in disfluent speech, which causes automatic summarization to yield readability, adequacy, and accuracy problems. To improve the navigability and browsability of speech, we present three training-agnostic post-processing techniques that address dialog concerns of readability, coherence, and adequacy. We integrate these improvements with user interfaces that communicate estimated summary metrics to aid user browsing heuristics. Quantitative evaluation metrics show a 19% improvement in summary quality. We discuss how summarization technologies can help people browse longform audio in trustworthy and readable ways.
https://doi.org/10.1145/3544548.3581339
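
The paper's specific post-processing techniques are not reproduced here; the sketch below is a hedged illustration of the general idea of training-agnostic cleanup applied to disfluent ASR output before summarization, stripping filler words and immediate word repetitions so the downstream summarizer sees more readable text. The patterns and example transcript are assumptions for illustration.

```python
# Simple, training-agnostic cleanup of a disfluent ASR segment.
import re

FILLERS = re.compile(r"\b(um+|uh+|you know|i mean)\b,?\s*", re.IGNORECASE)
REPEATS = re.compile(r"\b(\w+)( \1\b)+", re.IGNORECASE)  # "the the" -> "the"

def clean_asr_segment(text):
    text = FILLERS.sub("", text)                 # drop filler words
    text = REPEATS.sub(r"\1", text)              # collapse immediate repetitions
    return re.sub(r"\s{2,}", " ", text).strip()  # tidy leftover whitespace

print(clean_asr_segment(
    "So um we we decided that, you know, the the launch is uh delayed."
))
# -> "So we decided that, the launch is delayed."
```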