Cocoa: Co-Planning and Co-Execution with AI Agents
Description

As AI agents take on increasingly long-running tasks involving sophisticated planning and execution, there is a corresponding need for novel interaction designs that enable deeper human-agent collaboration. However, most prior work either uses human interaction to patch "autonomous" workflows that have yet to become fully autonomous, or rigidly treats planning and execution as separate stages. Based on a formative study with 9 researchers using AI to support their work, we propose a design that affords greater flexibility in collaboration, so that users can 1) delegate agency to the user or agent via a collaborative plan in which individual steps can be assigned to either party; and 2) interleave planning and execution so that plans can adjust after partial execution. We introduce Cocoa, a system that takes design inspiration from computational notebooks to support complex research tasks. A lab study (n=16) found that Cocoa enabled steerability without sacrificing ease of use, and a week-long field deployment (n=7) showed how researchers collaborated with Cocoa to accomplish real-world tasks.
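The abstract does not describe Cocoa's internals. As a rough illustration of the two affordances it names (plan steps assignable to the user or the agent, and plans that can change after partial execution), here is a minimal Python sketch. `PlanStep`, `Plan`, `run_interleaved`, and the callback signatures are hypothetical names chosen for this example, not Cocoa's actual API.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Literal, Optional

# Hypothetical: who is responsible for carrying out a step.
Assignee = Literal["user", "agent"]


@dataclass
class PlanStep:
    description: str
    assignee: Assignee
    done: bool = False
    output: str = ""


@dataclass
class Plan:
    steps: List[PlanStep] = field(default_factory=list)

    def next_pending(self) -> Optional[PlanStep]:
        # Return the first step that has not been executed yet, or None if all are done.
        for step in self.steps:
            if not step.done:
                return step
        return None


def run_interleaved(
    plan: Plan,
    agent_execute: Callable[[PlanStep], str],
    user_execute: Callable[[PlanStep], str],
    revise: Callable[[Plan, PlanStep], None],
) -> Plan:
    """Execute one step at a time; after each step, `revise` may edit the remaining plan."""
    while (step := plan.next_pending()) is not None:
        # Route the step to whichever party it is assigned to.
        executor = agent_execute if step.assignee == "agent" else user_execute
        step.output = executor(step)
        step.done = True
        # Interleave planning with execution: the plan can change based on partial results.
        revise(plan, step)
    return plan
```

Because `revise` runs after every step rather than once up front, the remaining plan can be edited in light of partial results, which is the interleaving the abstract contrasts with rigid plan-then-execute pipelines.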

From Overload to Convergence: Supporting Multi-Issue Human–AI Negotiation with Bayesian Visualization
Description

As AI systems increasingly mediate negotiations, understanding how the number of negotiated issues impacts human performance is crucial for maintaining human agency. We designed a human–AI negotiation case study in a realistic property rental scenario, varying the number of negotiated issues; empirical findings show that without support, performance stays stable up to three issues but declines as additional issues increase cognitive load. To address this, we introduce a novel uncertainty-based visualization driven by Bayesian estimation of agreement probability. It shows how the space of mutually acceptable agreements narrows as negotiation progresses, helping users identify promising options. In a within-subjects experiment (N=32), it improved human outcomes and efficiency, preserved human control, and avoided redistributing value. Our findings surface practical limits on the complexity people can manage in human–AI negotiation, advance theory on human performance in complex negotiations, and offer validated design guidance for interactive systems.
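The abstract names Bayesian estimation of agreement probability but does not specify the model. The following is a minimal sketch, not the authors' method, assuming a small finite set of hypothesized counterpart preference profiles, additive per-issue option values, and a noisy threshold acceptance rule controlled by `EPS`; every name and number here is hypothetical. Each observed response updates the posterior over profiles with Bayes' rule, and marginalizing over profiles gives an agreement probability for every candidate offer. The set of offers that are acceptable to the user and likely acceptable to the counterpart is one way to represent the narrowing "space of mutually acceptable agreements".

```python
from itertools import product
from typing import Callable, List, Sequence, Tuple

# Hypothetical counterpart profile: per-issue option values plus a reservation utility.
Hypothesis = Tuple[List[List[float]], float]
Offer = Tuple[int, ...]  # chosen option index for each issue

EPS = 0.1  # assumed probability that the counterpart deviates from its threshold rule


def p_accept(hyp: Hypothesis, offer: Offer) -> float:
    """P(counterpart accepts offer | profile): high if utility meets the reservation."""
    values, reservation = hyp
    u = sum(issue_values[opt] for issue_values, opt in zip(values, offer))
    return 1 - EPS if u >= reservation else EPS


def update(prior: Sequence[float], hyps: Sequence[Hypothesis],
           offer: Offer, accepted: bool) -> List[float]:
    """Bayes' rule: posterior is proportional to prior times likelihood of the response."""
    likes = [p_accept(h, offer) if accepted else 1 - p_accept(h, offer) for h in hyps]
    unnorm = [p * lk for p, lk in zip(prior, likes)]
    z = sum(unnorm)
    return [x / z for x in unnorm]


def agreement_probability(posterior: Sequence[float], hyps: Sequence[Hypothesis],
                          offer: Offer) -> float:
    """Marginalize over profiles: sum over h of P(h) * P(accept | h, offer)."""
    return sum(p * p_accept(h, offer) for p, h in zip(posterior, hyps))


def promising_offers(posterior: Sequence[float], hyps: Sequence[Hypothesis],
                     options_per_issue: Sequence[int],
                     my_utility: Callable[[Offer], float],
                     my_reservation: float, threshold: float = 0.5) -> List[Offer]:
    """Offers acceptable to the user and at least `threshold` likely to be accepted."""
    offers = product(*(range(n) for n in options_per_issue))
    return [
        o for o in offers
        if my_utility(o) >= my_reservation
        and agreement_probability(posterior, hyps, o) >= threshold
    ]
```

For example, with two issues and two opposing profiles, observing a rejection of the offer that one profile would have accepted shifts probability toward the other profile, and the set returned by `promising_offers` shrinks accordingly. A visualization could plot `agreement_probability` per offer after each round to show this convergence.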

Seeing Eye to Eye: Enabling Cognitive Alignment Through Shared First-Person Perspective in Human–AI Collaboration
Description

Despite advances in multimodal AI, current vision-based assistants often remain inefficient in collaborative tasks. We identify two key gulfs: a communication gulf, where users must translate rich parallel intentions into verbal commands due to the channel mismatch, and an understanding gulf, where the AI struggles to interpret subtle embodied cues. To address these, we propose Eye2Eye, a framework that leverages first-person perspective as a channel for human-AI cognitive alignment. It integrates three components: (1) joint attention coordination for fluid focus alignment, (2) revisable memory to maintain evolving common ground, and (3) reflective feedback allowing users to clarify and refine the AI's understanding. We implement this framework in an AR prototype and evaluate it through a user study and a post-hoc pipeline evaluation. Results show that Eye2Eye significantly reduces task completion time and interaction load while increasing trust, demonstrating that its components work in concert to improve collaboration.

Can LLM-Simulated Practice and Feedback Upskill Human Counselors? A Randomized Study with 90+ Novice Counselors
Description

The growing demand for accessible mental health support requires training more counselors, yet existing approaches remain resource-intensive and difficult to scale. LLMs can realistically simulate patients and generate actionable feedback for training, but their actual impact on novice counselor skill development remains unknown. We developed an LLM-simulated practice and feedback system and conducted a randomized study with 94 novice counselors, comparing practice alone versus practice with feedback. We evaluated behavioral performance, self-efficacy, and qualitative reflections. Results showed the practice-and-feedback group improved in client-centered microskills (reflections, questions), while the practice-alone group showed no improvements. For empathy, the practice-alone group declined over time and performed significantly worse than the feedback group. Qualitative interviews reinforced these findings: feedback helped participants adopt a client-centered listening approach, while practice-alone participants remained solution-oriented. These results suggest LLM-based training systems can promote effective skill development, and combining simulated practice with structured feedback is critical for meaningful improvement.

Towards Fluent Interaction with Cyber-Physical Architecture
Description

What happens when your walls begin to move? This paper explores the design of human-robot interaction for architectural-scale, shape-changing environments. We present findings from two studies: (1) a series of speculative design workshops (N=20) that uncovered aspirational visions for these spaces, and (2) a task-based Wizard-of-Oz elicitation study (N=12) that grounded these visions in the challenges of practical interaction. Our workshop findings reveal a complex landscape of user desires, exposing critical tensions between proactive automation and the preservation of user autonomy, and between personalization and public ownership. Our elicitation study reveals a set of core interaction challenges related to multimodal collaboration and, most critically, suggests the need for a modality-agnostic model of evolving user intent. We conclude with a set of grounded proposals for creating robotic environments that are collaborative and trusted partners in everyday life.

Results-Actionability Gap: Understanding How Practitioners Evaluate LLM Products in the Wild
Description

How do product teams evaluate LLM-powered products? As organizations integrate large language models (LLMs) into digital products, their unpredictable nature makes traditional evaluation approaches inadequate, yet little is known about how practitioners navigate this challenge. Through interviews with nineteen practitioners across diverse sectors, we identify ten evaluation practices ranging from informal 'vibe checks' to organizational meta-work. Beyond confirming four documented challenges, we identify a novel fifth, which we call the results-actionability gap: practitioners gather evaluation data but cannot translate findings into concrete improvements. Drawing on patterns from successful teams, we contribute strategies to bridge this gap, supporting practitioners' formalization journey from ad-hoc interpretive practices (e.g., vibe checks) toward systematic evaluation. Our analysis suggests these interpretive practices are necessary adaptations to LLM characteristics rather than methodological failures. For HCI researchers, this presents a research opportunity to support practitioners in systematizing emerging practices rather than developing new evaluation frameworks.

TermSight: Making Service Contracts Approachable
Description

Legal contracts govern much of our society, but their specialized language is difficult for non-experts to read. While AI has enabled simplification of complex language, legal contracts pose unique challenges because of their connection to readers' values, their ambiguity, and their legally binding nature. Based on a formative study (N=20) using Terms of Service (ToS) as example contracts to study challenges in contract reading, we developed TermSight, an intelligent reading interface to probe the opportunities and challenges of designing augmentations for legal text. TermSight guides readers to relevant clauses with color-coded plain-language snippets of information and contextualizes ambiguous language with definitions and hypothetical scenarios. Importantly, TermSight's features always foreground the original, legally binding contract text (e.g., by linking to associated clauses). Our within-subjects study (N=20) showed that TermSight made ToS significantly easier to read and navigate, while also revealing the challenges of augmenting service contracts such as ToS.
