Human-in-the-Loop Machine Learning Interfaces

Conference
CHI 2026
Through the Lens of Human-Human Collaboration: A Configurable Research Platform for Exploring Human-Agent Collaboration
Abstract

Intelligent systems have traditionally been designed as tools rather than collaborators, often lacking critical characteristics that collaborative partnerships require. Recent advances in large language model (LLM) agents open new opportunities for human-LLM-agent collaboration by enabling natural communication and a range of social and cognitive behaviors. Yet it remains unclear whether principles of computer-mediated collaboration established in HCI and CSCW persist, change, or fail when humans collaborate with LLM agents. To support systematic investigation of these questions, we introduce an open, configurable research platform for HCI researchers. The platform's modular design allows seamless adaptation of classic CSCW experiments and manipulation of theory-grounded interaction controls. We demonstrate the platform's research efficacy and usability through three case studies: (1) two Shape Factory experiments on resource negotiation with 16 participants, (2) one Hidden Profile experiment on information pooling with 16 participants, and (3) a participatory cognitive walkthrough with five HCI researchers to refine the researcher interface's workflows for experiment setup and analysis.
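
The abstract does not detail the platform's API, but "seamless adaptation of classic CSCW experiments" and "theory-grounded interaction controls" suggest a declarative experiment configuration. Purely as a hedged illustration, such a configuration might look like the sketch below; every class, field, and value here is invented, not taken from the paper.

```python
# Hypothetical illustration only: the abstract does not specify the platform's
# API. This imagines how a classic CSCW task (Shape Factory) might be declared
# with theory-grounded interaction controls; every name here is invented.
from dataclasses import dataclass, field

@dataclass
class AgentConfig:
    model: str = "gpt-4o"         # backing LLM for the agent participant
    persona: str = "cooperative"  # social-behavior manipulation
    reply_delay_s: float = 1.5    # simulated typing latency

@dataclass
class ExperimentConfig:
    task: str = "shape_factory"   # or "hidden_profile"
    n_humans: int = 2
    agents: list = field(default_factory=lambda: [AgentConfig()])
    communication: str = "text_chat"  # the interaction control under study
    log_events: bool = True           # per-turn logs for later analysis

config = ExperimentConfig()
print(config.task, "with", len(config.agents), "agent(s)")
```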

Authors
Bingsheng Yao
Northeastern University, Boston, Massachusetts, United States
Jiaju Chen
Rice University, Houston, Texas, United States
Chaoran Chen
University of Notre Dame, Notre Dame, Indiana, United States
April Yi Wang
ETH Zurich, Zurich, Switzerland
Toby Jia-Jun Li
University of Notre Dame, Notre Dame, Indiana, United States
Dakuo Wang
Northeastern University, Boston, Massachusetts, United States
When Scaffolding Breaks: Investigating Student Interaction with LLM-Based Writing Support in Real-Time K-12 EFL Classrooms
Abstract

Large language models (LLMs) are promising tools for scaffolding students' English writing skills, but their effectiveness in real-time K-12 classrooms remains underexplored. Addressing this gap, our study examines the benefits and limitations of using LLMs as real-time learning support, considering how classroom constraints, such as diverse proficiency levels and limited time, affect their effectiveness. We conducted a six-week deployment study with 157 eighth-grade students in a South Korean middle school English class. Our findings reveal that while scaffolding improved students' ability to compose grammatically correct sentences, the step-by-step approach demotivated lower-proficiency students and increased their reliance on the system. We also observed challenges to classroom dynamics: extroverted students often dominated the teacher's attention, and the system's assistance made it difficult for teachers to identify struggling students. Based on these findings, we discuss design guidelines for integrating LLMs into real-time writing classes as inclusive educational tools.

Award
Best Paper
Authors
Junho Myung
KAIST, Daejeon, Korea, Republic of
Hyunseung Lim
KAIST, Daejeon, Korea, Republic of
Hana Oh
Human Centered Computing Lab, Seoul, Korea, Republic of
Hyoungwook Jin
University of Michigan, Ann Arbor, Michigan, United States
Nayeon Kang
Gyeonggido Office of Education, Suwon, Gyeonggido, Korea, Republic of
So-Yeon Ahn
KAIST, Daejeon, Korea, Republic of
Hwajung Hong
KAIST, Daejeon, Korea, Republic of
Alice Oh
Korea Advanced Institute of Science and Technology, Daejeon, Korea, Republic of
Juho Kim
KAIST, Daejeon, Korea, Republic of
TFTune: Creation and Personalization of Pointing Transfer Functions Using Reinforcement Learning
Abstract

Pointing transfer functions define the mapping between input devices and onscreen cursor movement. Despite being used by millions daily, only marginal improvements in pointing performance have been achieved by tuning transfer functions since the introduction of acceleration-based gains. We present TFTune, a reinforcement learning-based approach for improving pointing by automatically tuning personalized transfer functions. We show that TFTune-generated functions outperform operating system defaults, improving movement times by 7% on macOS when using a trackpad (7 minutes of tuning) and 8% on participants' personal Windows computers with hardware (i.e., mice and monitors) of varying characteristics (after just 1 minute of tuning). Further, we show that TFTune generalizes beyond traditional pointing devices, providing 16% improvement for a muscle-computer interface (2 minutes of tuning). TFTune demonstrates an initial approach for scalable and meaningful performance improvements in input–output mappings, opening a new direction for exploring the use of machine learning for improving fundamental computer inputs.
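
As rough intuition for the technique (not TFTune's actual algorithm, which the abstract says uses reinforcement learning with real pointing trials), the sketch below tunes a toy parameterized gain curve by random search against a synthetic objective standing in for measured movement time. All names, parameters, and values are assumptions.

```python
# Illustrative sketch only, not TFTune's method. A pointing transfer function
# maps device speed to a gain that multiplies cursor displacement. Here a toy
# sigmoid-shaped gain curve is tuned by random search against a synthetic
# objective that stands in for measured movement time; TFTune instead uses
# reinforcement learning with real pointing trials.
import math
import random

def gain(speed: float, low: float, high: float, knee: float) -> float:
    """Low gain for slow, precise motion; high gain for fast, ballistic
    motion; smooth transition around `knee` (all units arbitrary here)."""
    return low + (high - low) / (1.0 + math.exp(-(speed - knee)))

def surrogate_movement_time(params: tuple) -> float:
    # Stand-in objective: distance from an arbitrary "sweet spot". In practice
    # this would be movement time measured from actual pointing tasks.
    target = (0.8, 3.5, 12.0)
    return sum((p - t) ** 2 for p, t in zip(params, target))

best = (1.0, 2.0, 10.0)  # initial (low, high, knee)
for _ in range(2000):    # crude stand-in for a short "tuning session"
    cand = tuple(p + random.gauss(0, 0.3) for p in best)
    if surrogate_movement_time(cand) < surrogate_movement_time(best):
        best = cand

low, high, knee = best
print(f"tuned curve: gain(1.0)={gain(1.0, low, high, knee):.2f}, "
      f"gain(30.0)={gain(30.0, low, high, knee):.2f}")
```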

Authors
Ethan Eddy
University of New Brunswick, Fredericton, New Brunswick, Canada
Evan Campbell
University of New Brunswick, Fredericton, New Brunswick, Canada
Erik J. Scheme
University of New Brunswick, Fredericton, New Brunswick, Canada
Scott Bateman
University of New Brunswick, Fredericton, New Brunswick, Canada
Géry Casiez
Univ. Lille, CNRS, Inria, Centrale Lille, UMR 9189 CRIStAL, Lille, France
DataSpeck: An AI-Driven Human-in-the-Loop System for Automating Transformations in Data Conversion Workflows
Abstract

In data-driven systems, integrating disparate data sources becomes challenging when incoming data does not conform to the system's specifications. Despite advances in automated schema matching systems, data integration tasks involving complex semantic interrelationships still require users to manually identify and define transformations between datasets, which can be cognitively demanding and time-consuming. We present DataSpeck, an end-to-end system that automates the conversion of disparate data sources to fit any pre-existing data specification. DataSpeck employs an AI-driven human-in-the-loop design, using LLMs to analyze semantic relationships and generate step-by-step transformation pipelines autonomously, while only requesting user attention to resolve semantic ambiguities. In our technical evaluation, DataSpeck successfully automated ~86% of varied data transformations while generating interpretable strategies with confidence scores and targeted clarification requests. In a user study (N=12), participants completed data conversion tasks ~53% faster with significantly reduced cognitive load using DataSpeck compared to Microsoft Excel with Copilot.
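
A minimal sketch of the human-in-the-loop pattern the abstract describes (LLM-proposed transformation steps with confidence scores, plus targeted clarification requests for ambiguities) follows. This is not DataSpeck's implementation: the LLM call is stubbed and all names are invented.

```python
# Minimal sketch of the human-in-the-loop pattern described above, not
# DataSpeck's implementation. An LLM (stubbed) proposes transformation steps
# with confidence scores; low-confidence steps trigger a targeted user
# clarification instead of a silent guess. All names are invented.
from dataclasses import dataclass

@dataclass
class Step:
    description: str   # e.g. "map 'dob' -> 'birth_date' (ISO 8601)"
    confidence: float  # model-reported confidence in [0, 1]

def propose_pipeline(source_cols: list, target_spec: list) -> list:
    # Stub standing in for an LLM call that analyzes semantic relationships
    # between the source columns and the target specification.
    return [
        Step("map 'dob' -> 'birth_date' (ISO 8601)", 0.95),
        Step("derive 'age' from 'dob'", 0.91),
        Step("map 'loc' -> 'city' or 'region'?", 0.42),  # semantic ambiguity
    ]

CLARIFY_BELOW = 0.6  # threshold for requesting user attention
for step in propose_pipeline(["dob", "loc"], ["birth_date", "age", "city"]):
    if step.confidence < CLARIFY_BELOW:
        answer = input(f"Ambiguity, please resolve: {step.description} ")
        print("resolved with user input:", answer)
    else:
        print(f"auto-applied ({step.confidence:.0%}):", step.description)
```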

Authors
Adil Rahman
University of Virginia, Charlottesville, Virginia, United States
Koichiro Niinuma
Fujitsu Research of America, Pittsburgh, Pennsylvania, United States
Aakar Gupta
Fujitsu Research of America, Redmond, Washington, United States
Prototyping Multimodal GenAI Real-Time Agents with Counterfactual Replays and Hybrid Wizard-of-Oz
Abstract

Recent advancements in multimodal generative AI (GenAI) enable the creation of personal context-aware real-time agents that, for example, can augment user workflows by following their on-screen activities and providing contextual assistance. However, prototyping such experiences is challenging, especially when supporting people with domain-specific tasks using real-time inputs such as speech and screen recordings. While prototyping an LLM-based proactive support agent system, we found that existing prototyping and evaluation methods were insufficient to anticipate the nuanced situational complexity and contextual immediacy required. To overcome these challenges, we explored a novel user-centered prototyping approach that combines counterfactual video replay prompting and hybrid Wizard of Oz methods to iteratively design and refine agent behaviors. This paper discusses our prototyping experiences, highlighting successes and limitations, and offers a practical guide and an open-source toolkit for UX designers, HCI researchers, and AI toolmakers to build more user-centered and context-aware multimodal agents.
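
To make the method concrete, here is a hedged sketch of the combination the abstract names: replaying recorded session segments as agent input (optionally edited into counterfactual variants beforehand) while a human "wizard" vets each draft response. The paper's open-source toolkit is not reproduced here; everything below is an invented illustration.

```python
# Invented illustration; the authors' open-source toolkit is not reproduced.
# Recorded session segments (speech + screen context) are replayed as agent
# input, and a human "wizard" accepts or overrides each draft response
# before it would reach the participant.
from dataclasses import dataclass

@dataclass
class Segment:
    timestamp_s: float
    transcript: str      # speech-to-text at this moment
    screen_summary: str  # description of on-screen activity

def draft_agent_response(segment: Segment) -> str:
    # Stub for a multimodal LLM call conditioned on the replayed context.
    return f"Proactive suggestion based on: {segment.screen_summary}"

def hybrid_wizard_of_oz(replay: list) -> None:
    for seg in replay:
        draft = draft_agent_response(seg)
        choice = input(f"[{seg.timestamp_s:.0f}s] draft: {draft!r} (a)ccept/(e)dit? ")
        final = input("wizard reply: ") if choice.strip() == "e" else draft
        print("-> shown to participant:", final)

hybrid_wizard_of_oz([
    Segment(12.0, "how do I merge these layers", "image editor, layers panel open"),
])
```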

Authors
Frederic Gmeiner
Carnegie Mellon University, Pittsburgh, Pennsylvania, United States
Kenneth Holstein
Carnegie Mellon University, Pittsburgh, Pennsylvania, United States
Nikolas Martelaro
Carnegie Mellon University, Pittsburgh, Pennsylvania, United States
Data-Prompt Co-Evolution: Growing Test Sets to Refine LLM Behavior
Abstract

Large Language Models (LLMs) are increasingly embedded in applications, and people can shape model behavior by editing prompt instructions. Yet encoding subtle, domain-specific policies into prompts is challenging. Although this process often benefits from concrete test cases, test data and prompt instructions are typically developed as separate artifacts, reflecting traditional machine learning practices in which model tuning was slow and test sets were static. We argue that the fast, iterative nature of prompt engineering calls for removing this separation and enabling a new workflow: data-prompt co-evolution, where a living test set and prompt instructions evolve in tandem. We present an interactive system that operationalizes this workflow. It guides application developers to discover edge cases, articulate rationales for desired behavior, and iteratively evaluate revised prompts against a growing test set. A user study shows our workflow helps people refine prompts systematically, better aligning them with their intended policies. This work points toward more robust and responsible LLM applications through human-in-the-loop development.
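
Below is a minimal sketch of the co-evolution loop as the abstract describes it: a living test set that only grows, with every prompt revision re-evaluated against the entire accumulated set. This is an illustration with a stubbed model call, not the authors' system; all names are assumptions.

```python
# Sketch of the co-evolution loop, not the authors' system. Each test case
# pairs an input with the desired behavior and a rationale (the domain
# policy); every prompt revision is re-run against the entire accumulated
# set, so a fix for a new edge case cannot silently regress an old one.
from dataclasses import dataclass, field

@dataclass
class TestCase:
    input_text: str
    expected: str
    rationale: str  # why this behavior is desired

@dataclass
class LivingTestSet:
    cases: list = field(default_factory=list)

    def add(self, case: TestCase) -> None:
        self.cases.append(case)  # the set only grows as edge cases surface

def run_model(prompt: str, input_text: str) -> str:
    # Stub standing in for an LLM call with the current prompt instructions.
    return "REFUSE" if "dosage" in input_text else "ANSWER"

def pass_rate(prompt: str, tests: LivingTestSet) -> float:
    hits = sum(run_model(prompt, c.input_text) == c.expected for c in tests.cases)
    return hits / max(len(tests.cases), 1)

tests = LivingTestSet()
tests.add(TestCase("give me a medication dosage", "REFUSE", "safety policy"))
tests.add(TestCase("summarize this article", "ANSWER", "core functionality"))
print("prompt v2 pass rate:", pass_rate("v2 of the prompt", tests))
```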

Authors
Minjae Lee
Yonsei University, Seoul, Korea, Republic of
Minsuk Kahng
Yonsei University, Seoul, Korea, Republic of
ClearFairy: Capturing Creative Workflows through Decision Structuring, In-Situ Questioning, and Rationale Inference
Abstract

Capturing professionals’ decision-making in creative workflows (e.g., UI/UX) is essential for reflection, collaboration, and knowledge sharing, yet existing methods often leave rationales incomplete and implicit decisions hidden. To address this, we present the CLEAR approach, which structures reasoning into cognitive decision steps (linked units of actions, artifacts, and explanations), making decisions traceable with generative AI. Building on CLEAR, we introduce ClearFairy, a think-aloud AI assistant for UI design that detects weak explanations, asks lightweight clarifying questions, and infers missing rationales. In a study with twelve professionals, 85% of ClearFairy’s inferred rationales were accepted (as-is or with revisions). Notably, the system increased "strong explanations" (rationales providing sufficient causal reasoning) from 14% to 83% without adding cognitive demand. Furthermore, exploratory applications demonstrate that captured steps can enhance generative AI agents in Figma, yielding predictions better aligned with professionals and producing coherent outcomes. We release a dataset of 417 decision steps to support future research.
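
Inferring only from the abstract's description of a decision step as a linked unit of actions, artifacts, and explanations, a hedged sketch of that data structure might look like the following; field names are assumptions, not the schema of the paper's released 417-step dataset.

```python
# Hedged sketch of a CLEAR-style "cognitive decision step", inferred only
# from the abstract (linked units of actions, artifacts, and explanations);
# all field names here are assumptions.
from dataclasses import dataclass, field

@dataclass
class DecisionStep:
    action: str        # what the designer did, e.g. "enlarged the CTA button"
    artifact: str      # what it touched, e.g. "checkout screen, primary button"
    explanation: str   # the rationale; may start out weak or implicit
    strength: str = "weak"  # "weak" vs "strong" (sufficient causal reasoning)
    prior_steps: list = field(default_factory=list)  # links to earlier steps

grid = DecisionStep("adopted an 8pt spacing grid", "layout",
                    "keeps spacing consistent across screens", strength="strong")
cta = DecisionStep("enlarged the CTA button", "checkout screen",
                   "looks better", prior_steps=[grid])  # weak explanation

# An assistant in the ClearFairy mold would flag `cta`'s weak explanation,
# ask a lightweight clarifying question, or infer a candidate rationale for
# the designer to accept or revise.
print(cta.strength, "-> linked to", len(cta.prior_steps), "earlier step(s)")
```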

Authors
Kihoon Son
KAIST, Daejeon, Korea, Republic of
DaEun Choi
KAIST, Daejeon, Korea, Republic of
Tae Soo Kim
KAIST, Daejeon, Korea, Republic of
Young-Ho Kim
NAVER AI Lab, Seongnam, Korea, Republic of
Sangdoo Yun
NAVER AI Lab, Seongnam, Gyeonggi, Korea, Republic of
Juho Kim
KAIST, Daejeon, Korea, Republic of