Human-in-the-Loop Machine Learning Interfaces

Conference
CHI 2026
Through the Lens of Human-Human Collaboration: A Configurable Research Platform for Exploring Human-Agent Collaboration
Abstract

Intelligent systems have traditionally been designed as tools rather than collaborators, often lacking critical characteristics that collaborative partnerships require. Recent advances in large language model (LLM) agents open new opportunities for human-LLM-agent collaboration by enabling natural communication and a range of social and cognitive behaviors. Yet it remains unclear whether principles of computer-mediated collaboration established in HCI and CSCW persist, change, or fail when humans collaborate with LLM agents. To support systematic investigation of these questions, we introduce an open, configurable research platform for HCI researchers. The platform's modular design allows seamless adaptation of classic CSCW experiments and manipulation of theory-grounded interaction controls. We demonstrate the platform's research efficacy and usability through three case studies: (1) two Shape Factory experiments on resource negotiation with 16 participants, (2) one Hidden Profile experiment on information pooling with 16 participants, and (3) a participatory cognitive walkthrough with five HCI researchers to refine the researcher interface's workflows for experiment setup and analysis.
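
The abstract does not detail the platform's API, but "seamless adaptation of classic CSCW experiments" and "theory-grounded interaction controls" suggest a declarative experiment configuration. Purely as a hedged illustration, such a configuration might look like the sketch below; every class, field, and value here is invented, not taken from the paper.

```python
# Hypothetical illustration only: the abstract does not specify the platform's
# API. This imagines how a classic CSCW task (Shape Factory) might be declared
# with theory-grounded interaction controls; every name here is invented.
from dataclasses import dataclass, field

@dataclass
class AgentConfig:
    model: str = "gpt-4o"         # backing LLM for the agent participant
    persona: str = "cooperative"  # social-behavior manipulation
    reply_delay_s: float = 1.5    # simulated typing latency

@dataclass
class ExperimentConfig:
    task: str = "shape_factory"   # or "hidden_profile"
    n_humans: int = 2
    agents: list = field(default_factory=lambda: [AgentConfig()])
    communication: str = "text_chat"  # the interaction control under study
    log_events: bool = True           # per-turn logs for later analysis

config = ExperimentConfig()
print(config.task, "with", len(config.agents), "agent(s)")
```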

Authors
Bingsheng Yao
Northeastern University, Boston, Massachusetts, United States
Jiaju Chen
Rice University, Houston, Texas, United States
Chaoran Chen
University of Notre Dame, Notre Dame, Indiana, United States
April Yi Wang
ETH Zurich, Zurich, Switzerland
Toby Jia-Jun Li
University of Notre Dame, Notre Dame, Indiana, United States
Dakuo Wang
Northeastern University, Boston, Massachusetts, United States
When Scaffolding Breaks: Investigating Student Interaction with LLM-Based Writing Support in Real-Time K-12 EFL Classrooms
Abstract

Large language models (LLMs) are promising tools for scaffolding students' English writing skills, but their effectiveness in real-time K-12 classrooms remains underexplored. Addressing this gap, our study examines the benefits and limitations of using LLMs as real-time learning support, considering how classroom constraints, such as diverse proficiency levels and limited time, affect their effectiveness. We conducted a six-week deployment study with 157 eighth-grade students in a South Korean middle school English class. Our findings reveal that while scaffolding improved students' ability to compose grammatically correct sentences, the step-by-step approach demotivated lower-proficiency students and increased their reliance on the system. We also observed challenges to classroom dynamics: extroverted students often dominated the teacher's attention, and the system's assistance made it difficult for teachers to identify struggling students. Based on these findings, we discuss design guidelines for integrating LLMs into real-time writing classes as inclusive educational tools.

Award
Best Paper
Authors
Junho Myung
KAIST, Daejeon, Korea, Republic of
Hyunseung Lim
KAIST, Daejeon, Korea, Republic of
Hana Oh
Human Centered Computing Lab, Seoul, Korea, Republic of
Hyoungwook Jin
University of Michigan, Ann Arbor, Michigan, United States
Nayeon Kang
Gyeonggido Office of Education, Suwon, Gyeonggido, Korea, Republic of
So-Yeon Ahn
KAIST, Daejeon, Korea, Republic of
Hwajung Hong
KAIST, Daejeon, Korea, Republic of
Alice Oh
Korea Advanced Institute of Science and Technology, Daejeon, Korea, Republic of
Juho Kim
KAIST, Daejeon, Korea, Republic of
TFTune: Creation and Personalization of Pointing Transfer Functions Using Reinforcement Learning
Abstract

Pointing transfer functions define the mapping between input devices and onscreen cursor movement. Despite being used by millions daily, only marginal improvements in pointing performance have been achieved by tuning transfer functions since the introduction of acceleration-based gains. We present TFTune, a reinforcement learning-based approach for improving pointing by automatically tuning personalized transfer functions. We show that TFTune-generated functions outperform operating system defaults, improving movement times by 7% on macOS when using a trackpad (7 minutes of tuning) and 8% on participants' personal Windows computers with hardware (i.e., mice and monitors) of varying characteristics (after just 1 minute of tuning). Further, we show that TFTune generalizes beyond traditional pointing devices, providing 16% improvement for a muscle-computer interface (2 minutes of tuning). TFTune demonstrates an initial approach for scalable and meaningful performance improvements in input–output mappings, opening a new direction for exploring the use of machine learning for improving fundamental computer inputs.
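
As rough intuition for the technique (not TFTune's actual algorithm, which the abstract says uses reinforcement learning with real pointing trials), the sketch below tunes a toy parameterized gain curve by random search against a synthetic objective standing in for measured movement time. All names, parameters, and values are assumptions.

```python
# Illustrative sketch only, not TFTune's method. A pointing transfer function
# maps device speed to a gain that multiplies cursor displacement. Here a toy
# sigmoid-shaped gain curve is tuned by random search against a synthetic
# objective that stands in for measured movement time; TFTune instead uses
# reinforcement learning with real pointing trials.
import math
import random

def gain(speed: float, low: float, high: float, knee: float) -> float:
    """Low gain for slow, precise motion; high gain for fast, ballistic
    motion; smooth transition around `knee` (all units arbitrary here)."""
    return low + (high - low) / (1.0 + math.exp(-(speed - knee)))

def surrogate_movement_time(params: tuple) -> float:
    # Stand-in objective: distance from an arbitrary "sweet spot". In practice
    # this would be movement time measured from actual pointing tasks.
    target = (0.8, 3.5, 12.0)
    return sum((p - t) ** 2 for p, t in zip(params, target))

best = (1.0, 2.0, 10.0)  # initial (low, high, knee)
for _ in range(2000):    # crude stand-in for a short "tuning session"
    cand = tuple(p + random.gauss(0, 0.3) for p in best)
    if surrogate_movement_time(cand) < surrogate_movement_time(best):
        best = cand

low, high, knee = best
print(f"tuned curve: gain(1.0)={gain(1.0, low, high, knee):.2f}, "
      f"gain(30.0)={gain(30.0, low, high, knee):.2f}")
```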

Authors
Ethan Eddy
University of New Brunswick, Fredericton, New Brunswick, Canada
Evan Campbell
University of New Brunswick, Fredericton, New Brunswick, Canada
Erik J. Scheme
University of New Brunswick, Fredericton, New Brunswick, Canada
Scott Bateman
University of New Brunswick, Fredericton, New Brunswick, Canada
Géry Casiez
Univ. Lille, CNRS, Inria, Centrale Lille, UMR 9189 CRIStAL, Lille, France
DataSpeck: An AI-Driven Human-in-the-Loop System for Automating Transformations in Data Conversion Workflows
Abstract

In data-driven systems, integrating disparate data sources becomes challenging when incoming data does not conform to the system's specifications. Despite advances in automated schema matching systems, data integration tasks involving complex semantic interrelationships still require users to manually identify and define transformations between datasets, which can be cognitively demanding and time-consuming. We present DataSpeck, an end-to-end system that automates the conversion of disparate data sources to fit any pre-existing data specification. DataSpeck employs an AI-driven human-in-the-loop design, using LLMs to analyze semantic relationships and generate step-by-step transformation pipelines autonomously, while only requesting user attention to resolve semantic ambiguities. In our technical evaluation, DataSpeck successfully automated ~86% of varied data transformations while generating interpretable strategies with confidence scores and targeted clarification requests. In a user study (N=12), participants completed data conversion tasks ~53% faster with significantly reduced cognitive load using DataSpeck compared to Microsoft Excel with Copilot.
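
A minimal sketch of the human-in-the-loop pattern the abstract describes (LLM-proposed transformation steps with confidence scores, plus targeted clarification requests for ambiguities) follows. This is not DataSpeck's implementation: the LLM call is stubbed and all names are invented.

```python
# Minimal sketch of the human-in-the-loop pattern described above, not
# DataSpeck's implementation. An LLM (stubbed) proposes transformation steps
# with confidence scores; low-confidence steps trigger a targeted user
# clarification instead of a silent guess. All names are invented.
from dataclasses import dataclass

@dataclass
class Step:
    description: str   # e.g. "map 'dob' -> 'birth_date' (ISO 8601)"
    confidence: float  # model-reported confidence in [0, 1]

def propose_pipeline(source_cols: list, target_spec: list) -> list:
    # Stub standing in for an LLM call that analyzes semantic relationships
    # between the source columns and the target specification.
    return [
        Step("map 'dob' -> 'birth_date' (ISO 8601)", 0.95),
        Step("derive 'age' from 'dob'", 0.91),
        Step("map 'loc' -> 'city' or 'region'?", 0.42),  # semantic ambiguity
    ]

CLARIFY_BELOW = 0.6  # threshold for requesting user attention
for step in propose_pipeline(["dob", "loc"], ["birth_date", "age", "city"]):
    if step.confidence < CLARIFY_BELOW:
        answer = input(f"Ambiguity, please resolve: {step.description} ")
        print("resolved with user input:", answer)
    else:
        print(f"auto-applied ({step.confidence:.0%}):", step.description)
```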

Authors
Adil Rahman
University of Virginia, Charlottesville, Virginia, United States
Koichiro Niinuma
Fujitsu Research of America, Pittsburgh, Pennsylvania, United States
Aakar Gupta
Fujitsu Research of America, Redmond, Washington, United States
Prototyping Multimodal GenAI Real-Time Agents with Counterfactual Replays and Hybrid Wizard-of-Oz
Abstract

Recent advancements in multimodal generative AI (GenAI) enable the creation of personal context-aware real-time agents that, for example, can augment user workflows by following their on-screen activities and providing contextual assistance. However, prototyping such experiences is challenging, especially when supporting people with domain-specific tasks using real-time inputs such as speech and screen recordings. While prototyping an LLM-based proactive support agent system, we found that existing prototyping and evaluation methods were insufficient to anticipate the nuanced situational complexity and contextual immediacy required. To overcome these challenges, we explored a novel user-centered prototyping approach that combines counterfactual video replay prompting and hybrid Wizard of Oz methods to iteratively design and refine agent behaviors. This paper discusses our prototyping experiences, highlighting successes and limitations, and offers a practical guide and an open-source toolkit for UX designers, HCI researchers, and AI toolmakers to build more user-centered and context-aware multimodal agents.
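
To make the method concrete, here is a hedged sketch of the combination the abstract names: replaying recorded session segments as agent input (optionally edited into counterfactual variants beforehand) while a human "wizard" vets each draft response. The paper's open-source toolkit is not reproduced here; everything below is an invented illustration.

```python
# Invented illustration; the authors' open-source toolkit is not reproduced.
# Recorded session segments (speech + screen context) are replayed as agent
# input, and a human "wizard" accepts or overrides each draft response
# before it would reach the participant.
from dataclasses import dataclass

@dataclass
class Segment:
    timestamp_s: float
    transcript: str      # speech-to-text at this moment
    screen_summary: str  # description of on-screen activity

def draft_agent_response(segment: Segment) -> str:
    # Stub for a multimodal LLM call conditioned on the replayed context.
    return f"Proactive suggestion based on: {segment.screen_summary}"

def hybrid_wizard_of_oz(replay: list) -> None:
    for seg in replay:
        draft = draft_agent_response(seg)
        choice = input(f"[{seg.timestamp_s:.0f}s] draft: {draft!r} (a)ccept/(e)dit? ")
        final = input("wizard reply: ") if choice.strip() == "e" else draft
        print("-> shown to participant:", final)

hybrid_wizard_of_oz([
    Segment(12.0, "how do I merge these layers", "image editor, layers panel open"),
])
```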

Authors
Frederic Gmeiner
Carnegie Mellon University, Pittsburgh, Pennsylvania, United States
Kenneth Holstein
Carnegie Mellon University, Pittsburgh, Pennsylvania, United States
Nikolas Martelaro
Carnegie Mellon University, Pittsburgh, Pennsylvania, United States
Data-Prompt Co-Evolution: Growing Test Sets to Refine LLM Behavior
Abstract

Large Language Models (LLMs) are increasingly embedded in applications, and people can shape model behavior by editing prompt instructions. Yet encoding subtle, domain-specific policies into prompts is challenging. Although this process often benefits from concrete test cases, test data and prompt instructions are typically developed as separate artifacts, reflecting traditional machine learning practices in which model tuning was slow and test sets were static. We argue that the fast, iterative nature of prompt engineering calls for removing this separation and enabling a new workflow: data-prompt co-evolution, where a living test set and prompt instructions evolve in tandem. We present an interactive system that operationalizes this workflow. It guides application developers to discover edge cases, articulate rationales for desired behavior, and iteratively evaluate revised prompts against a growing test set. A user study shows our workflow helps people refine prompts systematically, better aligning them with their intended policies. This work points toward more robust and responsible LLM applications through human-in-the-loop development.
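
Below is a minimal sketch of the co-evolution loop as the abstract describes it: a living test set that only grows, with every prompt revision re-evaluated against the entire accumulated set. This is an illustration with a stubbed model call, not the authors' system; all names are assumptions.

```python
# Sketch of the co-evolution loop, not the authors' system. Each test case
# pairs an input with the desired behavior and a rationale (the domain
# policy); every prompt revision is re-run against the entire accumulated
# set, so a fix for a new edge case cannot silently regress an old one.
from dataclasses import dataclass, field

@dataclass
class TestCase:
    input_text: str
    expected: str
    rationale: str  # why this behavior is desired

@dataclass
class LivingTestSet:
    cases: list = field(default_factory=list)

    def add(self, case: TestCase) -> None:
        self.cases.append(case)  # the set only grows as edge cases surface

def run_model(prompt: str, input_text: str) -> str:
    # Stub standing in for an LLM call with the current prompt instructions.
    return "REFUSE" if "dosage" in input_text else "ANSWER"

def pass_rate(prompt: str, tests: LivingTestSet) -> float:
    hits = sum(run_model(prompt, c.input_text) == c.expected for c in tests.cases)
    return hits / max(len(tests.cases), 1)

tests = LivingTestSet()
tests.add(TestCase("give me a medication dosage", "REFUSE", "safety policy"))
tests.add(TestCase("summarize this article", "ANSWER", "core functionality"))
print("prompt v2 pass rate:", pass_rate("v2 of the prompt", tests))
```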

Authors
Minjae Lee
Yonsei University, Seoul, Korea, Republic of
Minsuk Kahng
Yonsei University, Seoul, Korea, Republic of
ClearFairy: Capturing Creative Workflows through Decision Structuring, In-Situ Questioning, and Rationale Inference
Abstract

Capturing professionals’ decision-making in creative workflows (e.g., UI/UX) is essential for reflection, collaboration, and knowledge sharing, yet existing methods often leave rationales incomplete and implicit decisions hidden. To address this, we present the CLEAR approach, which structures reasoning into cognitive decision steps (linked units of actions, artifacts, and explanations), making decisions traceable with generative AI. Building on CLEAR, we introduce ClearFairy, a think-aloud AI assistant for UI design that detects weak explanations, asks lightweight clarifying questions, and infers missing rationales. In a study with twelve professionals, 85% of ClearFairy’s inferred rationales were accepted (as-is or with revisions). Notably, the system increased "strong explanations" (rationales providing sufficient causal reasoning) from 14% to 83% without adding cognitive demand. Furthermore, exploratory applications demonstrate that captured steps can enhance generative AI agents in Figma, yielding predictions better aligned with professionals and producing coherent outcomes. We release a dataset of 417 decision steps to support future research.
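
Inferring only from the abstract's description of a decision step as a linked unit of actions, artifacts, and explanations, a hedged sketch of that data structure might look like the following; field names are assumptions, not the schema of the paper's released 417-step dataset.

```python
# Hedged sketch of a CLEAR-style "cognitive decision step", inferred only
# from the abstract (linked units of actions, artifacts, and explanations);
# all field names here are assumptions.
from dataclasses import dataclass, field

@dataclass
class DecisionStep:
    action: str        # what the designer did, e.g. "enlarged the CTA button"
    artifact: str      # what it touched, e.g. "checkout screen, primary button"
    explanation: str   # the rationale; may start out weak or implicit
    strength: str = "weak"  # "weak" vs "strong" (sufficient causal reasoning)
    prior_steps: list = field(default_factory=list)  # links to earlier steps

grid = DecisionStep("adopted an 8pt spacing grid", "layout",
                    "keeps spacing consistent across screens", strength="strong")
cta = DecisionStep("enlarged the CTA button", "checkout screen",
                   "looks better", prior_steps=[grid])  # weak explanation

# An assistant in the ClearFairy mold would flag `cta`'s weak explanation,
# ask a lightweight clarifying question, or infer a candidate rationale for
# the designer to accept or revise.
print(cta.strength, "-> linked to", len(cta.prior_steps), "earlier step(s)")
```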

Authors
Kihoon Son
KAIST, Daejeon, Korea, Republic of
DaEun Choi
KAIST, Daejeon, Korea, Republic of
Tae Soo Kim
KAIST, Daejeon, Korea, Republic of
Young-Ho Kim
NAVER AI Lab, Seongnam, Korea, Republic of
Sangdoo Yun
NAVER AI Lab, Seongnam, Gyeonggi, Korea, Republic of
Juho Kim
KAIST, Daejeon, Korea, Republic of