DataSpeck: An AI-Driven Human-in-the-Loop System for Automating Transformations in Data Conversion Workflows

Abstract

In data-driven systems, integrating disparate data sources becomes challenging when incoming data does not conform to the system's specifications. Despite advances in automated schema matching systems, data integration tasks involving complex semantic interrelationships still require users to manually identify and define transformations between datasets, which can be cognitively demanding and time-consuming. We present DataSpeck, an end-to-end system that automates the conversion of disparate data sources to fit any pre-existing data specification. DataSpeck employs an AI-driven human-in-the-loop design. It uses large language models (LLMs) to analyze semantic relationships and autonomously generate step-by-step transformation pipelines, and it requests user attention only to resolve semantic ambiguities. In our technical evaluation, DataSpeck automated approximately 86% of a varied set of data transformations while generating interpretable strategies with confidence scores and targeted clarification requests. In a user study (N=12), participants completed data conversion tasks approximately 53% faster and with significantly reduced cognitive load using DataSpeck compared to Microsoft Excel with Copilot.
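The abstract describes a design in which each generated transformation step carries a confidence score and only ambiguous steps are routed to the user. The sketch below illustrates that general idea of confidence-gated triage; it is not DataSpeck's implementation. The class and function names (`TransformStep`, `triage_steps`), the example fields, and the 0.8 threshold are all hypothetical assumptions chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TransformStep:
    """One hypothetical step in a source-to-target transformation pipeline."""
    target_field: str
    description: str
    confidence: float  # assumed score in [0, 1], e.g. estimated by an LLM
    clarification: Optional[str] = None  # question to show the user if unsure


@dataclass
class Triage:
    automated: List[TransformStep] = field(default_factory=list)
    needs_user: List[TransformStep] = field(default_factory=list)


def triage_steps(steps: List[TransformStep], threshold: float = 0.8) -> Triage:
    """Apply high-confidence steps automatically; route the rest to the user.

    The 0.8 threshold is an illustrative assumption, not a value from the paper.
    """
    result = Triage()
    for step in steps:
        if step.confidence >= threshold:
            result.automated.append(step)
        else:
            result.needs_user.append(step)
    return result


# Usage with made-up steps for a customer-record conversion.
steps = [
    TransformStep("full_name", "concatenate first_name and last_name", 0.95),
    TransformStep("birth_date", "parse 'dob' into YYYY-MM-DD", 0.90),
    TransformStep("region", "map 'loc' codes to region names", 0.40,
                  clarification="Does 'NE' mean Northeast or Nebraska?"),
]
result = triage_steps(steps)
automation_rate = len(result.automated) / len(steps)
print(f"automated {automation_rate:.0%}; ask user about:",
      [s.clarification for s in result.needs_user])
```

Keeping the threshold as a parameter makes the trade-off explicit: raising it sends more steps to the user for review, while lowering it automates more at the risk of silently applying a wrong mapping.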

Authors
Adil Rahman
University of Virginia, Charlottesville, Virginia, United States
Koichiro Niinuma
Fujitsu Research of America, Pittsburgh, Pennsylvania, United States
Aakar Gupta
Fujitsu Research of America, Redmond, Washington, United States

Conference: CHI 2026

ACM CHI Conference on Human Factors in Computing Systems

Session: Human-in-the-Loop Machine Learning Interfaces

P1 - Room 111
7 presentations
2026-04-17, 18:00 to 19:30