Exploring Empty Spaces: Human-in-the-Loop Data Augmentation

Abstract

Data augmentation is crucial to make machine learning models more robust and safe. However, augmenting data can be challenging as it requires generating diverse data points to rigorously evaluate model behavior on edge cases and mitigate potential harms. Creating high-quality augmentations that cover these "unknown unknowns" is a time- and creativity-intensive task. In this work, we introduce Amplio, an interactive tool to help practitioners navigate "unknown unknowns" in unstructured text datasets and improve data diversity by systematically identifying empty data spaces to explore. Amplio includes three human-in-the-loop data augmentation techniques: Augment with Concepts, Augment by Interpolation, and Augment with Large Language Model. In a user study with 18 professional red teamers, we demonstrate the utility of our augmentation methods in helping generate high-quality, diverse, and relevant model safety prompts. We find that Amplio enabled red teamers to augment data quickly and creatively, highlighting the transformative potential of interactive augmentation workflows.

Award
Honorable Mention
Authors
Catherine Yeh
Harvard University, Boston, Massachusetts, United States
Donghao Ren
Apple, Seattle, Washington, United States
Yannick Assogba
Apple, Cambridge, Massachusetts, United States
Dominik Moritz
Apple, Pittsburgh, Pennsylvania, United States
Fred Hohman
Apple, Seattle, Washington, United States
DOI

10.1145/3706598.3713491

Paper URL

https://dl.acm.org/doi/10.1145/3706598.3713491

Conference: CHI 2025

The ACM CHI Conference on Human Factors in Computing Systems (https://chi2025.acm.org/)

Session: Playing with Data

Annex Hall F206
7 presentations
2025-04-30 20:10:00
2025-04-30 21:40:00