Revealing the Gap: Visual Comparison of Large-Scale Datasets via Multi-Scale Density Difference Map

Abstract

Visual comparison of high-dimensional machine learning datasets helps practitioners identify gaps in data coverage, diagnose distribution shifts, and understand their potential influence on downstream tasks such as classification and object detection. However, the commonly used density map often blurs details and is computationally expensive. We present DiffGrid, a grid-based tool for comparing differences in large datasets. A regularized, grid-based density difference visualization method is developed to enable multi-level analysis of the differences. Interactive zooming and image labels are provided for efficiently exploring differences from overview to detail. We demonstrate the practical value of DiffGrid with two case studies, comparing coresets with full datasets and comparing synthetic infographics with real ones, and validate its effectiveness and usefulness with a quantitative experiment and a user study.
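
As a rough illustration of the general idea of a grid-based density difference at several resolutions, the sketch below bins two 2D point sets into grids and subtracts their normalized densities. It is not DiffGrid's algorithm: the abstract does not specify the regularization or the grid layout, so none is attempted here. The function names, the grid resolutions, and the assumption that the datasets have already been projected to 2D points in the unit square are all hypothetical.

```python
import numpy as np

def density_grid(points, bins, extent):
    """Normalized 2D histogram: fraction of points falling in each grid cell."""
    (xmin, xmax), (ymin, ymax) = extent
    counts, _, _ = np.histogram2d(
        points[:, 0], points[:, 1],
        bins=bins, range=[[xmin, xmax], [ymin, ymax]],
    )
    total = counts.sum()
    return counts / total if total > 0 else counts

def density_difference_pyramid(a, b, levels=(8, 16, 32),
                               extent=((0.0, 1.0), (0.0, 1.0))):
    """Map each resolution to density(a) - density(b) on a bins x bins grid.

    Positive cells are regions where `a` is relatively denser than `b`.
    No regularization is applied (an assumption-free baseline, unlike the paper).
    """
    return {
        n: density_grid(a, n, extent) - density_grid(b, n, extent)
        for n in levels
    }

# Hypothetical data: a "full" set and a subset that misses the right half.
rng = np.random.default_rng(0)
full = rng.uniform(0.0, 1.0, size=(5000, 2))
subset = full[full[:, 0] < 0.5]
diffs = density_difference_pyramid(full, subset)
```

In this toy setup the coarse grid shows the missing right half as a block of positive cells, while finer grids localize smaller gaps at the cost of noisier per-cell estimates.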

Authors
Xinyuan Guo
Tsinghua University, Beijing, China
Xu Zhu
Tsinghua University, Beijing, China
Yilin Ye
Tsinghua University, Beijing, China
Shixia Liu
Tsinghua University, Beijing, China

Conference: CHI 2026

ACM CHI Conference on Human Factors in Computing Systems

Session: Data Visualization Designs and Tools

P1 - Room 117
7 presentations
2026-04-17 20:15:00
2026-04-17 21:45:00