Visual comparison of high-dimensional machine learning datasets helps practitioners identify gaps in data coverage, diagnose distribution shifts, and understand how these differences may affect downstream tasks such as classification and object detection. However, the commonly used density map often blurs details and is computationally expensive. We present DiffGrid, a grid-based tool for visually comparing large datasets. We develop a regularized, grid-based density-difference visualization method that supports multi-level analysis of these differences. Interactive zooming and image labels let users efficiently explore differences from overview to detail. We demonstrate the practical value of DiffGrid with two case studies: comparing coresets with full datasets, and comparing synthetic infographics with real ones. We validate its effectiveness and usefulness with a quantitative experiment and a user study.
ACM CHI Conference on Human Factors in Computing Systems