Interactive Visualization for Model Inspection and Debugging

Conference Name
CHI 2026
Evalet: Evaluating Large Language Models through Functional Fragmentation
Abstract

Practitioners increasingly rely on Large Language Models (LLMs) to evaluate generative AI outputs through "LLM-as-a-Judge" approaches. However, these methods produce holistic scores that obscure which specific elements influenced the assessments. We propose functional fragmentation, a method that dissects each output into key fragments and interprets the rhetorical functions that each fragment serves relative to evaluation criteria—surfacing the elements of interest and revealing how they fulfill or hinder user goals. We instantiate this approach in Evalet, an interactive system that visualizes fragment-level functions across many outputs to support inspection, rating, and comparison of evaluations. A user study (N=10) found that, while practitioners struggled to validate holistic scores, our approach helped them identify 48% more evaluation misalignments. This helped them calibrate trust in LLM evaluations and rely on them to find more actionable issues in model outputs. Our work shifts LLM evaluation from quantitative scores toward qualitative, fine-grained analysis of model behavior.
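To make the fragmentation step concrete, here is a minimal Python sketch that asks an LLM to split one output into fragments and label the function each serves for a criterion; the call_llm callable, prompt wording, and JSON reply format are assumptions for illustration, not Evalet's actual pipeline.

```python
import json

def fragment_and_label(output_text: str, criterion: str, call_llm) -> list[dict]:
    """Split one model output into key fragments and label the function each
    fragment serves for a given evaluation criterion.

    `call_llm` is an assumed callable (prompt in, text out); the prompt wording
    and JSON reply format are illustrative, not Evalet's actual pipeline.
    """
    prompt = (
        "Split the following output into its key fragments. For each fragment, "
        f"state the function it serves relative to the criterion '{criterion}' "
        "(e.g., whether it fulfills, hinders, or is neutral to the user's goal). "
        'Answer as a JSON list of {"fragment": ..., "function": ...} objects.\n\n'
        f"Output:\n{output_text}"
    )
    return json.loads(call_llm(prompt))

# Fragment-level labels from many outputs can then be aggregated and visualized,
# rather than collapsing each output into a single holistic score.
```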

Award
Honorable Mention
Authors
Tae Soo Kim
KAIST, Daejeon, Korea, Republic of
Heechan Lee
KAIST, Daejeon, Korea, Republic of
Yoonjoo Lee
KAIST, Daejeon, Korea, Republic of
Joseph Seering
KAIST, Daejeon, Korea, Republic of
Juho Kim
KAIST, Daejeon, Korea, Republic of
SeekUI: Predicting Visual Search Behavior on Graphical User Interfaces with a Reward-Augmented Vision Language Model
Abstract

Visual search is key to understanding and improving interaction with graphical user interfaces (GUIs), yet predicting scanpaths on real GUIs remains an open challenge. Unlike free-viewing, visual search is goal-driven and shaped by both linguistic and visual features of the GUI. State-of-the-art models of visual search, trained on natural images, fail on GUIs because they cannot capture the effects of grouping and semantics on search strategies. We present SeekUI, a reward-augmented Vision Language Model (VLM) that predicts scanpaths directly from a GUI screenshot and a text cue describing the desired target. Our model extends the capability of VLMs to reproduce human-like visual search behavior on GUIs and outperforms baseline models across different types of GUIs. Importantly, it reproduces key empirical phenomena established in eye-tracking studies of visual search, including the Guess–Scan–Confirm strategy. In sum, SeekUI provides a foundation for predicting visual search behavior and has potential for informing GUI evaluation and optimization.
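As a rough illustration of the predictor's assumed input/output contract (a screenshot plus a text cue in, an ordered scanpath out), here is a hypothetical Python sketch; the Fixation fields and the model.generate call are assumptions and do not describe SeekUI's actual architecture or API.

```python
from dataclasses import dataclass

@dataclass
class Fixation:
    x: float            # normalized horizontal position on the screenshot (0..1)
    y: float            # normalized vertical position (0..1)
    duration_ms: float  # fixation duration

def predict_scanpath(screenshot_path: str, target_cue: str, model) -> list[Fixation]:
    """Sketch of the assumed contract only: a GUI screenshot and a text cue go in,
    an ordered list of fixations comes out. `model` stands in for a fine-tuned
    vision language model; the `generate` call and return format are hypothetical."""
    raw = model.generate(image=screenshot_path,
                         prompt=f"Locate the element described as: {target_cue}")
    return [Fixation(x, y, d) for (x, y, d) in raw]
```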

Authors
Zixin Guo
Aalto University, Espoo, Finland
Yue Jiang
Aalto University, Espoo, Finland
Luis A. Leiva
University of Luxembourg, Esch-sur-Alzette, Luxembourg
Antti Oulasvirta
Aalto University, Helsinki, Finland
PriorWeaver: Prior Elicitation via Iterative Dataset Construction
Abstract

In Bayesian analysis, prior elicitation, the process of expressing one’s beliefs to inform statistical modeling, is an essential yet challenging step. Analysts often have beliefs about real-world variables and their relationships. However, existing tools require analysts to translate these beliefs and express them indirectly as probability distributions over model parameters. We present PriorWeaver, an interactive visualization system that facilitates prior elicitation through iterative dataset construction and refinement. Analysts visually express their assumptions about individual variables and their relationships. Under the hood, these assumptions produce a dataset that is used to derive statistical priors. Prior predictive checks then help analysts compare the priors to their assumptions. In a lab study with 17 participants new to Bayesian analysis, we compared PriorWeaver to a baseline incorporating existing techniques. Compared to the baseline, PriorWeaver gave participants greater control, clarity, and confidence, leading to priors that were better aligned with their expectations.
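To make the dataset-first idea concrete, here is a minimal Python sketch: a small constructed dataset stands in for the analyst's drawn assumptions, priors for a hypothetical linear model are derived from it, and a prior predictive check simulates from those priors; the variables and the least-squares derivation are illustrative assumptions, not PriorWeaver's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# A small constructed dataset stands in for the analyst's drawn assumptions,
# e.g., plausible ages and blood pressures with a positive trend (hypothetical variables).
age = rng.uniform(20, 80, size=200)
bp = 100 + 0.5 * age + rng.normal(0, 10, size=200)

# Derive priors for a linear model's parameters from the constructed data.
# A least-squares fit is used here only for illustration; the system's derivation may differ.
slope, intercept = np.polyfit(age, bp, deg=1)
resid_sd = np.std(bp - (slope * age + intercept))

# Prior predictive check: simulate outcomes from the derived priors so the analyst
# can compare the implied data against their assumptions.
sim_slope = rng.normal(slope, 0.1, size=1000)
sim_intercept = rng.normal(intercept, 5.0, size=1000)
sim_bp_at_50 = sim_intercept + sim_slope * 50 + rng.normal(0, resid_sd, size=1000)
print("Prior predictive 95% interval for blood pressure at age 50:",
      np.percentile(sim_bp_at_50, [2.5, 97.5]))
```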

Authors
Yuwei Xiao
UCLA, Los Angeles, California, United States
Shuai Ma
Aalto University, Helsinki, Finland
Antti Oulasvirta
Aalto University, Helsinki, Finland
Eunice Jun
UCLA, Los Angeles, California, United States
ComVi: Context-Aware Optimized Comment Display in Video Playback
Abstract

On general video-sharing platforms like YouTube, comments are displayed independently of video playback. As viewers often read comments while watching a video, they may encounter comments referring to moments unrelated to the current scene, which can reveal spoilers and disrupt immersion. To address this problem, we present ComVi, a novel system that displays comments at contextually relevant moments, enabling viewers to see time-synchronized comments and video content together. We first map all comments to relevant video timestamps by computing audio-visual correlation, then construct the comment sequence through an optimization that considers temporal relevance, popularity (number of likes), and display duration for comfortable reading. In a user study, ComVi provided a significantly more engaging experience than conventional video interfaces (i.e., YouTube and Danmaku), with 71.9% of participants selecting ComVi as their most preferred interface.
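The following Python sketch conveys the flavor of the display optimization with a greedy scheduler that ranks comments per playback slot by a weighted mix of temporal relevance and popularity; the scoring function, weights, and slot size are assumptions, and the audio-visual mapping and reading-duration handling are omitted.

```python
from dataclasses import dataclass

@dataclass
class Comment:
    text: str
    likes: int
    matched_time: float  # video timestamp (seconds) the comment was mapped to

def schedule_comments(comments, video_length, slot=5.0, per_slot=3,
                      w_time=1.0, w_pop=0.5):
    """Greedy stand-in for the display optimization: in each playback slot, show
    the comments that best trade off temporal relevance (closeness to the
    comment's matched moment) and popularity (likes). Weights, slot size, and
    scoring are illustrative assumptions, not ComVi's actual formulation."""
    schedule = {}
    t = 0.0
    while t < video_length:
        def score(c):
            return -w_time * abs(c.matched_time - t) + w_pop * c.likes ** 0.5
        schedule[t] = sorted(comments, key=score, reverse=True)[:per_slot]
        t += slot
    return schedule
```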

Authors
Minsun Kim
KAIST, Daejeon, Korea, Republic of
Dawon Lee
KAIST, Daejeon, Korea, Republic of
Junyong Noh
KAIST, Daejeon, Korea, Republic of
TSEditor: Interactive Time Series Editing for Privacy Preservation
Abstract

Publishing time series datasets raises substantial privacy concerns, as the underlying patterns (e.g., trends, values) can lead to the identification of individuals. Mitigating these concerns remains challenging due to the difficulty of pinpointing specific privacy-leaking patterns and protecting them without significantly compromising the analytical utility of the published data. Existing methods remain vulnerable to identity attacks that exploit diverse temporal patterns and may compromise data utility for subsequent analytical tasks. To address these limitations, we collaborated with domain experts to develop a taxonomy of privacy risks in time series data and built TSEditor, an interactive editing system. TSEditor integrates coordinated views for multi-perspective analysis of privacy risks and introduces six editing operations for targeted modifications with visual feedback. We demonstrate the effectiveness and usability of TSEditor through two case studies, an expert interview, a model evaluation, and a user study.
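As a minimal illustration of a targeted, utility-preserving edit, the Python sketch below smooths only a selected segment of a series (for example, a re-identifying spike) and leaves the rest untouched; this single hypothetical operation is not one of TSEditor's actual six.

```python
import numpy as np

def smooth_segment(series: np.ndarray, start: int, end: int, window: int = 5) -> np.ndarray:
    """Replace only the segment [start, end) of a series (e.g., a distinctive
    spike that could re-identify an individual) with a local moving average,
    leaving the rest of the series untouched. A single illustrative edit, not
    one of TSEditor's operations."""
    edited = series.astype(float).copy()
    for i in range(start, end):
        lo, hi = max(0, i - window), min(len(series), i + window + 1)
        edited[i] = series[lo:hi].mean()
    return edited

# Example: flatten a suspicious spike around indices 40-60 of a daily series.
# series = np.loadtxt("daily_values.csv")   # hypothetical input file
# published = smooth_segment(series, 40, 60)
```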

Authors
Zihan Xu
Zhejiang University, Hangzhou, Zhejiang, China
Shuhan Liu
State Key Lab of CAD & CG, Zhejiang University, Hangzhou, Zhejiang, China
Kaicheng Shao
Zhejiang University, Ningbo, Zhejiang, China
Yuanzhe Jin
University of Oxford, Oxford, United Kingdom
Xumeng Wang
Nankai University, Tianjin, China
Zikun Deng
South China University of Technology, Guangzhou, Guangdong, China
Di Weng
Zhejiang University, Ningbo, Zhejiang, China
Yingcai Wu
Zhejiang University, Hangzhou, Zhejiang, China
CADModelScope: Revealing the Dependency Structure Behind Parametric Computer-Aided Design Models
Abstract

Parametric computer-aided design (CAD) models are constructed by a sequence of operations, where each operation may reference geometries created by earlier operations. This network of dependencies enables efficient modelling of complex geometry but also results in fragile models, where small modifications can trigger cascading errors. These interdependencies are obscured in commercial CAD systems, leaving users to rely on trial and error when navigating, modularizing, and debugging unfamiliar and complex models. In this paper, we motivate, present, and pilot CADModelScope, a multi-level graph-based visualization of operation dependencies integrated into a commercial CAD platform. In a qualitative lab study, we observed how participants locate and interpret operations, and how CADModelScope enhances awareness of hidden interdependencies and supports more structured navigation. Our findings highlight the potential of the network of operation dependencies as an effective representation for understanding and interacting with parametric CAD models, and we discuss implications for future tool design.
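A minimal Python sketch of the underlying idea: operations and their references form a directed graph, and the downstream set of an operation shows what a change may break; the data layout and traversal are illustrative, not CADModelScope's implementation.

```python
from collections import defaultdict

def build_dependency_graph(operations):
    """`operations` is a list of (op_id, referenced_op_ids) pairs in construction
    order; an edge u -> v means operation v references geometry created by u,
    so editing u can break v. The data layout is an illustrative assumption."""
    children = defaultdict(list)
    for op_id, refs in operations:
        for ref in refs:
            children[ref].append(op_id)
    return children

def downstream(children, op_id):
    """All operations that transitively depend on `op_id`, i.e., the ones that
    may cascade into errors when `op_id` is modified."""
    seen, stack = set(), [op_id]
    while stack:
        for nxt in children[stack.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen

# Example: a sketch, an extrude referencing it, and a fillet referencing the extrude.
ops = [("sketch1", []), ("extrude1", ["sketch1"]), ("fillet1", ["extrude1"])]
print(downstream(build_dependency_graph(ops), "sketch1"))  # {'extrude1', 'fillet1'} (order may vary)
```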

Authors
Yuanzhe Deng
University of Toronto, Toronto, Ontario, Canada
Zhijing Zhang
University of Toronto, Toronto, Ontario, Canada
Shurui Zhou
University of Toronto, Toronto, Ontario, Canada
Alison Olechowski
University of Toronto, Toronto, Ontario, Canada
The Way We Notice, That’s What Really Matters: Instantiating UI Components with Distinguishing Variations
Abstract

Front-end developers author UI components to be broadly reusable by parameterizing visual and behavioral properties. While flexible, this makes instantiation harder, as developers must reason about numerous property values and interactions. In practice, they must explore the component’s large design space and provide realistic and natural values for properties. To address this, we introduce distinguishing variations: variations that are both mimetic and distinct. We frame distinguishing variation generation as design-space sampling, combining symbolic inference to identify visually important properties with an LLM-driven mimetic sampler that produces realistic instantiations from its world knowledge. We instantiate distinguishing variations in Celestial, a tool that helps developers explore and visualize distinguishing variations. In a study with front-end developers (n=12), participants found these variations useful for comparing and mapping component design spaces, reported that mimetic instantiations were domain-relevant, and validated that Celestial transformed component instantiation from a manual process into a structured, exploratory activity.
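A minimal Python sketch of the sampling idea, assuming a hypothetical call_llm callable and a hard-coded stand-in for the symbolic inference of visually important properties:

```python
import json

def distinguishing_variations(component_props: dict, call_llm, k: int = 4) -> list[dict]:
    """Generate a handful of realistic, mutually distinct instantiations of a
    component. The `visual` flag below is a hard-coded stand-in for the symbolic
    inference of visually important properties, and `call_llm` is an assumed
    callable; neither reflects Celestial's actual implementation."""
    important = [p for p, meta in component_props.items() if meta.get("visual")]
    prompt = (
        f"Propose {k} realistic, clearly distinct instantiations of a UI component. "
        f"Vary only these properties: {important}. "
        "Answer as a JSON list of property-value objects."
    )
    return json.loads(call_llm(prompt))

# Example call, assuming a button component with size, label, and icon properties:
# distinguishing_variations({"size": {"visual": True}, "label": {"visual": True},
#                            "icon": {"visual": True}, "aria_label": {}}, call_llm)
```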

Authors
Priyan Vaithilingam
Apple, Seattle, Washington, United States
Alan Leung
Apple, Seattle, Washington, United States
Jeffrey Nichols
Apple, Seattle, Washington, United States
Titus Barik
Apple, Seattle, Washington, United States