Visualization dashboards are regularly used for data exploration and analysis, but their complex interactions and interlinked views often require time-consuming onboarding sessions from dashboard authors. Preparing these onboarding materials is labor-intensive, and they require manual updates whenever dashboards change. Recent advances in multimodal interaction powered by large language models (LLMs) provide ways to support self-guided onboarding. We present DIANA (Dashboard Interactive Assistant for Navigation and Analysis), a multimodal dashboard assistant that supports navigation and guided analysis through chat, audio, and mouse-based interactions. Users can choose any interaction modality, or a combination of them, to onboard themselves onto the dashboard. Each modality highlights relevant dashboard features to support user orientation. Unlike typical LLM systems that rely solely on text-based chat, DIANA combines multiple modalities to provide explanations directly in the dashboard interface. We conducted a comparative qualitative user study to understand how different modalities are used for onboarding tasks of varying types and complexities.
Large Language Models (LLMs) are transforming Conversational Visual Analytics (CVA) by enabling data analysis through natural language. However, evaluating LLMs for CVA remains challenging: existing approaches require programming expertise, overlook real-world complexity, and lack interpretable metrics for multi-format (visualization and text) outputs. Through interviews with 22 CVA developers and 16 end-users, we identified use cases, evaluation criteria, and workflows. We present Lexara, a user-centered evaluation toolkit for CVA that operationalizes these insights into: (i) test cases spanning real-world scenarios; (ii) interpretable metrics covering visualization quality (data fidelity, semantic alignment, functional correctness, design clarity) and language quality (factual grounding, analytical reasoning, conversational coherence) using rule-based and LLM-as-a-judge methods; and (iii) an interactive toolkit enabling experimental setup and multi-format, multi-level exploration of results without programming expertise. We conducted a two-week diary study with six CVA developers, drawn from our initial cohort of 22. Their feedback demonstrated Lexara's effectiveness in guiding appropriate model and prompt selection.
Design feedback helps practitioners improve their artifacts while also fostering reflection and design reasoning. Large Language Models (LLMs) such as ChatGPT can support design work, but they often provide generic, one-off suggestions that limit reflective engagement. We investigate how to guide LLMs to act as design mentors by applying the Cognitive Apprenticeship Model, which emphasizes demonstrating reasoning through six methods: modeling, coaching, scaffolding, articulation, reflection, and exploration. We operationalize these instructional methods through structured prompting and evaluate them in a within-subjects study with data visualization practitioners. Participants interacted with both a baseline LLM and an instructional LLM designed with cognitive apprenticeship prompts. We further conducted surveys, interviews, and conversational log analyses to evaluate experiences across conditions. Our findings show that cognitively informed prompts elicit deeper design reasoning and more reflective feedback exchanges, though the baseline is sometimes preferred depending on task type or experience level. We distill design considerations for AI-assisted feedback systems that foster reflective practice.
Hyperparameter optimization (HPO) is a long-running process that can span hours or even days. While recent human-in-the-loop HPO systems enable monitoring and steering of the process, they are typically designed for desktop environments, which limits their effectiveness for managing prolonged experiments in practice. To address these limitations, we present HyPockeTuner, an interactive mobile system that enables users to monitor, steer, and reflect on HPO experiments from their smartphones, anytime and anywhere. Its mobile-tailored interface supports tracking experiment history and visualizing the relationship between user interventions and performance changes. HyPockeTuner also employs a notification workflow that alerts users to important events, reducing the burden of constant monitoring while enabling timely interventions. In a pilot study, we validated that users could readily identify critical events, such as performance improvements and intervention points, through our visualization. Furthermore, two five-day deployment studies with follow-up reflection sessions demonstrated that users could integrate experiment management into their daily routines and reflect on past decisions, generating insights for future improvement.
The Transformer architecture underpins modern large language models powering state-of-the-art text generation and AI applications. However, its complexity makes it difficult for non-experts to learn. Existing resources often lack interactivity, rely on static descriptions of simplified architectures, or fail to reflect models’ behavior with real data. To address this gap, we introduce Transformer Explainer, an interactive visualization tool for non-experts to learn Transformers. The tool integrates an overview illustrating the Transformer's data flow with on-demand explanations that gradually reveal mathematical details. Smooth transitions across abstraction levels highlight the interplay between high-level structures and low-level operations. Running a live GPT-2 instance directly in the browser, Transformer Explainer empowers learners to experiment with custom input and hyperparameters without setup, observing next-token predictions in real time. A 90-participant user study showed that our tool offered significant advantages in improving user understanding and engagement. Transformer Explainer has attracted over 490,000 users.
Chart data extraction, which reverse-engineers data tables from chart images, is essential for reproducibility, analysis, retrieval, and redesign. Existing interactive tools are reliable but tedious, and mixed-initiative systems, while more efficient, lack generalizability. Recent multimodal large language models (MLLMs) offer a unified interface for chart interpretation, yet their ability to extract accurate data tables, especially from charts without visible labels, remains unclear. We build a benchmark featuring diverse real-world charts without data labels to evaluate this capability. Results show that, while current MLLMs reliably reconstruct table structures, they struggle with precise value recovery. To address this, we revisit chart data extraction from a human-centered perspective and argue that extraction should follow a progressive learning process similar to how people read charts. Our training framework substantially improves numerical accuracy, achieving state-of-the-art performance with a 7B-parameter model. A user study further shows that our model effectively supports mixed-initiative workflows for reliable chart data extraction.