Peeking Ahead of the Field Study: Exploring VLM Personas as Support Tools for Embodied Studies in HCI

Abstract

Field studies are irreplaceable but costly, time-consuming, and error-prone, requiring careful preparation. Inspired by rapid prototyping in manufacturing, we propose a fast, low-cost evaluation method that uses Vision-Language Model (VLM) personas to simulate outcomes comparable to field results. While LLMs show human-like reasoning and language capabilities, autonomous vehicle (AV)-pedestrian interaction requires spatial awareness, emotional empathy, and behavioral generation. This raises our research question: to what extent can VLM personas mimic human responses in field studies? We conducted two parallel studies: 1) a real-world study with 20 participants, and 2) a video study using 20 VLM personas, both on a street-crossing task. We compared their responses and interviewed five HCI researchers on potential applications. Results show that VLM personas mimic human response patterns (e.g., average crossing times of 5.25 s vs. 5.07 s) but lack behavioral variability and depth. They show promise for formative studies, field study preparation, and human data augmentation.

Award
Honorable Mention
Authors
Xinyue Gui
The University of Tokyo, Tokyo, Japan
Ding Xia
The University of Tokyo, Tokyo, Japan
Mark Colley
UCL Interaction Centre, London, United Kingdom
Yuan Li
Keio University, Fujisawa, Japan
Vishal Chauhan
The University of Tokyo, Tokyo, Japan
Anubhav Anubhav
The University of Tokyo, Tokyo, Japan
Zhongyi Zhou
Google, Tokyo, Japan
Ehsan Javanmardi
The University of Tokyo, Tokyo, Japan
Stela Hanbyeol Seo
Kyoto University, Kyoto, Japan
Chia-Ming Chang
National Taiwan University of Arts, Taipei, Taiwan
Manabu Tsukada
The University of Tokyo, Tokyo, Japan
Takeo Igarashi
The University of Tokyo, Tokyo, Japan
Video

Conference: CHI 2026

ACM CHI Conference on Human Factors in Computing Systems

Session: Human-Robot Interaction & Embodied Sensing

P1 - Room 134
7 presentations
2026-04-15, 18:00–19:30