Gesturing Toward Abstraction: Multimodal Convention Formation in Collaborative Physical Tasks

Abstract

A quintessential feature of human intelligence is the ability to create ad hoc conventions over time to achieve shared goals efficiently. We investigate how communication strategies evolve through repeated collaboration as people coordinate on shared procedural abstractions. To this end, we conducted an online unimodal study (n = 98) using natural language to probe abstraction hierarchies. In a follow-up lab study (n = 40), we examined how multimodal communication (speech and gestures) changed during physical collaboration. Pairs used augmented reality to isolate their partner’s hand and voice; one participant viewed a 3D virtual tower and sent instructions to the other, who built the physical tower. Participants became faster and more accurate by establishing linguistic and gestural abstractions and using cross-modal redundancy to emphasize key changes from previous interactions. Based on these findings, we extend probabilistic models of convention formation to multimodal settings, capturing shifts in modality preferences. Our findings and model provide building blocks for designing convention-aware intelligent agents situated in the physical world.

Authors
Kiyosu Maeda
Princeton University, Princeton, New Jersey, United States
William P. McCarthy
University of California, San Diego, La Jolla, California, United States
Ching-Yi Tsai
Princeton University, Princeton, New Jersey, United States
Jeffrey Mu
Brown University, Providence, Rhode Island, United States
Haoliang Wang
MIT, Cambridge, Massachusetts, United States
Robert Hawkins
Stanford University, Stanford, California, United States
Judith E. Fan
Stanford University, Stanford, California, United States
Parastoo Abtahi
Princeton University, Princeton, New Jersey, United States

Conference: CHI 2026

ACM CHI Conference on Human Factors in Computing Systems

Session: Human-AI Interaction & GenAI

P1 - Room 122
7 presentations
2026-04-15, 20:15–21:45