GazePointAR: A Context-Aware Multimodal Voice Assistant for Pronoun Disambiguation in Wearable Augmented Reality

Abstract

Voice assistants (VAs) like Siri and Alexa are transforming human-computer interaction; however, they lack awareness of users' spatiotemporal context, resulting in limited performance and unnatural dialogue. We introduce GazePointAR, a fully-functional context-aware VA for wearable augmented reality that leverages eye gaze, pointing gestures, and conversation history to disambiguate speech queries. With GazePointAR, users can ask "what's over there?" or "how do I solve this math problem?" simply by looking and/or pointing. We evaluated GazePointAR in a three-part lab study (N=12): (1) comparing GazePointAR to two commercial systems; (2) examining GazePointAR's pronoun disambiguation across three tasks; and (3) an open-ended phase where participants could suggest and try their own context-sensitive queries. Participants appreciated the naturalness and human-like nature of pronoun-driven queries, although pronoun use was sometimes counterintuitive. We then iterated on GazePointAR and conducted a first-person diary study examining how GazePointAR performs in the wild. We conclude by enumerating limitations and design considerations for future context-aware VAs.

Authors
Jaewook Lee
University of Washington, Seattle, Washington, United States
Jun Wang
University of Washington, Seattle, Washington, United States
Elizabeth Brown
University of Washington, Seattle, Washington, United States
Liam Chu
University of Washington, Seattle, Washington, United States
Sebastian S. Rodriguez
University of Illinois at Urbana-Champaign, Urbana, Illinois, United States
Jon E. Froehlich
University of Washington, Seattle, Washington, United States
Paper URL

https://doi.org/10.1145/3613904.3642230

Video

Conference: CHI 2024

The ACM CHI Conference on Human Factors in Computing Systems (https://chi2024.acm.org/)

Session: Hand and Gaze

314
5 presentations
2024-05-16 01:00:00 – 2024-05-16 02:20:00