Gaze as Input

Conference Name
CHI 2026
GazeZoom: Exploration of Gaze-Assisted Multimodal Techniques for Panning and Zooming
Abstract

Zooming and panning are fundamental input actions for exploring complex 2D and 3D scenes and data such as images, maps, and designs. Multi-touch zoom/pan interactions have proven effective on mobile devices and have been directly ported to HMDs, where they are typically accomplished by analogous but relatively large-scale movements of both hands. We argue that such motions are inefficient and induce fatigue, and we explore how the eye-tracking features of HMDs can be leveraged to improve on them. We evaluated three interaction techniques that combine gaze with two-handed, one-handed, and head-based input in a study (N=24) that contrasts them against a baseline two-handed technique. The results indicate that gaze-assisted two- and one-handed techniques outperform the baseline (17%-36% faster), while our head-based technique achieves similar performance to the baseline but leaves the hands free for other tasks. We further developed a VR application demonstrating these techniques and validating their practical applicability.
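
To make the idea of gaze-assisted zooming concrete, below is a minimal, hypothetical sketch (not the paper's implementation) of zooming a 2D viewport about the gaze point so that the fixated content stays stationary while a hand or head gesture supplies only the zoom magnitude; the viewport model and all names are illustrative assumptions.

```python
# Minimal sketch (not the paper's implementation): zooming a 2D viewport about
# the user's gaze point so the fixated content stays put while another input
# supplies only the zoom magnitude. All names here are illustrative.

def zoom_about_gaze(view_center, scale, gaze_screen, screen_size, zoom_factor):
    """view_center: (x, y) of the viewport center in content coordinates.
    scale: content units per screen pixel.
    gaze_screen: (x, y) gaze position in screen pixels.
    zoom_factor: >1 zooms in, <1 zooms out."""
    # Content point currently under the gaze.
    gx = view_center[0] + (gaze_screen[0] - screen_size[0] / 2) * scale
    gy = view_center[1] + (gaze_screen[1] - screen_size[1] / 2) * scale
    new_scale = scale / zoom_factor
    # Re-center so the same content point stays under the gaze after zooming.
    cx = gx - (gaze_screen[0] - screen_size[0] / 2) * new_scale
    cy = gy - (gaze_screen[1] - screen_size[1] / 2) * new_scale
    return (cx, cy), new_scale

# Example: zoom in 2x while looking at the upper-left quadrant of a 1920x1080 view.
print(zoom_about_gaze((0.0, 0.0), 1.0, (480, 270), (1920, 1080), 2.0))
# -> ((-240.0, -135.0), 0.5)
```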

Authors
Yilong Lin
Southern University of Science and Technology, Shenzhen, China
Mingyu Han
KAIST, Daejeon, Korea, Republic of
Weitao Jiang
Southern University of Science and Technology, Shenzhen, China
Seungwoo Je
Southern University of Science and Technology, Shenzhen, China
Ian Oakley
KAIST, Daejeon, Korea, Republic of
The People's Gaze: Co-Designing and Refining Gaze Gestures with Users and Experts
Abstract

As eye-tracking becomes increasingly common in modern mobile devices, the potential for hands-free, gaze-based interaction grows, but current gesture sets are largely expert-designed and often misaligned with how users naturally move their eyes. To address this gap, we introduce a two-phase methodology for developing intuitive gaze gestures. First, four co-design workshops with 20 non-expert participants generated 102 initial concepts. Next, four gaze interaction experts reviewed and refined these into a set of 32 gestures. We found that non-experts, after a brief introduction, intuitively anchor gestures in familiar metaphors and develop a compositional grammar, i.e., activation (dwell) + action (gaze gesture or blink), to ensure intentionality and mitigate the classic Midas Touch problem. Experts prioritized gestures that are ergonomically sound, aligned with natural saccades, and reliably distinguishable. The resulting user-grounded, expert-validated gesture set, along with actionable design principles, provides a foundation for developing intuitive, hands-free interfaces for gaze-enabled devices.
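
As an illustration of this activation-plus-action grammar, the following hypothetical sketch (thresholds, state names, and the gesture vocabulary are assumptions, not the paper's gesture set) shows a dwell-armed state machine that only interprets a saccade or blink once the dwell completes, one common way of avoiding the Midas Touch problem.

```python
# Minimal sketch (illustrative placeholder values, not the paper's gesture
# set) of the activation + action grammar: a dwell arms gesture input, after
# which one directional saccade or a blink is read as the action; anything
# else times out, so free viewing never triggers commands.

DWELL_MS = 500          # dwell time required to arm gesture input
ACTION_WINDOW_MS = 800  # time allowed for the action once armed
DWELL_RADIUS = 1.0      # gaze jitter tolerated during the dwell (degrees)
SACCADE_DEG = 3.0       # minimum displacement counted as a gesture saccade

def _dist(a, b):
    return ((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2) ** 0.5

class GazeGestureFSM:
    def __init__(self):
        self.state, self.anchor = "IDLE", None
        self.dwell_start = self.armed_at = None

    def update(self, t_ms, gaze_xy, blink=False):
        """Feed one gaze sample; returns an action string or None."""
        if self.state == "IDLE":
            if self.anchor is None or _dist(gaze_xy, self.anchor) > DWELL_RADIUS:
                self.anchor, self.dwell_start = gaze_xy, t_ms  # gaze moved: restart dwell
            elif t_ms - self.dwell_start >= DWELL_MS:
                self.state, self.armed_at = "ARMED", t_ms      # activation complete
        elif self.state == "ARMED":
            if t_ms - self.armed_at > ACTION_WINDOW_MS:
                self.state = "IDLE"                            # timed out, no action
            elif blink:
                self.state = "IDLE"
                return "blink"
            elif _dist(gaze_xy, self.anchor) >= SACCADE_DEG:
                self.state = "IDLE"
                dx, dy = gaze_xy[0] - self.anchor[0], gaze_xy[1] - self.anchor[1]
                if abs(dx) > abs(dy):
                    return "right" if dx > 0 else "left"
                return "down" if dy > 0 else "up"
        return None
```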

Award
Honorable Mention
Authors
Yaxiong Lei
University of St Andrews, St Andrews, United Kingdom
Xinya Gong
University of St Andrews, Fife, United Kingdom
Shijing He
King's College London, London, United Kingdom
Yafei Wang
Dalian Maritime University, Dalian, Liaoning, China
Mohamed Khamis
University of Glasgow, Glasgow, United Kingdom
Juan Ye
University of St Andrews, St Andrews, United Kingdom
HiFiGaze: Improving Eye Tracking Accuracy Using Screen Content Knowledge
Abstract

We present a new and accurate approach for gaze estimation on consumer computing devices. We take advantage of continued strides in the quality of user-facing cameras found in, e.g., smartphones, laptops, and desktops (4K or greater in high-end devices), such that it is now possible to capture the 2D reflection of a device's screen in the user's eyes. This alone is insufficient for accurate gaze tracking due to the near-infinite variety of screen content. Crucially, however, the device knows what is being displayed on its own screen; in this work, we show that this information allows for robust segmentation of the reflection, the location and size of which encode the user's screen-relative gaze target. We explore several strategies to leverage this useful signal, quantifying performance in a user study. Our best-performing model reduces mean tracking error by ~18% compared to a baseline appearance-based model. A supplemental study reveals an additional 10-20% improvement if the gaze-tracking camera is located at the bottom of the device.
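
To convey the intuition in rough form, here is a deliberately simplified, hypothetical sketch: it assumes the screen reflection has already been segmented in the eye image and maps the pupil's position within that reflection to normalized screen coordinates. The linear mapping and the mirroring assumption are illustrative only; the paper's actual model is not reproduced here.

```python
# Highly simplified sketch of the intuition, not the paper's model: once the
# screen's reflection is segmented in the eye image, the pupil center's
# position within that reflection roughly indexes the screen-relative gaze
# target. A real system would calibrate this mapping per user and camera.

def gaze_from_reflection(reflection_box, pupil_xy):
    """reflection_box: (x, y, w, h) of the segmented screen reflection in
    eye-image pixels; pupil_xy: pupil center in the same image.
    Returns normalized (u, v) screen coordinates in [0, 1]."""
    x, y, w, h = reflection_box
    u = (pupil_xy[0] - x) / w
    v = (pupil_xy[1] - y) / h
    u = 1.0 - u  # assumption: the corneal reflection is horizontally mirrored
    clamp = lambda a: max(0.0, min(1.0, a))
    return clamp(u), clamp(v)

# Example: pupil slightly left of the reflection's center.
print(gaze_from_reflection((100, 80, 40, 25), (118, 92)))  # -> (0.55, 0.48)
```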

Award
Honorable Mention
Authors
Taejun Kim
Carnegie Mellon University, Pittsburgh, Pennsylvania, United States
Vimal Mollyn
Carnegie Mellon University, Pittsburgh, Pennsylvania, United States
Riku Arakawa
Carnegie Mellon University, Pittsburgh, Pennsylvania, United States
Chris Harrison
Carnegie Mellon University, Pittsburgh, Pennsylvania, United States
Gaze and Speech in Multimodal Human-Computer Interaction: A Scoping Review
Abstract

Multimodal interaction has long promised to make interfaces more intuitive and effective by combining complementary inputs. Among these, gaze and speech form a compelling pairing: gaze provides rapid spatial grounding, while speech conveys rich semantic information. Together, they offer rich cues for understanding user behaviour and intent. Yet despite decades of exploration, the research remains fragmented, making this synthesis timely as these inputs mature and are integrated into consumer-ready devices. This scoping review examined 103 studies published between 1991 and 2025, organised into "explicit", where users intentionally provide gaze and speech, and "implicit", where systems leverage users' natural behaviours to support interaction. Across both, we identified recurring ways of combining gaze and speech to resolve ambiguity, ground references, and support adaptivity. We contribute a synthesis of research on their combined use while highlighting challenges of temporal alignment, fusion and privacy, offering guidance for future research toward richer multimodal human-computer interaction.
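
One recurring explicit combination, resolving a spoken deictic reference with temporally aligned gaze, can be sketched as follows; the window size, data shapes, and function names are illustrative assumptions rather than any surveyed system's design.

```python
# Illustrative sketch of resolving a spoken deictic reference ("delete that")
# by looking up which object was fixated in a short window around the word's
# timestamp. Window size and data shapes are assumptions.

from collections import Counter

def resolve_deixis(word_time, gaze_trace, window=0.4):
    """gaze_trace: list of (timestamp_sec, object_id_or_None) fixation samples.
    Returns the object fixated most often within +/- window of word_time."""
    hits = Counter(obj for t, obj in gaze_trace
                   if obj is not None and abs(t - word_time) <= window)
    return hits.most_common(1)[0][0] if hits else None

# Example: "that" uttered at t=2.1 s while the gaze dwelt mostly on "lamp".
trace = [(1.9, "lamp"), (2.0, "lamp"), (2.1, "lamp"), (2.3, "table"), (2.6, None)]
print(resolve_deixis(2.1, trace))  # -> "lamp"
```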

Authors
Anam Ahmad Khan
KAIST, Daejeon, Korea, Republic of
Florian Weidner
Glasgow University, Glasgow, United Kingdom
Jungwoo Rhee
KAIST, Daejeon, Korea, Republic of
Yasmeen Abdrabou
Technical University of Munich, München, Germany
Andrea Bianchi
KAIST, Daejeon, Korea, Republic of
Eduardo Velloso
The University of Sydney, Sydney, New South Wales, Australia
Hans Gellersen
Lancaster University, Lancaster, United Kingdom
Joshua Newn
RMIT University, Melbourne, VIC, Australia
Eyes on Many: Evaluating Gaze, Hand, and Voice for Multi-Object Selection in Extended Reality
Abstract

Interacting with multiple objects simultaneously makes us fast. A pre-step to this interaction is to select the objects, i.e., multi-object selection, which is enabled through two steps: (1) toggling multi-selection mode (mode-switching) and then (2) selecting all the intended objects (subselection). In extended reality (XR), each step can be performed with the eyes, hands, and voice. To examine how design choices affect user performance, we evaluated four mode-switching (Semi-Pinch, Full-Pinch, Double-Pinch, and Voice) and three subselection techniques (Gaze+Dwell, Gaze+Pinch, and Gaze+Voice) in a user study. Results revealed that while Double-Pinch paired with Gaze+Pinch yielded the highest overall performance, Semi-Pinch achieved the lowest performance. Although Voice-based mode-switching showed benefits, Gaze+Voice subselection was less favored, as the required repetitive vocal commands were perceived as tedious. Overall, these findings provide empirical insights and inform design recommendations for multi-selection techniques in XR.
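
The two-step structure (mode-switching, then subselection) can be sketched as a small, hypothetical state machine; the event names and the confirm-on-exit behaviour below are assumptions for illustration, not the study's implementation.

```python
# Minimal sketch (not the study's implementation): a mode switch arms
# multi-selection, then each gaze+pinch adds the currently fixated object
# to the selection set. Event names are illustrative placeholders.

class MultiSelect:
    def __init__(self):
        self.multi_mode = False
        self.selection = set()

    def on_event(self, event, gazed_object=None):
        if event == "double_pinch":            # mode-switching
            self.multi_mode = not self.multi_mode
            if not self.multi_mode:
                done, self.selection = self.selection, set()
                return done                    # exiting the mode confirms the set
        elif event == "pinch":                 # subselection via Gaze+Pinch
            if self.multi_mode and gazed_object is not None:
                self.selection.add(gazed_object)
            elif gazed_object is not None:
                return {gazed_object}          # ordinary single selection
        return None

# Example: select two objects, then confirm by toggling the mode off.
ms = MultiSelect()
ms.on_event("double_pinch")
ms.on_event("pinch", "cube_1")
ms.on_event("pinch", "sphere_2")
print(ms.on_event("double_pinch"))  # -> {'cube_1', 'sphere_2'}
```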

Authors
Mohammad Raihanul Bashar
Concordia University, Montreal, Quebec, Canada
Aunnoy K Mutasim
Simon Fraser University, Vancouver, British Columbia, Canada
Ken Pfeuffer
Aarhus University, Aarhus, Denmark
Anil Ufuk Batmaz
Concordia University, Montreal, Quebec, Canada
Understanding Gaze-Based Identification in VR Through Preattentive Processing and Binocular Rivalry
Abstract

Stimulus-evoked gaze dynamics offer a secure and hands-free signal in virtual reality (VR), yet the underlying design space of effective visual stimuli remains poorly understood. This work examines how preattentive processing and binocular rivalry can inform stimulus design for gaze-based identification in VR. We conducted a two-part study: (1) a feasibility assessment of closed-set identification performance with 26 participants and 44,928 gaze samples collected by using a commercial headset (Meta Quest Pro), and (2) a usability study with 16 participants comparing the same interaction in a login context to PIN and out-of-band methods as a potential authentication technique. Our findings confirm the feasibility of personal identification, highlight usability advantages, and reveal participants’ desire for greater transparency to understand individual variations in login results. Together, these results offer conceptual insights into the perceptual mechanisms shaping stimulus-evoked gaze behavior, and outline design implications for future VR authentication workflows.
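
As a rough illustration of closed-set identification from stimulus-evoked gaze features, here is a hypothetical nearest-centroid sketch; the features, toy data, and classifier are placeholders and not the paper's pipeline.

```python
# Illustrative sketch only: closed-set identification from stimulus-evoked
# gaze features via a nearest-centroid classifier. Feature extraction, data,
# and distance metric are placeholders; the paper's pipeline may differ.

import numpy as np

def enroll(samples_by_user):
    """samples_by_user: {user_id: array of shape (n_trials, n_features)}.
    Returns one mean feature vector (centroid) per enrolled user."""
    return {uid: feats.mean(axis=0) for uid, feats in samples_by_user.items()}

def identify(centroids, query):
    """Return the enrolled user whose centroid is closest to the query trial."""
    return min(centroids, key=lambda uid: np.linalg.norm(centroids[uid] - query))

# Toy example with 3-dimensional gaze features (e.g., latency, amplitude, dwell).
rng = np.random.default_rng(0)
enrolled = {u: rng.normal(loc=i, scale=0.1, size=(10, 3))
            for i, u in enumerate(["A", "B", "C"])}
centroids = enroll(enrolled)
print(identify(centroids, rng.normal(loc=1, scale=0.1, size=3)))  # likely "B"
```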

Authors
Junryeol Jeon
Gwangju Institute of Science and Technology, Gwangju, Korea, Republic of
Yeo-Gyeong Noh
Gwangju Institute of Science and Technology, Gwangju, Korea, Republic of
JinYoung Yoo
Gwangju Institute of Science and Technology, Gwangju, Korea, Republic of
Jin-Hyuk Hong
Gwangju Institute of Science and Technology, Gwangju, Korea, Republic of