GazeZoom: Exploration of Gaze-Assisted Multimodal Techniques for Panning and Zooming
Description

Zooming and panning are fundamental input actions for exploring complex 2D and 3D scenes and data such as images, maps, and designs. Multi-touch zoom/pan interactions have proven effective on mobile devices and have been ported directly to HMDs, where they are typically accomplished by analogous but relatively large-scale movements of both hands. We argue that such motions are inefficient and induce fatigue, and we explore how the eye-tracking features of HMDs can be leveraged to improve on them. We evaluated three interaction techniques that combine gaze with two-handed, one-handed, and head-based input in a study (N=24) that compares them against a baseline two-handed technique. The results indicate that the gaze-assisted two- and one-handed techniques outperform the baseline (17%-36% faster), while our head-based technique achieves performance similar to the baseline but leaves the hands free for other tasks. We further developed a VR application demonstrating these techniques and validating their practical applicability.
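The core idea of gaze-assisted zooming, scaling the view about the point the user is looking at rather than the screen center, can be sketched in a few lines. This is a minimal illustration of the general technique, not the paper's implementation; the view mapping `screen = (world - offset) * zoom` is an assumption for the sketch.

```python
def zoom_about_gaze(offset, zoom, gaze_screen, zoom_factor):
    """Rescale the view so the world point under the gaze stays fixed on screen.

    Assumed view mapping: screen = (world - offset) * zoom.
    offset, gaze_screen: (x, y) tuples; zoom, zoom_factor: scalars.
    """
    new_zoom = zoom * zoom_factor
    # World coordinate currently under the gaze point.
    world_at_gaze = tuple(g / zoom + o for g, o in zip(gaze_screen, offset))
    # Choose the new offset so world_at_gaze still projects to gaze_screen.
    new_offset = tuple(w - g / new_zoom
                       for w, g in zip(world_at_gaze, gaze_screen))
    return new_offset, new_zoom
```

With this anchoring, a hand or head gesture only needs to supply the scalar `zoom_factor`; the gaze point does the spatial targeting.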

The People's Gaze: Co-Designing and Refining Gaze Gestures with Users and Experts
Description

As eye-tracking becomes increasingly common in modern mobile devices, the potential for hands-free, gaze-based interaction grows, but current gesture sets are largely expert-designed and often misaligned with how users naturally move their eyes. To address this gap, we introduce a two-phase methodology for developing intuitive gaze gestures. First, four co-design workshops with 20 non-expert participants generated 102 initial concepts. Next, four gaze-interaction experts reviewed and refined these into a set of 32 gestures. We found that non-experts, after a brief introduction, intuitively anchor gestures in familiar metaphors and develop a compositional grammar, activation (dwell) followed by action (gaze gesture or blink), to ensure intentionality and mitigate the classic Midas Touch problem. Experts prioritized gestures that are ergonomically sound, aligned with natural saccades, and reliably distinguishable. The resulting user-grounded, expert-validated gesture set, along with actionable design principles, provides a foundation for developing intuitive, hands-free interfaces for gaze-enabled devices.
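The activation-plus-action grammar the participants converged on can be modeled as a small state machine: a dwell arms the system, a discrete action (here, a blink) fires the command, and any intervening saccade disarms it, which is exactly what guards against Midas Touch. This is an illustrative sketch with hypothetical event names and thresholds, not the study's recognizer.

```python
class DwellThenGesture:
    """Toy recognizer for an activation (dwell) + action (blink) grammar.

    Event names ("fixate", "saccade", "blink") and the dwell threshold
    are illustrative assumptions, not values from the paper.
    """

    def __init__(self, dwell_ms=500):
        self.dwell_ms = dwell_ms
        self.dwell_elapsed = 0
        self.armed = False

    def update(self, event, dt_ms=0):
        if event == "fixate":
            # Stable gaze accumulates toward the dwell threshold.
            self.dwell_elapsed += dt_ms
            if self.dwell_elapsed >= self.dwell_ms:
                self.armed = True
        elif event == "saccade":
            # Gaze movement disarms: prevents accidental activation.
            self.dwell_elapsed = 0
            self.armed = False
        elif event == "blink" and self.armed:
            self.armed = False
            self.dwell_elapsed = 0
            return "activate"
        return None
```

The key property is that neither a blink alone nor a dwell alone triggers anything; only the deliberate composition does.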

HiFiGaze: Improving Eye Tracking Accuracy Using Screen Content Knowledge
Description

We present a new and accurate approach to gaze estimation on consumer computing devices. We take advantage of continued strides in the quality of user-facing cameras on smartphones, laptops, and desktops (4K or greater in high-end devices): it is now possible to capture the 2D reflection of a device's screen in the user's eyes. This alone is insufficient for accurate gaze tracking due to the near-infinite variety of screen content. Crucially, however, the device knows what is being displayed on its own screen; in this work, we show that this knowledge allows robust segmentation of the reflection, whose location and size encode the user's screen-relative gaze target. We explore several strategies to leverage this signal, quantifying performance in a user study. Our best-performing model reduces mean tracking error by ~18% compared to a baseline appearance-based model. A supplemental study reveals an additional 10-20% improvement when the gaze-tracking camera is located at the bottom of the device.
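One simple geometric reading of "the location and size of the reflection encodes the gaze target" is: once the screen reflection is segmented in the eye image, the screen point whose reflection falls at the pupil center is the gaze target. The sketch below is a hypothetical toy model under that assumption (axis-aligned reflection box, horizontal mirroring for the corneal reflection); it is not the paper's method, which uses learned models.

```python
def gaze_from_reflection(refl_box, pupil_center, screen_size):
    """Hypothetical mapping from a segmented screen reflection to a gaze point.

    refl_box: (x0, y0, x1, y1) bounding box of the reflection in the eye image.
    pupil_center: (px, py) in the same image coordinates.
    screen_size: (width, height) of the device screen in pixels.
    """
    x0, y0, x1, y1 = refl_box
    px, py = pupil_center
    # Normalized position of the pupil within the reflection (0..1).
    u = (px - x0) / (x1 - x0)
    v = (py - y0) / (y1 - y0)
    w, h = screen_size
    # A reflection is mirrored left-right, so flip the horizontal axis.
    return ((1.0 - u) * w, v * h)
```

Even this crude model makes clear why segmentation quality matters: the gaze estimate is only as good as the recovered extent of the reflection.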

Gaze and Speech in Multimodal Human-Computer Interaction: A Scoping Review
Description

Multimodal interaction has long promised to make interfaces more intuitive and effective by combining complementary inputs. Among these, gaze and speech form a compelling pairing: gaze provides rapid spatial grounding, while speech conveys rich semantic information. Together, they offer rich cues for understanding user behaviour and intent. Yet despite decades of exploration, the research remains fragmented, making this synthesis timely as these inputs mature and are integrated into consumer-ready devices. This scoping review examined 103 studies published between 1991 and 2025, organised into explicit uses, where users intentionally provide gaze and speech, and implicit uses, where systems leverage users' natural behaviours to support interaction. Across both, we identified recurring strategies for combining gaze and speech to resolve ambiguity, ground references, and support adaptivity. We contribute a synthesis of research on their combined use while highlighting challenges of temporal alignment, fusion, and privacy, offering guidance for future research toward richer multimodal human-computer interaction.

Eyes on Many: Evaluating Gaze, Hand, and Voice for Multi-Object Selection in Extended Reality
Description

Interacting with multiple objects simultaneously makes us fast. A prerequisite for such interaction is selecting the objects, i.e., multi-object selection, which involves two steps: (1) toggling multi-selection mode (mode-switching) and then (2) selecting all the intended objects (subselection). In extended reality (XR), each step can be performed with the eyes, hands, or voice. To examine how design choices affect user performance, we evaluated four mode-switching techniques (Semi-Pinch, Full-Pinch, Double-Pinch, and Voice) and three subselection techniques (Gaze+Dwell, Gaze+Pinch, and Gaze+Voice) in a user study. Results revealed that while Double-Pinch paired with Gaze+Pinch yielded the highest overall performance, Semi-Pinch achieved the lowest. Although Voice-based mode-switching showed benefits, Gaze+Voice subselection was less favored, as the required repetitive vocal commands were perceived as tedious. Overall, these findings provide empirical insights and inform design recommendations for multi-selection techniques in XR.
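The two-step structure, a mode switch that toggles multi-selection followed by repeated gaze-targeted subselections, can be sketched as a minimal interaction model. Event names and the commit-on-exit behaviour are illustrative assumptions, not the study's implementation.

```python
class MultiSelect:
    """Minimal model of mode-switching + subselection for multi-object selection.

    mode_switch() stands in for a gesture such as a double pinch;
    gaze_pinch(obj) stands in for a Gaze+Pinch subselection of the gazed object.
    """

    def __init__(self):
        self.multi = False
        self.selection = []

    def mode_switch(self):
        self.multi = not self.multi
        if not self.multi:
            # Leaving multi-selection mode commits the accumulated set.
            committed, self.selection = self.selection, []
            return committed
        return None

    def gaze_pinch(self, gazed_object):
        if self.multi:
            # Accumulate; re-selecting an object is ignored here.
            if gazed_object not in self.selection:
                self.selection.append(gazed_object)
            return None
        # Outside multi-selection mode, gaze+pinch selects a single object.
        return [gazed_object]
```

Separating the toggle from the subselection is what lets the same gaze+pinch action mean "select one" or "add to set" depending on mode.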

Understanding Gaze-Based Identification in VR Through Preattentive Processing and Binocular Rivalry
Description

Stimulus-evoked gaze dynamics offer a secure and hands-free signal in virtual reality (VR), yet the underlying design space of effective visual stimuli remains poorly understood. This work examines how preattentive processing and binocular rivalry can inform stimulus design for gaze-based identification in VR. We conducted a two-part study: (1) a feasibility assessment of closed-set identification performance with 26 participants and 44,928 gaze samples collected using a commercial headset (Meta Quest Pro), and (2) a usability study with 16 participants comparing the same interaction, as a potential authentication technique, against PIN and out-of-band methods in a login context. Our findings confirm the feasibility of personal identification, highlight usability advantages, and reveal participants' desire for greater transparency to understand individual variations in login results. Together, these results offer conceptual insights into the perceptual mechanisms shaping stimulus-evoked gaze behavior and outline design implications for future VR authentication workflows.
