Hand Pose & Gestures

Conference Name
CHI 2026
FingerBar: A Mid-Air Touch Bar Interface for Earphones Using Finger-Generated Acoustics
Abstract

Current touch-based interactions on earphones are limited by hygiene concerns and the small interaction surface. Recent works attempt to bypass these issues with mid-air gesture systems based on active acoustic sensing; however, the transmitted signals may be audible and pose potential hearing risks. To address this, we propose FingerBar, a mid-air gesture recognition system for earphones that relies solely on microphones, without active signal transmission. FingerBar recognizes gestures from the distinctive friction sounds they generate. We design a gesture filtering pipeline to maintain robustness against everyday noise, and an adversarial training strategy further enhances user-independent performance. From a set of 16 gestures, we identify the 7 most suitable for FingerBar based on user acceptability. Extensive evaluations demonstrate high accuracy and robustness, and a user study confirms the practicality and acceptability of the system. Our findings highlight the promise of passive acoustic sensing as a user-friendly interaction modality for earphones.
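
To make the user-independence idea concrete, below is a minimal sketch of adversarial training with a gradient-reversal layer (the DANN-style technique commonly used to learn domain-invariant features). The network shapes, feature dimension, and the choice of gradient reversal are illustrative assumptions, not the paper's implementation.

```python
# Hypothetical sketch: user-adversarial gesture training (DANN-style).
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    """Identity on the forward pass; flips (and scales) gradients on the
    backward pass, so the encoder is pushed toward user-invariant features."""
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad):
        return -ctx.lam * grad, None

class AdversarialGestureNet(nn.Module):
    def __init__(self, n_feats=128, n_gestures=7, n_users=10, lam=0.5):
        super().__init__()
        self.lam = lam
        self.encoder = nn.Sequential(nn.Linear(n_feats, 64), nn.ReLU())
        self.gesture_head = nn.Linear(64, n_gestures)  # main task
        self.user_head = nn.Linear(64, n_users)        # adversary

    def forward(self, x):
        z = self.encoder(x)
        return self.gesture_head(z), self.user_head(GradReverse.apply(z, self.lam))

# One training step on stand-in friction-sound features.
model = AdversarialGestureNet()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
x = torch.randn(32, 128)                  # e.g., pooled spectrogram features
y_gesture = torch.randint(0, 7, (32,))    # 7 gestures, as in the paper
y_user = torch.randint(0, 10, (32,))
g_logits, u_logits = model(x)
loss = nn.functional.cross_entropy(g_logits, y_gesture) \
     + nn.functional.cross_entropy(u_logits, y_user)
opt.zero_grad()
loss.backward()
opt.step()
```

Minimizing the user-classification loss through the reversed gradients encourages the encoder to keep what predicts the gesture while discarding what identifies the wearer.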

Authors
Yankai Zhao
Southern University of Science and Technology, Shenzhen, China
Wentao Xie
The Hong Kong University of Science and Technology, Hong Kong, China
Haorui Li
Southern University of Science and Technology, Shenzhen, China
Jiao Li
Southern University of Science and Technology, Shenzhen, China
Tao Sun
Southern University of Science and Technology, Shenzhen, China
Qian Zhang
The Hong Kong University of Science and Technology, Hong Kong, China
Jin Zhang
Southern University of Science and Technology, Shenzhen, China
TraceRing: Touchpad-like Pointing with a Single IMU Ring through Personalized Learning
Abstract

Achieving touchpad-like pointing with a single IMU ring is highly desirable for portable and wearable interaction, yet challenging due to incomplete motion data and significant user variability. We present TraceRing, a finger-worn IMU system that enables precise two-dimensional cursor control. To address the limitations of generic end-to-end models, we propose a personalized training framework that learns user-specific representations through joint multi-task and contrastive learning while dynamically selecting the most suitable expert model. This approach enables personalization without per-user fine-tuning and reduces velocity prediction error by 33.9% over state-of-the-art baselines. Furthermore, a real-time study shows that TraceRing delivers speed and accuracy far exceeding those of AirMouse (2.26 s vs. 3.01 s mean task completion time). These results establish TraceRing as a portable and comfortable alternative for mobile computing and AR interaction.
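
As a rough illustration of the expert-selection step, the sketch below picks the expert whose reference embedding is closest to a new user's calibration embedding. The toy embedding (channel statistics) and cosine-similarity rule are assumptions for illustration; TraceRing learns its representations via multi-task and contrastive training.

```python
# Hypothetical sketch: pick the best-matching expert model for a user.
import numpy as np

def embed(imu_windows: np.ndarray) -> np.ndarray:
    """Toy user embedding: per-channel mean/std over calibration windows.
    (The real system learns this representation.)"""
    return np.concatenate([imu_windows.mean(axis=(0, 1)),
                           imu_windows.std(axis=(0, 1))])

def select_expert(user_emb: np.ndarray, expert_embs: list) -> int:
    """Index of the expert whose reference embedding has the highest
    cosine similarity to this user's embedding."""
    sims = [np.dot(user_emb, e) / (np.linalg.norm(user_emb) * np.linalg.norm(e))
            for e in expert_embs]
    return int(np.argmax(sims))

# windows: (n_windows, window_len, 6) accel+gyro samples from calibration
windows = np.random.randn(50, 100, 6)
experts = [np.random.randn(12) for _ in range(4)]  # one reference per expert
print("chosen expert:", select_expert(embed(windows), experts))
```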

Authors
Zhe He
Tsinghua University, Beijing, China
Weinan Shi
Tsinghua University, Beijing, China
Zixuan Wang
Tsinghua University, Beijing, China
Suya Wu
Department of Computer Science and Technology, Tsinghua University, Beijing, China
Xiyuan Shen
University of Washington, Seattle, Washington, United States
Chengchi Zhou
Tsinghua University, Beijing, China
Chun Yu
Tsinghua University, Beijing, China
Yuanchun Shi
Tsinghua University, Beijing, China
DeltaDorsal: Enhancing Hand Pose Estimation with Dorsal Features in Egocentric Views
Abstract

The proliferation of XR devices has made egocentric hand pose estimation a vital task, yet this perspective is inherently challenged by frequent finger occlusions. To address this, we propose a novel approach that leverages the rich information in dorsal hand skin deformation, unlocked by recent advances in dense visual featurizers. We introduce a dual-stream delta encoder that learns pose by contrasting features from a dynamic hand with a baseline relaxed position. Our evaluation demonstrates that, using only cropped dorsal images, our method reduces the Mean Per Joint Angle Error (MPJAE) by 18% in self-occluded scenarios (fingers ≥ 50% occluded) compared to state-of-the-art techniques that depend on the whole hand's geometry and large model backbones. Consequently, our method not only enhances the reliability of downstream tasks like index finger pinch and tap estimation in occluded scenarios but also unlocks new interaction paradigms, such as detecting isometric force for a surface "click" without visible movement while minimizing model size.
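
The dual-stream delta idea can be sketched as a shared featurizer applied to both the current and relaxed-pose dorsal crops, with pose regressed from the feature difference. The tiny convolutional backbone and 20-angle output below are placeholder assumptions; the paper builds on dense visual featurizers.

```python
# Hypothetical sketch: dual-stream delta encoder over dorsal crops.
import torch
import torch.nn as nn

class DeltaEncoder(nn.Module):
    def __init__(self, n_joint_angles=20):
        super().__init__()
        # Shared featurizer applied to both streams (stand-in backbone).
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.head = nn.Linear(32, n_joint_angles)  # joint-angle regressor

    def forward(self, dorsal_now, dorsal_relaxed):
        # Pose is read from the *change* in skin features relative to
        # the relaxed baseline, not from the image alone.
        delta = self.backbone(dorsal_now) - self.backbone(dorsal_relaxed)
        return self.head(delta)

model = DeltaEncoder()
now = torch.randn(1, 3, 128, 128)      # cropped dorsal image, current pose
relaxed = torch.randn(1, 3, 128, 128)  # same crop, relaxed baseline pose
angles = model(now, relaxed)           # -> (1, 20) predicted joint angles
```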

Authors
William Huang
University of California, Los Angeles, Los Angeles, California, United States
Siyou Pei
University of California, Los Angeles, Los Angeles, California, United States
Leyi Zou
University of California, Los Angeles, Los Angeles, California, United States
Eric J. Gonzalez
Google, Seattle, Washington, United States
Ishan Chatterjee
Google, Seattle, Washington, United States
Yang Zhang
University of California, Los Angeles, Los Angeles, California, United States
WatchHand: Enabling Continuous Hand Pose Tracking On Off-the-Shelf Smartwatches
Abstract

Tracking hand poses on wrist wearables enables rich, expressive interactions, yet remains unavailable on commercial smartwatches, as prior implementations rely on external sensors or custom hardware, limiting their real-world applicability. To address this, we present WatchHand, the first continuous 3D hand pose tracking system implemented on off-the-shelf smartwatches using only their built-in speaker and microphone. WatchHand emits inaudible frequency-modulated continuous waves (FMCW) and captures their reflections from the hand. These acoustic signals are processed by a deep-learning model that estimates 3D hand poses for 20 finger joints. We evaluate WatchHand across diverse real-world conditions (multiple smartwatch models, wearing hands, body postures, noise conditions, and pose-variation protocols) and achieve a mean per-joint position error of 7.87 mm in cross-session tests with device remounting. Although performance drops for unseen users or gestures, the model adapts effectively with lightweight fine-tuning on small amounts of data. Overall, WatchHand lowers the barrier to smartwatch-based hand tracking by eliminating additional hardware while enabling robust, always-available interactions on millions of existing devices.
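
For intuition about the sensing principle, here is a minimal FMCW dechirp example: a near-ultrasonic chirp is mixed with its (simulated) hand echo, and the beat frequency yields range. The 18-21 kHz sweep, 48 kHz sampling, and single ideal reflector are assumptions for illustration, not the system's actual signal design.

```python
# Hypothetical sketch: FMCW ranging via dechirp + FFT.
import numpy as np

fs, f0, f1, T = 48_000, 18_000, 21_000, 0.01  # sample rate, sweep band, chirp length
t = np.arange(int(fs * T)) / fs
k = (f1 - f0) / T                             # sweep rate (Hz/s)
tx = np.cos(2 * np.pi * (f0 * t + 0.5 * k * t**2))

# Simulated echo from a hand 15 cm away (round trip at c = 343 m/s).
delay = 2 * 0.15 / 343
rx = np.cos(2 * np.pi * (f0 * (t - delay) + 0.5 * k * (t - delay) ** 2))

# Dechirp: mixing tx with rx produces a beat tone at f_b = k * delay.
beat = tx * rx
N = len(beat)
spec = np.abs(np.fft.rfft(beat * np.hanning(N), n=8 * N))  # zero-pad: finer bins
freqs = np.fft.rfftfreq(8 * N, 1 / fs)
mask = freqs < 2_000                          # keep only the low-frequency beat band
f_b = freqs[mask][np.argmax(spec[mask])]
print(f"estimated range: {f_b / k * 343 / 2 * 100:.1f} cm")  # ~15.0 cm
```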

Authors
Jiwan Kim
KAIST, Daejeon, Korea, Republic of
Chi-Jung Lee
Cornell University, Ithaca, New York, United States
Hohurn Jung
KAIST, Daejeon, Korea, Republic of
Tianhong Catherine Yu
Cornell University, Ithaca, New York, United States
Ruidong Zhang
Cornell University, Ithaca, New York, United States
Ian Oakley
KAIST, Daejeon, Korea, Republic of
Cheng Zhang
Cornell University, Ithaca, New York, United States
3DRing: Enabling Low-Cost 3D Hand Position Tracking by Fusing Inertial and Low-Framerate Optical Sensing
Abstract

Current mobile hand tracking systems rely primarily on high-framerate (HFR) optical sensors to capture hand positions, resulting in high computational cost and limiting their applicability on end devices. We propose 3DRing, a 3D hand position tracking method that requires only low-framerate (LFR, <10 FPS) optical data and a single IMU ring. It consists of two stages: (1) a Deep Extended Kalman Filter module that predicts high-framerate hand positions from LFR optical measurements and a single IMU; and (2) a Reinforcement Learning module that adaptively selects minimal keyframes for calibration, further reducing the average optical framerate. Using only 6.61 FPS optical data, 3DRing achieves an average real-time tracking error of 1.75 cm and an interaction efficiency of 86.0% in a 3D target selection task relative to the 67 FPS hand tracking of the Meta Quest Pro, demonstrating strong potential to reduce the reliance on optical data in mobile hand tracking.
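
To illustrate the fusion problem that stage (1) addresses, the sketch below runs a plain 1-D Kalman filter: IMU acceleration drives high-rate prediction, and sparse optical fixes correct drift. 3DRing's module is a learned Deep EKF; the linear filter, noise values, and simulated data here are stand-ins.

```python
# Hypothetical sketch: high-rate IMU prediction + sparse optical correction.
import numpy as np

dt, q, r = 1 / 100, 1e-3, 1e-4           # IMU step, process/measurement noise
F = np.array([[1, dt], [0, 1]])          # constant-velocity state transition
B = np.array([0.5 * dt**2, dt])          # control input: IMU acceleration
H = np.array([[1.0, 0.0]])               # optical sensor observes position only

x = np.zeros(2)                          # state: [position, velocity]
P = np.eye(2)

def predict(accel):
    """High-rate step driven by an IMU acceleration sample."""
    global x, P
    x = F @ x + B * accel
    P = F @ P @ F.T + q * np.eye(2)

def correct(optical_pos):
    """Low-rate step whenever an optical keyframe arrives (<10 FPS)."""
    global x, P
    K = P @ H.T @ np.linalg.inv(H @ P @ H.T + r)
    x = x + (K @ (np.array([optical_pos]) - H @ x)).ravel()
    P = (np.eye(2) - K @ H) @ P

rng = np.random.default_rng(0)
for step in range(100):                  # 1 s of 100 Hz IMU samples
    predict(accel=0.1)                   # IMU dead reckoning
    if step % 15 == 0:                   # sparse optical keyframe (~6.7 FPS)
        true_pos = 0.5 * 0.1 * ((step + 1) * dt) ** 2
        correct(true_pos + rng.normal(0, 0.001))
print("estimated position, velocity:", x)
```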

Authors
Zhuojun Li
Tsinghua University, Beijing, China
Lubin Wang
Tsinghua University, Beijing, China
Chun Yu
Tsinghua University, Beijing, China
Chang Liu
Tsinghua University, Beijing, China
Mingyuan Du
Tsinghua University, Beijing, China
Weinan Shi
Tsinghua University, Beijing, China
Yuanchun Shi
Tsinghua University, Beijing, China
Investigating Single-Handed Microgesture Scrolling Techniques
Abstract

Scrolling is ubiquitous in our daily computing experience. We explore how single-handed microgestures can be used for scrolling. Based on an analysis of the basic components necessary for scrolling, we selected 3 microgestures: Tap, Hold, and Drag. Considering both rate and position control, we designed 4 microgesture-based scrolling techniques adapted to these 3 microgestures. We contrasted these 4 techniques in a laboratory experiment with 24 participants who performed 2 tasks: a reciprocal selection task, where participants scrolled the view to reach and select a target, and a counting task, where participants scrolled the view to count image occurrences. Our results suggest that the technique based on Drag microgestures with rate control is the most effective for scrolling operations, regardless of the task. This work demonstrates that microgestures, with their advantages for frequent everyday tasks, offer a promising approach to continuous and efficient scrolling control.
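
The rate-versus-position distinction the study manipulates can be sketched in a few lines: under position control the view moves only while the finger moves, whereas under rate control a held displacement keeps scrolling. The gains, deadzone, and units below are illustrative assumptions, not the study's parameters.

```python
# Hypothetical sketch: position control vs. rate control for Drag scrolling.
def position_control(offset, drag_delta, gain=3.0):
    """Position control: each Drag movement translates the view directly,
    so scrolling stops as soon as the finger stops."""
    return offset + gain * drag_delta

def rate_control(offset, displacement, dt, gain=40.0, deadzone=2.0):
    """Rate control: displacement from the Drag start point sets a scroll
    *velocity*, so holding the finger off-center scrolls continuously."""
    if abs(displacement) < deadzone:     # ignore tremor near the origin
        return offset
    return offset + gain * displacement * dt

pos, rate = 0.0, 0.0
for _ in range(60):                      # 1 s at 60 Hz, finger held 8 mm away
    pos = position_control(pos, drag_delta=0.0)           # static finger: no motion
    rate = rate_control(rate, displacement=8.0, dt=1/60)  # still scrolling
print(f"position control: {pos:.0f} px, rate control: {rate:.0f} px")
```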

Authors
Suliac Lavenant
Univ. Lille, Inria, CNRS, Centrale Lille, UMR 9189 CRIStAL, Lille, France
Alix Goguey
Univ. Grenoble Alpes, CNRS, Grenoble INP, LIG, Grenoble, France
Sylvain Malacria
Univ. Lille, Inria, CNRS, Centrale Lille, UMR 9189 CRIStAL, Lille, France
Laurence Nigay
Univ. Grenoble Alpes, CNRS, Grenoble INP, LIG, Grenoble, France
Thomas Pietrzak
Univ. Lille, CNRS, Inria, Centrale Lille, UMR 9189 CRIStAL, Lille, France
RealTwin: Concept Graph Representation and Grounding Framework for Reality-Preserving Digital Twin Reconstruction
Abstract

Reconstructing realistic digital twins has become crucial as advances in mixed reality, the metaverse, and robotics demand more accurate simulations of the physical world. Despite technical progress, building high-fidelity digital twins from a systematic, human-centered perspective remains underexplored. Drawing on the human processing model, we decompose human-centric reality into perception, motion, and cognition, and define a reality-preserving digital twin (RPDT) as a reconstruction integrating these three dimensions. We present RealTwin, an attribute-graph-based representation and inference framework for RPDT. Leveraging the grounding capabilities of Multimodal Large Language Models (MLLMs), RealTwin chains AI tools to construct attribute graphs that faithfully encode real-world properties. We validate RealTwin through a technical evaluation, which shows promising success in graph parsing and attribute inference, and a user study assessing its applicability across diverse user groups. Informed by RealTwin, we discuss critical issues, including ecology, interaction space, and real-world adoption, for future end-to-end, fine-grained, and scalable digital twin reconstruction.
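
A minimal sketch of what such an attribute graph could look like, using the paper's perception/motion/cognition decomposition as the node schema. The field names and example values are hypothetical; RealTwin populates graphs like this automatically via MLLM-grounded tools.

```python
# Hypothetical sketch: attribute graph for a reality-preserving digital twin.
from dataclasses import dataclass, field

@dataclass
class TwinNode:
    name: str
    perception: dict = field(default_factory=dict)  # appearance, material, sound
    motion: dict = field(default_factory=dict)      # articulation, physics
    cognition: dict = field(default_factory=dict)   # affordances, semantics

@dataclass
class TwinGraph:
    nodes: dict = field(default_factory=dict)
    edges: list = field(default_factory=list)       # (src, relation, dst) triples

    def add(self, node: TwinNode):
        self.nodes[node.name] = node

    def relate(self, src: str, rel: str, dst: str):
        self.edges.append((src, rel, dst))

g = TwinGraph()
g.add(TwinNode("mug", perception={"material": "ceramic"},
               motion={"mass_kg": 0.35}, cognition={"affordance": "graspable"}))
g.add(TwinNode("desk", perception={"material": "wood"}))
g.relate("mug", "on_top_of", "desk")
print(g.edges)
```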

Authors
Zisu Li
The Hong Kong University of Science and Technology, Hong Kong, China
Ruohao Li
The Hong Kong University of Science and Technology (Guangzhou), Guangzhou, China
Jiawei Li
The Hong Kong University of Science and Technology (Guangzhou), Guangzhou, China
Chao Liu
The University of British Columbia, Vancouver, British Columbia, Canada
Junyi Zhu
University of Michigan, Ann Arbor, Michigan, United States
Daniela Rus
Massachusetts Institute of Technology, Cambridge, Massachusetts, United States
Chen Liang
The Hong Kong University of Science and Technology (Guangzhou), Guangzhou, China
Mingming Fan
The Hong Kong University of Science and Technology (Guangzhou), Guangzhou, China