StegoType: Surface Typing from Egocentric Cameras

Abstract

Text input is a critical component of any general-purpose computing system, yet efficient and natural text input remains a challenge in AR and VR. Headset-based hand-tracking has recently become pervasive among consumer VR devices and affords the opportunity to enable touch typing on virtual keyboards. We present an approach for decoding touch typing on uninstrumented flat surfaces using only egocentric camera-based hand-tracking as input. While egocentric hand-tracking accuracy is limited by issues such as self-occlusion and image fidelity, we show that a sufficiently diverse training set of hand motions paired with typed text can enable a deep learning model to extract signal from this noisy input. Furthermore, by carefully designing a closed-loop data collection process, we can train an end-to-end text decoder that accounts for natural, sloppy typing on virtual keyboards. We evaluate our work with a user study (n=18): our method achieved a mean online throughput of 42.4 WPM with an uncorrected error rate (UER) of 7%, compared to a physical keyboard baseline of 74.5 WPM at 0.8% UER, showing progress towards unlocking productivity and high-throughput use cases in AR/VR.
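The abstract does not describe the model architecture, so the following is only a hypothetical sketch of what an end-to-end decoder from noisy hand-tracking to text could look like: a PyTorch module that maps a per-frame hand-keypoint stream to character log-probabilities suitable for a CTC-style loss. The keypoint count, layer sizes, and character-set size are illustrative assumptions, not details from the paper.

```python
# Hypothetical sketch (not the authors' architecture): map per-frame 3D hand
# keypoints to character log-probabilities for a CTC-style text decoder.
import torch
import torch.nn as nn


class KeypointTypingDecoder(nn.Module):
    """Sequence model from hand-tracking keypoints to character logits."""

    def __init__(self, num_keypoints: int = 42, num_chars: int = 30, hidden: int = 256):
        super().__init__()
        in_dim = num_keypoints * 3                 # x, y, z per keypoint (both hands)
        self.frontend = nn.Sequential(             # temporal smoothing of noisy tracking
            nn.Conv1d(in_dim, hidden, kernel_size=5, padding=2),
            nn.ReLU(),
        )
        self.encoder = nn.GRU(hidden, hidden, num_layers=2,
                              batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden, num_chars + 1)   # +1 for the CTC blank

    def forward(self, keypoints: torch.Tensor) -> torch.Tensor:
        # keypoints: (batch, time, num_keypoints * 3)
        x = self.frontend(keypoints.transpose(1, 2)).transpose(1, 2)
        x, _ = self.encoder(x)
        return self.head(x).log_softmax(dim=-1)    # (batch, time, num_chars + 1)


if __name__ == "__main__":
    model = KeypointTypingDecoder()
    frames = torch.randn(2, 120, 42 * 3)           # 2 clips, 120 frames of keypoints
    print(model(frames).shape)                     # torch.Size([2, 120, 31])
```

For reference on the reported metrics, text-entry studies conventionally compute WPM as (transcribed characters / 5) per minute of entry time, and UER as the proportion of character errors left uncorrected in the final transcription.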

Authors
Mark Richardson
Meta, Seattle, Washington, United States
Fadi Botros
Meta, Redmond, Washington, United States
Yangyang Shi
Meta, Redmond, Washington, United States
Pinhao Guo
Meta, Redmond, Washington, United States
Bradford J. Snow
Meta, Redmond, Washington, United States
Linguang Zhang
Meta, Redmond, Washington, United States
Jingming Dong
Meta, Redmond, Washington, United States
Keith Vertanen
Michigan Technological University, Houghton, Michigan, United States
Shugao Ma
Meta, Redmond, Washington, United States
Robert Wang
Meta, Redmond, Washington, United States
Paper URL

https://doi.org/10.1145/3654777.3676343

Conference: UIST 2024

ACM Symposium on User Interface Software and Technology

Session: 1. New realities

Westin: Allegheny 1
6 presentations
2024-10-16 01:10:00 – 02:40:00