Digital Dexterity: Touching and Typing Techniques

Conference Name
UIST 2023
Robust Finger Interactions with COTS Smartwatches via Unsupervised Siamese Adaptation
Abstract

Wearable devices like smartwatches and smart wristbands have gained substantial popularity in recent years. However, their small interfaces create inconvenience and limit computing functionality. To fill this gap, we propose ViWatch, which enables robust finger interactions under deployment variations, and relies on a single IMU sensor that is ubiquitous in COTS smartwatches. To this end, we design an unsupervised Siamese adversarial learning method. We built a real-time system on commodity smartwatches and tested it with over one hundred volunteers. Results show that the system accuracy is about 97% over a week. In addition, it is resistant to deployment variations such as different hand shapes, finger activity strengths, and smartwatch positions on the wrist. We also developed a number of mobile applications using our interactive system and conducted a user study where all participants preferred our unsupervised approach to supervised calibration. The demonstration of ViWatch is shown at https://youtu.be/N5-ggvy2qfI

Authors
Wenqiang Chen
Massachusetts Institute of Technology (MIT), Cambridge, Massachusetts, United States
Ziqi Wang
University of California, Los Angeles, Los Angeles, California, United States
Pengrui Quan
University of California, Los Angeles, Los Angeles, California, United States
Zhencan Peng
Shenzhen University, Shenzhen, China
Shupei Lin
VibInt Limited, Hong Kong, China
Mani Srivastava
University of California, Los Angeles, Los Angeles, California, United States
Wojciech Matusik
MIT, Cambridge, Massachusetts, United States
John Stankovic
University of Virginia, Charlottesville, Virginia, United States
Paper URL

https://doi.org/10.1145/3586183.3606794

Video
Structured Light Speckle: Joint egocentric depth estimation and low-latency contact detection via remote vibrometry
Abstract

Despite advancements in egocentric hand tracking using head-mounted cameras, contact detection with real-world objects remains challenging, particularly for the quick motions often performed during interaction in Mixed Reality. In this paper, we introduce a novel method for detecting touch on discovered physical surfaces purely from an egocentric perspective using optical sensing. We leverage structured laser light to detect real-world surfaces from the disparity of reflections in real-time and, at the same time, extract a time series of remote vibrometry sensations from laser speckle motions. The pattern caused by structured laser light reflections enables us to simultaneously sample the mechanical vibrations that propagate through the user's hand and the surface upon touch. We integrated Structured Light Speckle into TapLight, a prototype system that is a simple add-on to Mixed Reality headsets. In our evaluation with a Quest 2, TapLight---while moving---reliably detected horizontal and vertical surfaces across a range of surface materials. TapLight also reliably detected rapid touch contact and robustly discarded other hand motions to prevent triggering spurious input events. Despite the remote sensing principle of Structured Light Speckle, our method achieved a latency for event detection in realistic settings that matches body-worn inertial sensing without needing such additional instrumentation. We conclude with a series of VR demonstrations for situated interaction that leverage the quick touch interaction TapLight supports.

Authors
Paul Streli
ETH Zürich, Zurich, Switzerland
Jiaxi Jiang
ETH Zürich, Zurich, Switzerland
Juliete Rossie
ETH Zürich, Zurich, Switzerland
Christian Holz
ETH Zürich, Zurich, Switzerland
Paper URL

https://doi.org/10.1145/3586183.3606749

Video
ShadowTouch: Enabling Free-Form Touch-Based Hand-to-Surface Interaction with Wrist-Mounted Illuminant by Shadow Projection
Abstract

We present ShadowTouch, a novel sensing method that recognizes the subtle hand-to-surface touch state of individual fingers with optical assistance. ShadowTouch mounts a forward-facing light source on the user's wrist so that fingers cast shadows on the surface in front of them as they approach it. With this optical design, the subtle vertical movements of near-surface fingers are magnified into shadow features cast on the surface, which computer vision algorithms can recognize. To efficiently recognize the touch state of each finger, we devised a two-stage CNN-based algorithm that first extracts all fingertip regions from each frame and then classifies the touch state of each region from the cropped consecutive frames. Evaluations showed that our touch state detection algorithm achieved a recognition accuracy of 99.1% and an F-1 score of 96.8% in a leave-one-out cross-user setting. We further outlined the hand-to-surface interaction space enabled by ShadowTouch's sensing capability in terms of touch-based interaction, stroke-based interaction, and out-of-surface information, and developed four application prototypes to showcase its interaction potential. A usability study showed the advantages of ShadowTouch over threshold-based techniques: lower mental demand, effort, and frustration, together with greater willingness to use, ease of use, perceived integrity, and confidence.
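The optical principle behind the abstract can be illustrated with a toy sketch (not the authors' CNN pipeline; the brightness profile and threshold below are invented for illustration): the shadow cast ahead of a finger merges with the finger silhouette at the moment of touch, so the bright gap between the two dark regions shrinks to zero.

```python
import numpy as np

def shadow_gap(profile, dark=0.3):
    """Width in pixels of the largest bright gap between dark runs
    (finger silhouette and cast shadow); 0 means they have merged,
    i.e. the finger touches the surface."""
    dark_idx = np.flatnonzero(np.asarray(profile) < dark)
    if len(dark_idx) < 2:
        return None  # no finger/shadow pair visible
    return int((np.diff(dark_idx) - 1).max())

# One synthetic image column through a fingertip, as brightness in [0, 1]:
hovering = [0.1, 0.1, 0.9, 0.9, 0.9, 0.2, 0.2]  # finger | bright gap | shadow
touching = [0.1, 0.1, 0.1, 0.2, 0.2]            # shadow merged with finger

shadow_gap(hovering)  # -> 3 (pixels of gap: hovering)
shadow_gap(touching)  # -> 0 (touch)
```

In the real system this cue is learned by the two-stage CNN from cropped fingertip regions rather than thresholded directly.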

Authors
Chen Liang
Tsinghua University, Beijing, China
Xutong Wang
Tsinghua University, Beijing, China
Zisu Li
The Hong Kong University of Science and Technology, Hong Kong SAR, China
Chi Hsia
Tsinghua University, Beijing, China
Mingming Fan
The Hong Kong University of Science and Technology (Guangzhou), Guangzhou, China
Chun Yu
Tsinghua University, Beijing, China
Yuanchun Shi
Tsinghua University, Beijing, China
Paper URL

https://doi.org/10.1145/3586183.3606785

Video
Stereoscopic Viewing and Monoscopic Touching: Selecting Distant Objects in VR Through a Mobile Device
Abstract

In this study, we explore a new way to complementarily utilize the immersive visual output of VR and the physical haptic input of a smartphone. In particular, we focus on interacting with distant virtual objects using a smartphone in a through-plane manner and present a novel selection technique that overcomes the binocular parallax that occurs in such an arrangement. In our proposed technique, when a user in the stereoscopic viewing mode needs to perform a distant selection, the user brings the fingertip near the screen of the mobile device, triggering a smoothly animated transition to the monoscopic touching mode. Using a novel proof-of-concept implementation that utilizes a transparent acrylic panel, we conducted a user study and found that the proposed technique is significantly quicker, more precise, more direct, and more intuitive compared to the ray casting baseline. Subsequently, we created VR applications that explore the rich and interesting use cases of the proposed technique.

Authors
Joon Hyub Lee
KAIST, Daejeon, Korea, Republic of
Taegyu Jin
KAIST, Daejeon, Korea, Republic of
Sang-Hyun Lee
KAIST, Daejeon, Korea, Republic of
Seung-Jun Lee
KAIST, Daejeon, Korea, Republic of
Seok-Hyung Bae
KAIST, Daejeon, Korea, Republic of
Paper URL

https://doi.org/10.1145/3586183.3606809

Video
TouchType-GAN: Modeling Touch Typing with Generative Adversarial Network
Abstract

Models that can generate touch typing tasks are important to the development of touch typing keyboards. We propose TouchType-GAN, a Conditional Generative Adversarial Network that can simulate locations and time stamps of touch points in touch typing. TouchType-GAN takes arbitrary text as input to generate realistic touch typing both spatially (i.e., (x, y) coordinates of touch points) and temporally (i.e., timestamps of touch points). TouchType-GAN introduces a variational generator that estimates Gaussian distributions for every target letter to prevent mode collapse. Our experiments on a dataset with 3k typed sentences show that TouchType-GAN outperforms existing touch typing models, including the Rotational Dual Gaussian model for simulating the distribution of touch points, and the Finger-Fitts Euclidean Model for simulating typing time. Overall, our research demonstrates that the proposed GAN structure can learn the distribution of user-typed touch points, and the resulting TouchType-GAN can also estimate typing movements. TouchType-GAN can serve as a valuable tool for designing and evaluating touch typing input systems.
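The per-letter Gaussian idea can be sketched roughly as follows (an illustration under invented key centers and parameters, not the trained network): the generator emits a mean and spread for each target letter and draws a touch point via the reparameterization trick z = mu + sigma * eps.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical key centers in millimetres for three keys; a trained
# TouchType-GAN would learn the per-letter offset and spread from data.
KEY_CENTERS = {"t": (40.0, 10.0), "h": (50.0, 20.0), "e": (22.0, 10.0)}

def sample_touch(letter, mu_offset=(0.0, 1.5), sigma=(2.0, 3.0)):
    """Draw one (x, y) touch point for `letter` from a per-letter Gaussian
    via reparameterization: z = mu + sigma * eps, eps ~ N(0, I)."""
    cx, cy = KEY_CENTERS[letter]
    mu = np.array([cx + mu_offset[0], cy + mu_offset[1]])
    eps = rng.standard_normal(2)
    return mu + np.array(sigma) * eps

# Simulate touch points for typing the word "the":
touches = np.array([sample_touch(ch) for ch in "the"])
```

Reparameterizing keeps sampling differentiable, which is what lets such a generator be trained adversarially.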

Authors
Jeremy Chu
Stony Brook University, Stony Brook, New York, United States
Yan Ma
Stony Brook University, Stony Brook, New York, United States
Shumin Zhai
Google, Mountain View, California, United States
Xianfeng David Gu
Stony Brook University, Stony Brook, New York, United States
Xiaojun Bi
Stony Brook University, Stony Brook, New York, United States
Paper URL

https://doi.org/10.1145/3586183.3606760

Video
C-PAK: Correcting and Completing Variable-length Prefix-based Abbreviated Keystrokes
Abstract

Improving keystroke savings is a long-term goal of text input research. We present a study into the design space of an abbreviated style of text input called C-PAK (Correcting and completing variable-length Prefix-based Abbreviated Keystrokes) for text entry on mobile devices. Given a variable-length and potentially inaccurate input string (e.g., 'li g t m'), C-PAK aims to expand it into a complete phrase (e.g., 'looks good to me'). We develop a C-PAK prototype keyboard, PhraseWriter, based on a current state-of-the-art mobile keyboard consisting of 1.3 million n-grams and 164,000 words. Using computational simulations on a large dataset of realistic input text, we found that, in comparison to conventional single-word suggestions, PhraseWriter improves the maximum keystroke savings rate by 6.7% (from 46.3% to 49.4%), reduces the word error rate by 14.7%, and is particularly advantageous for common phrases. We conducted a lab study of novice user behavior and performance, which found that users could quickly utilize the C-PAK style abbreviations implemented in PhraseWriter, achieving a higher keystroke savings rate than forward suggestions (25% vs. 16%). Furthermore, they intuitively and successfully abbreviated more with common phrases. However, users had a lower overall text entry rate due to their limited experience with the system (28.5 words per minute vs. 37.7). We outline future technical directions to improve C-PAK over the PhraseWriter baseline, and further opportunities to study the perceptual, cognitive, and physical action trade-offs that underlie the learning curve of C-PAK systems.
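The completion half of C-PAK can be sketched as prefix matching over candidate phrases (a toy with an invented phrase list; the real PhraseWriter decodes against a 1.3-million n-gram language model and also corrects noisy prefixes such as 'li' typed for 'lo'):

```python
# Toy prefix-based phrase expansion; PHRASES is invented for illustration.
PHRASES = ["looks good to me", "let me know", "thank you so much"]

def expand(abbrev):
    """Expand space-separated prefix tokens into the first candidate phrase
    whose words each start with the corresponding token."""
    toks = abbrev.split()
    for phrase in PHRASES:
        words = phrase.split()
        if len(words) == len(toks) and all(
            w.startswith(t) for w, t in zip(words, toks)
        ):
            return phrase
    return None

expand("lo g t m")  # -> "looks good to me"
```

Note that exact matching fails on the paper's noisy example 'li g t m'; handling such errors is precisely the "correcting" part of C-PAK that this sketch omits.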

Authors
Tianshi Li
Philip Quinn
Shumin Zhai