We study phrase-gesture typing, a gesture typing method that allows users to type short phrases by swiping through all the letters of the words in a phrase using a single, continuous gesture. Unlike word-gesture typing, where text is entered word by word, phrase-gesture typing enters text phrase by phrase. To demonstrate the usability of phrase-gesture typing, we implemented a prototype called PhraseSwipe. Our system consists of a frontend interface designed specifically for typing phrases and a backend phrase-level gesture decoder built on a transformer-based neural language model. The decoder was trained on five million phrases of up to five words in length, sampled randomly from the Yelp Review Dataset. In a user study with 12 participants, we show that participants typed with PhraseSwipe at an average speed of 34.5 WPM with a Word Error Rate of 1.1%.
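As a rough illustration of phrase-level decoding of this kind (not PhraseSwipe's actual decoder), the sketch below scores candidate phrases by combining the shape similarity of a swipe against each phrase's ideal keyboard path with a language-model score; the keyboard layout, distance metric, weighting, and the stand-in `lm_log_prob` function are assumptions for illustration only.

```python
import math

# Simplified QWERTY key centres; rows are offset roughly as on a real keyboard.
ROWS = ["qwertyuiop", "asdfghjkl", "zxcvbnm"]
KEYS = {c: (col + 0.5 * row, float(row))
        for row, keys in enumerate(ROWS) for col, c in enumerate(keys)}

def ideal_path(phrase, samples_per_segment=5):
    """Polyline through the key centres of every letter in the phrase."""
    pts = [KEYS[c] for c in phrase.lower() if c in KEYS]
    path = []
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        path += [(x0 + t / samples_per_segment * (x1 - x0),
                  y0 + t / samples_per_segment * (y1 - y0))
                 for t in range(samples_per_segment)]
    return path + [pts[-1]]

def shape_distance(gesture, phrase):
    """Mean distance between resampled gesture points and the phrase's ideal path."""
    path = ideal_path(phrase)
    n = min(len(gesture), len(path))
    return sum(math.dist(gesture[i * len(gesture) // n],
                         path[i * len(path) // n]) for i in range(n)) / n

def decode(gesture, candidates, lm_log_prob, alpha=0.7):
    """Rank candidate phrases (e.g., up to five words) by shape match plus a
    language-model score; `lm_log_prob` stands in for the transformer model."""
    return max(candidates,
               key=lambda p: -alpha * shape_distance(gesture, p)
                             + (1 - alpha) * lm_log_prob(p))

# Example: a perfect trace of "send it now" should beat a competing phrase.
print(decode(ideal_path("send it now"),
             ["send it now", "sent at noon"],
             lambda p: 0.0))
```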
Real-time tracking of a user’s hands, arms and environment is valuable in a wide variety of HCI applications, from context awareness to virtual reality. Rather than rely on fixed and external tracking infrastructure, the most flexible and consumer-friendly approaches are mobile, self-contained, and compatible with popular device form factors (e.g., smartwatches). In this vein, we contribute DiscoBand, a thin sensing strap not exceeding 1 cm in thickness. Sensors operating so close to the skin inherently face issues with occlusion. To overcome this, our strap uses eight distributed depth sensors imaging the hand from different viewpoints, creating a sparse 3D point cloud. An additional eight depth sensors image outwards from the band to track the user’s body and surroundings. In addition to evaluating arm and hand pose tracking, we also describe a series of supplemental applications powered by our band's data, including held object recognition and environment mapping.
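For intuition only, here is a minimal sketch of how samples from eight ring-mounted depth sensors could be merged into one sparse point cloud in a shared band frame; the band radius, depth limits, and the 2D simplification are assumptions, not DiscoBand's actual calibration or fusion pipeline.

```python
import math

NUM_SENSORS = 8
BAND_RADIUS_M = 0.04          # assumed wrist-strap radius

def sensor_pose(i):
    """Centre position and outward-facing angle of sensor i around the band."""
    theta = 2 * math.pi * i / NUM_SENSORS
    return (BAND_RADIUS_M * math.cos(theta), BAND_RADIUS_M * math.sin(theta)), theta

def to_band_frame(i, bearing_rad, depth_m):
    """Project one depth sample (in-plane bearing relative to the sensor's
    optical axis, plus range) into the shared band coordinate frame."""
    (sx, sy), theta = sensor_pose(i)
    ang = theta + bearing_rad
    return (sx + depth_m * math.cos(ang), sy + depth_m * math.sin(ang))

def fuse(frames, near_m=0.01, far_m=0.30):
    """frames[i] is a list of (bearing_rad, depth_m) samples from sensor i.
    Returns a sparse 2D slice of the point cloud; a real pipeline would use
    full 3D per-sensor extrinsics."""
    cloud = []
    for i, samples in enumerate(frames):
        cloud += [to_band_frame(i, b, d) for b, d in samples if near_m < d < far_m]
    return cloud

# Example: each sensor reports a single straight-ahead return at 5 cm.
print(fuse([[(0.0, 0.05)] for _ in range(NUM_SENSORS)]))
```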
We present DeltaPen, a pen device that operates on passive surfaces without the need for external tracking systems or active sensing surfaces. DeltaPen integrates two adjacent lens-less optical flow sensors at its tip, from which it reconstructs accurate directional motion as well as yaw rotation. DeltaPen also supports tilt interaction using a built-in inertial sensor. A pressure sensor and a high-fidelity haptic actuator complement our pen device while retaining a compact form factor that supports mobile use on uninstrumented surfaces. We present a processing pipeline that reliably extracts fine-grained pen translations and rotations from the two optical flow sensors. To assess the accuracy of our translation and angle estimation pipeline, we conducted a technical evaluation in which we compared our approach with ground-truth measurements of participants' pen movements during typical pen interactions. We conclude with several example applications that leverage our device's capabilities. Taken together, we demonstrate novel input dimensions with DeltaPen that have so far only existed in systems that require active sensing surfaces or external tracking.
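The core geometric idea, recovering rigid planar motion from two flow measurements, can be sketched as follows; the 10 mm baseline and the small-angle approximation are illustrative assumptions, not the paper's calibrated processing pipeline.

```python
def planar_motion(flow_a, flow_b, baseline_m=0.010):
    """Estimate pen-tip translation and yaw from two optical flow sensors
    mounted a known baseline apart along the pen-local x axis.

    For a rigid body, each sensor observes v_i = v + w x r_i, so the average
    of the two flows gives the translation at the midpoint between the
    sensors, and their difference divided by the baseline gives the yaw.
    flow_a, flow_b: per-frame (dx, dy) displacements in metres.
    """
    (ax, ay), (bx, by) = flow_a, flow_b
    translation = ((ax + bx) / 2.0, (ay + by) / 2.0)
    yaw_rad = (by - ay) / baseline_m            # small-angle approximation
    return translation, yaw_rad

# Example: equal flows -> pure translation; opposite y flows -> pure yaw.
print(planar_motion((0.002, 0.001), (0.002, 0.001)))   # ((0.002, 0.001), 0.0)
print(planar_motion((0.0, -0.001), (0.0, 0.001)))      # ((0.0, 0.0), 0.2 rad)
```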
EtherPose is a continuous hand pose tracking system employing two wrist-worn antennas, from which we measure the real-time dielectric loading resulting from different hand geometries (i.e., poses). Unlike worn camera-based methods, our RF approach is more robust to occlusion from clothing and avoids capturing potentially sensitive imagery. Through a series of simulations and empirical studies, we designed a proof-of-concept, worn implementation built around compact vector network analyzers. Sensor data is then interpreted by a machine learning backend, which outputs a fully posed 3D hand. In a user study, we show how our system can track hand pose with a mean Euclidean joint error of 11.6 mm, even when covered in fabric. We also studied 2-DOF wrist angle and micro-gesture tracking. In the future, our approach could be miniaturized and extended to include more and different types of antennas, operating at different self-resonances.
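As a hedged sketch of the sensing-to-pose mapping described here, the code below compresses each antenna's S11 sweep into a few scalar features (resonant frequency, notch depth, coarse bandwidth) and maps them to a calibrated pose by nearest neighbor; the feature choice, the nearest-neighbor stand-in for the learned regressor, and all numeric values are assumptions, not the paper's model.

```python
import math

def s11_features(freqs_hz, s11_db):
    """Compress one antenna's S11 sweep into a few scalars: resonant frequency,
    notch depth, and a coarse -3 dB bandwidth around the notch."""
    i = min(range(len(s11_db)), key=lambda j: s11_db[j])
    band = [j for j, v in enumerate(s11_db) if v < s11_db[i] + 3.0]
    return [freqs_hz[i], s11_db[i], freqs_hz[band[-1]] - freqs_hz[band[0]]]

def make_features(sweep_a, sweep_b):
    """Concatenate features from the two wrist-worn antennas."""
    return s11_features(*sweep_a) + s11_features(*sweep_b)

def nearest_pose(query_features, calibration):
    """calibration: (features, pose) pairs recorded against a ground-truth
    tracker. Returns the pose with the closest features; a real system would
    regress a full 3D joint vector rather than look up a label."""
    return min(calibration, key=lambda fp: math.dist(fp[0], query_features))[1]

# Toy 5-point sweeps (illustrative values only); poses are placeholder labels.
freqs = [2.35e9, 2.40e9, 2.45e9, 2.50e9, 2.55e9]
open_hand = (make_features((freqs, [-3, -8, -22, -9, -4]),
                           (freqs, [-2, -7, -20, -8, -3])), "open")
fist = (make_features((freqs, [-4, -18, -9, -5, -3]),
                      (freqs, [-3, -16, -8, -4, -2])), "fist")
query = make_features((freqs, [-3, -9, -21, -8, -4]),
                      (freqs, [-2, -8, -19, -7, -3]))
print(nearest_pose(query, [open_hand, fist]))   # -> "open"
```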
We engineered DigituSync, a passive exoskeleton that physically links two hands together, enabling two users to adaptively transmit finger movements in real time. It uses multiple four-bar linkages to transfer both motion and force while preserving congruent haptic feedback. Moreover, we implemented a variable-length linkage that adjusts the force transmission ratio between the two users and regulates the amount of intervention, enabling users to customize their learning experience. DigituSync's benefits emerge from its passive design: unlike existing haptic devices (motor-based exoskeletons or electrical muscle stimulation), DigituSync has virtually no latency and does not require batteries or electronics to transmit or adjust movements, making it useful and safe to deploy in many settings, such as between students and teachers in a classroom. We validated DigituSync by means of technical evaluations and a user study, demonstrating that it transfers finger motions and forces instantly and supports adaptive force transmission, which allowed participants to feel more in control of their own movements and to perceive the teacher’s intervention as more responsive. We also conducted two exploratory sessions with a music teacher and deaf-blind users, which allowed us to gather experiential insights from the teacher’s side and to explore DigituSync in application scenarios.
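As a back-of-the-envelope illustration of an adjustable force transmission ratio (the paper's four-bar geometry is more involved), an ideal lever model captures the basic relationship; the function name and arm lengths below are placeholders, not values from the paper.

```python
def transmitted_force(input_force_n, input_arm_m, output_arm_m):
    """Ideal, friction-free lever: torque balance F_out * r_out = F_in * r_in,
    so the transmission ratio is r_in / r_out. Sliding a variable-length link
    to change the effective arm lengths changes how strongly the leader's
    finger motion and force are felt by the follower."""
    return input_force_n * (input_arm_m / output_arm_m)

# Shortening the output arm (placeholder values, in metres) raises the force
# delivered to the follower's finger for the same 1 N input:
for r_out in (0.03, 0.02, 0.01):
    print(f"r_out={r_out:.2f} m -> {transmitted_force(1.0, 0.02, r_out):.2f} N")
```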
Acoustic levitation has emerged as a promising approach for mid-air displays, using multiple levitated particles as 3D voxels, cloth and thread props, or high-speed tracer particles, with the promise of creating 3D displays that users can see, hear, and feel with their bare eyes, ears, and hands. However, interaction with this mid-air content has always occurred at a distance, since external objects in the display volume (e.g., the user's hands) can disturb the acoustic fields and make the particles fall. This paper proposes TipTrap, a co-located direct manipulation technique for acoustically levitated particles. TipTrap leverages the reflection of ultrasound off the users' skin and employs a closed-loop system to create functional acoustic traps 2.1 mm below the fingertips, addressing the three basic stages of direct manipulation: selection, manipulation, and deselection. We use Finite-Difference Time-Domain (FDTD) simulations to explain the principles enabling TipTrap and to explore how finger reflections and user strategies influence the quality of the traps (e.g., approach direction, orientation, and tracking errors), and we use these results to design our technique. We then implement the technique, characterize its performance with a robotic hand setup, and finish with an exploration of TipTrap's ability to manipulate different types of levitated content.
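To make the closed-loop retargeting step concrete, the sketch below computes per-transducer phases that focus a phased array 2.1 mm below a tracked fingertip using a generic twin-trap signature; the array geometry, 40 kHz frequency, and trap signature are textbook assumptions rather than the paper's FDTD-derived traps.

```python
import math

SPEED_OF_SOUND = 343.0       # m/s in air
FREQ_HZ = 40_000.0           # common 40 kHz levitation transducers (assumed)
K = 2 * math.pi * FREQ_HZ / SPEED_OF_SOUND
TRAP_OFFSET_M = 0.0021       # 2.1 mm below the fingertip, per the abstract

def trap_phases(transducer_positions, fingertip):
    """Phase per transducer that focuses the array 2.1 mm below the tracked
    fingertip, with a pi offset on half the array (split by a plane through
    the focus) to form a twin trap that can hold a particle."""
    fx, fy, fz = fingertip
    focus = (fx, fy, fz - TRAP_OFFSET_M)
    phases = []
    for p in transducer_positions:
        signature = math.pi if p[0] > fx else 0.0
        phases.append((-K * math.dist(p, focus) + signature) % (2 * math.pi))
    return phases

# Example: an 8 x 8 array of transducers spaced 10 mm apart at z = 0, with the
# fingertip tracked 12 cm above the array centre; phases are updated per frame.
array = [(0.01 * (i - 3.5), 0.01 * (j - 3.5), 0.0)
         for i in range(8) for j in range(8)]
print(len(trap_phases(array, (0.0, 0.0, 0.12))))   # 64 phases
```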