The diminutive size of wrist wearables has prompted the design of many novel input techniques to increase expressivity. Finger identification, or assigning different functionality to different fingers, has been frequently proposed. However, while the value of the technique seems clear, its implementation remains challenging, often relying on external devices (e.g., worn magnets) or explicit instructions. Addressing these limitations, this paper explores a novel approach to natural and unencumbered finger identification on an unmodified smartwatch: sonar. To do this, we adapt an existing finger tracking smartphone sonar implementation---rather than extract finger motion, we process raw sonar fingerprints representing the complete sonar scene recorded during a touch. We capture data from 16 participants operating a smartwatch and use their sonar fingerprints to train a deep learning recognizer that identifies taps by the thumb, index, and middle fingers with an accuracy of up to 93.7%, sufficient to support meaningful application development.
Speech is inappropriate in many situations, limiting when voice control can be used. Most unvoiced speech text entry systems can not be used while on-the-go due to movement artifacts. Using a dental retainer with capacitive touch sensors, SilentSpeller tracks tongue movement, enabling users to type by spelling words without voicing. SilentSpeller achieves an average 97% character accuracy in offline isolated word testing on a 1164-word dictionary. Walking has little effect on accuracy; average offline character accuracy was roughly equivalent on 107 phrases entered while walking (97.5%) or seated (96.5%). To demonstrate extensibility, the system was tested on 100 unseen words, leading to an average 94% accuracy. Live text entry speeds for seven participants averaged 37 words per minute at 87% accuracy. Comparing silent spelling to current practice suggests that SilentSpeller may be a viable alternative for silent mobile text entry.
By sensing how a user is holding a smartphone, adaptive user interfaces are possible such as those that automatically switch the displayed content and position of graphical user interface (GUI) components following how the phone is being held. We propose ReflecTouch, a novel method for detecting how a smartphone is being held by capturing images of the smartphone screen reflected on the cornea with a built-in front camera. In these images, the areas where the user places their fingers on the screen appear as shadows, which makes it possible to estimate the grasp posture. Since most smartphones have a front camera, this method can be used regardless of the device model; in addition, no additional sensor or hardware is required. We conducted data collection experiments to verify the classification accuracy of the proposed method for six different grasp postures, and the accuracy was 85%.
Face orientation can often indicate users’ intended interaction target. In this paper, we propose FaceOri, a novel face tracking technique based on acoustic ranging using earphones. FaceOri can leverage the speaker on a commodity device to emit an ultrasonic chirp, which is picked up by the set of microphones on the user’s earphone, and then processed to calculate the distance from each microphone to the device. These measurements are used to derive the user’s face orientation and distance with respect to the device. We conduct a ground truth comparison and user study to evaluate FaceOri’s performance. The results show that the system can determine whether the user orients to the device at a 93.5% accuracy within a 1.5 meters range. Furthermore, FaceOri can continuously track the user’s head orientation with a median absolute error of 10.9 mm in the distance, 3.7◦ in yaw, and 5.8◦ in pitch. FaceOri can allow for convenient hands-free control of devices and produce more intelligent context-aware interaction.
Many interactive systems are susceptible to misinterpreting the user's input actions or gestures. Interpretation errors are common when systems gather a series of signals from the user and then attempt to interpret the user's intention based on those signals -- e.g., gesture identification from a touchscreen, camera, or body-worn electrodes -- and previous work has shown that interpretation error can cause significant problems for learning new input commands. Error-reduction strategies from telecommunications, such as repeating a command or increasing the length of the input while reducing its expressiveness, could improve these input mechanisms -- but little is known about whether longer command sequences will cause problems for users (e.g., increased effort or reduced learning). We tested performance, learning, and perceived effort in a crowd-sourced study where participants learned and used input mechanisms with different error-reduction techniques. We found that error reduction techniques are feasible, can outperform error-prone ordinary input, and do not negatively affect learning or perceived effort.