Large language models (LLMs) have shown exceptional performance in various language-related tasks. However, their application to keyboard decoding, which converts input signals (e.g., taps and gestures) into text, remains underexplored. This paper presents a fine-tuned FLAN-T5 model for decoding. It achieves 93.1% top-1 accuracy on user-drawn gestures, outperforming the widely adopted SHARK2 decoder, and 95.4% on real-world tap-typing data. Notably, our decoder supports Flexible Typing, allowing users to enter a word with taps, gestures, multi-stroke gestures, or tap-gesture combinations. User study results show that Flexible Typing is beneficial and well received by participants: 35.9% of words were entered using word gestures, 29.0% with taps, 6.1% with multi-stroke gestures, and the remaining 29.0% with tap-gesture combinations. Our investigation suggests that the LLM-based decoder improves decoding accuracy over existing word-gesture decoders while enabling Flexible Typing, which enhances the overall typing experience and accommodates diverse user preferences.
https://dl.acm.org/doi/10.1145/3706598.3714314
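For readers who want to prototype the idea, a minimal sketch of driving a seq2seq LLM as a keyboard decoder follows. The checkpoint, the prompt wording, and the trace-to-keys serialization are illustrative assumptions, not the paper's released artifacts:

```python
# Hypothetical sketch: decoding a serialized tap/gesture trace with a
# seq2seq LLM. Serializing both taps and gesture strokes as the sequence
# of keys they touch would let one model handle Flexible Typing.
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("google/flan-t5-base")
model = T5ForConditionalGeneration.from_pretrained("google/flan-t5-base")

trace = "h e k l l i o"  # noisy key sequence from taps and/or a gesture
prompt = f"Decode this keyboard trace into the intended word: {trace}"

inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=8, num_beams=5)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

The paper fine-tunes FLAN-T5 on paired trace/word data; the zero-shot prompt above is only a stand-in for that trained model.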
The limited accuracy of eye-tracking on smartphones restricts its use. Existing RGB-camera-based eye-tracking relies on extensive datasets, and could be enhanced by continuous fine-tuning on calibration data implicitly collected during interaction. In this context, we propose COMETIC (Cursor Operation Mediated Eye-Tracking Implicit Calibration), which introduces a cursor-based interaction and exploits the inherent correlation between cursor and eye movement. By filtering valid cursor coordinates as proxies for the ground-truth gaze and fine-tuning the eye-tracking model on the corresponding images, COMETIC enhances accuracy during the interaction. Both filtering and fine-tuning use pre-trained models and can be facilitated by personalized, dynamically updated data. Results show COMETIC achieves an average eye-tracking error of 278.3 px (1.60 cm, 2.29°), a 27.2% improvement over the model without fine-tuning. We found that filtering cursor points whose actual distance to gaze is within 150.0 px (0.86 cm) yields the best eye-tracking results.
https://dl.acm.org/doi/10.1145/3706598.3713936
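A hedged sketch of the implicit-calibration loop: the model interface and data plumbing are hypothetical stand-ins, and since the abstract does not spell out the filtering criterion, using the pre-trained model's own prediction to accept cursor points is only one plausible reading of "both filtering and fine-tuning use pre-trained models". The 150 px radius is the paper's reported optimum:

```python
import torch
import torch.nn.functional as F

def collect_calibration_pairs(samples, model, max_dist_px=150.0):
    """Keep (image, cursor) pairs where the model's gaze estimate already
    lies near the cursor, treating the cursor as a proxy for gaze."""
    pairs = []
    with torch.no_grad():
        for image, cursor_xy in samples:          # image: (C, H, W)
            gaze_xy = model(image.unsqueeze(0)).squeeze(0)
            if torch.dist(gaze_xy, cursor_xy) <= max_dist_px:
                pairs.append((image, cursor_xy))
    return pairs

def fine_tune(model, pairs, lr=1e-4, epochs=3):
    """Fine-tune the pre-trained gaze model on the filtered proxy labels."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        for image, cursor_xy in pairs:
            pred = model(image.unsqueeze(0)).squeeze(0)
            loss = F.mse_loss(pred, cursor_xy)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return model
```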
Touchless displays often use mid-air gestures to control on-screen cursors for pointer interactions. Area cursors can simplify touchless cursor input by implicitly targeting nearby widgets without the cursor entering the target. However, on displays with dense target layouts, the cursor still has to arrive close to the widget, so the time-to-target and effort benefits of area cursors are diminished. Through two experiments, we demonstrate for the first time that fine-tuning the mapping between hand and cursor movements (control-display gain -- CDG) can address the deficiencies of area cursors and improve the performance of touchless interaction. Across several display sizes and target densities (representative of the myriad public displays used in retail, transport, museums, etc.), our findings show that the forgiving nature of an area cursor compensates for the imprecision of a high CDG, helping users interact more effectively with smaller and more controlled hand/arm movements.
https://dl.acm.org/doi/10.1145/3706598.3714021
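The core mechanics are straightforward to prototype. In the sketch below, the gain value, activation radius, and widget layout are invented for illustration and are not the study's parameters:

```python
# Minimal sketch of an area cursor combined with control-display gain.
import math

CD_GAIN = 3.0  # high gain: small hand movements -> large cursor movements

def update_cursor(cursor, hand_delta, gain=CD_GAIN):
    """Map a hand displacement (dx, dy) to a cursor displacement."""
    return (cursor[0] + gain * hand_delta[0],
            cursor[1] + gain * hand_delta[1])

def area_cursor_target(cursor, widgets, radius=120.0):
    """Select the nearest widget within the activation radius, so the
    cursor never has to enter the target itself."""
    nearest = min(widgets, key=lambda w: math.dist(cursor, w["center"]))
    return nearest if math.dist(cursor, nearest["center"]) <= radius else None

cursor = (400.0, 300.0)
cursor = update_cursor(cursor, hand_delta=(10.0, -5.0))  # small hand move
widgets = [{"id": "play", "center": (430.0, 280.0)},
           {"id": "stop", "center": (900.0, 700.0)}]
print(area_cursor_target(cursor, widgets))  # -> the "play" widget
```

The finding, in these terms: a large gain shrinks the hand movement needed, while the activation radius absorbs the precision that the gain costs.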
Tracking continuous 2D sequential handwriting trajectories accurately with a single IMU ring is extremely challenging due to the significant displacement between the IMU's wearing position and the tracked fingertip. We propose WritingRing, a system that uses a single IMU ring worn at the base of the finger to support natural handwriting input and provide real-time 2D trajectories. To achieve this, we first built a handwriting dataset using a touchpad and an IMU ring (N=20). Next, we improved the LSTM model by incorporating streaming input and a TCN network, significantly enhancing accuracy and computational efficiency and achieving an average trajectory accuracy of 1.63 mm. Real-time usability studies demonstrated that the system achieved 88.7% letter recognition accuracy and 68.2% word recognition accuracy, which rose to 84.36% when the output was restricted to a 3,000-word vocabulary. WritingRing can also be embedded into existing ring systems, providing a natural, real-time solution for various applications.
https://dl.acm.org/doi/10.1145/3706598.3714066
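As a rough illustration of the stated architecture direction (TCN features feeding an LSTM, with causal padding so the model can run on streaming input), here is a PyTorch sketch; the channel counts, kernel sizes, and hidden sizes are guesses, not WritingRing's configuration:

```python
import torch
import torch.nn as nn

class TCNBlock(nn.Module):
    def __init__(self, c_in, c_out, k=3, dilation=1):
        super().__init__()
        # Left-only (causal) padding keeps the block usable on streams.
        self.pad = (k - 1) * dilation
        self.conv = nn.Conv1d(c_in, c_out, k, dilation=dilation)
        self.act = nn.ReLU()

    def forward(self, x):                    # x: (batch, channels, time)
        x = nn.functional.pad(x, (self.pad, 0))
        return self.act(self.conv(x))

class IMUToTrajectory(nn.Module):
    def __init__(self, imu_channels=6, hidden=64):
        super().__init__()
        self.tcn = nn.Sequential(TCNBlock(imu_channels, hidden),
                                 TCNBlock(hidden, hidden, dilation=2))
        self.lstm = nn.LSTM(hidden, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 2)     # per-step (dx, dy) on the plane

    def forward(self, imu):                  # imu: (batch, time, channels)
        feats = self.tcn(imu.transpose(1, 2)).transpose(1, 2)
        out, _ = self.lstm(feats)
        return self.head(out)                # (batch, time, 2)

model = IMUToTrajectory()
deltas = model(torch.randn(1, 200, 6))      # 200 IMU samples -> 2D deltas
trajectory = deltas.cumsum(dim=1)           # integrate into a 2D path
```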
Back-of-Device (BoD) interfaces have emerged as a promising solution to free up screen real estate in smartphones by offloading interactions from the display to the back, thereby reducing reliance on on-screen interfaces. However, existing BoD solutions face limitations, such as requiring specialized hardware, consuming excessive power, or offering limited input vocabularies. We introduce MagPie, a novel BoD interface that leverages the magnetic phenomenon induced by MagSafe, part of the wireless charging standard. Users can seamlessly attach MagPie to MagSafe-enabled smartphones and interact using tangible, modular interfaces that generate unique magnetic signals upon activation. MagPie then detects these signals and recognizes the input through magnetic sensing. Our experiments with real-world users demonstrate that i) MagPie achieves high performance in accuracy, usability, deployability, responsiveness, and robustness across diverse environments, and ii) its tangible, intuitive, and customizable design opens up possibilities for a whole new class of smartphone interaction scenarios.
https://dl.acm.org/doi/10.1145/3706598.3713956
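The abstract leaves the recognition pipeline unspecified; one deliberately simplified reading is a nearest-template match of magnetometer windows against per-widget signatures. The signature shapes and the classifier below are assumptions, not MagPie's method:

```python
# Speculative sketch: matching a 3-axis magnetometer window against
# stored per-widget magnetic signatures.
import numpy as np

def _normalize(sig):
    """Remove per-axis offset/scale so ambient field and pose matter less."""
    return (sig - sig.mean(axis=0)) / (sig.std(axis=0) + 1e-8)

def classify_input(window, signatures):
    """Match a magnetometer window of shape (T, 3) to the closest signature."""
    window = _normalize(window)
    errors = {name: float(np.mean((window - _normalize(tpl)) ** 2))
              for name, tpl in signatures.items()}
    return min(errors, key=errors.get)

rng = np.random.default_rng(0)
signatures = {"dial": rng.standard_normal((50, 3)),
              "button": rng.standard_normal((50, 3))}
sample = signatures["dial"] + 0.1 * rng.standard_normal((50, 3))
print(classify_input(sample, signatures))  # -> "dial"
```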
We introduce PropType, an interactive interface that transforms everyday objects into typing surfaces within an Augmented Reality (AR) environment. Users can interact with nearby props, such as cups, water bottles, boxes, and various other objects, utilizing them as on-the-go keyboards. To develop PropType, we conducted three studies. The first study involved observing users to understand how they naturally engage with prop surfaces for typing. The second study assessed the reachability and efficiency of touch input across four props with different sizes and shapes. Based on these insights, we designed customized keyboard layouts for each prop. In the third study, we evaluated typing performance using PropType, achieving an average typing speed of up to 26.1 words per minute (WPM) with 2.2% corrected error rate (CER) and 1.1% uncorrected error rate (UER). Finally, we present a PropType editing tool that allows users to customize keyboard layouts and visual effects for prop-based typing.
https://dl.acm.org/doi/10.1145/3706598.3714056
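The reported speed and error figures follow standard text-entry metrics; below is a minimal sketch of how such metrics are conventionally computed, not the study's own analysis code:

```python
# Standard text-entry metrics: words per minute with the conventional
# five-characters-per-word unit, plus corrected/uncorrected error rates.
def wpm(transcribed: str, seconds: float) -> float:
    return (len(transcribed) - 1) / seconds * 60.0 / 5.0

def error_rates(correct: int, fixed: int, unfixed: int):
    """CER and UER as fractions of all correct, corrected (fixed during
    entry), and uncorrected (left in the final text) characters."""
    total = correct + fixed + unfixed
    return fixed / total, unfixed / total

print(f"{wpm('the quick brown fox', seconds=9.5):.1f} WPM")
cer, uer = error_rates(correct=95, fixed=2, unfixed=1)
print(f"CER {cer:.1%}  UER {uer:.1%}")   # ~2.0% / ~1.0%
```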
Interacting with Large Language Models (LLMs) for text editing on mobile devices currently requires users to break out of their writing environment and switch to a conversational AI interface. In this paper, we propose to control the LLM via touch gestures performed directly on the text. We first chart a design space that covers fundamental touch input and text transformations. In this space, we then concretely explore two control mappings: spread-to-generate and pinch-to-shorten, with visual feedback loops. We evaluate this concept in a user study (N=14) that compares three feedback designs: no visualisation, text length indicator, and length + word indicator. The results demonstrate that touch-based control of LLMs is both feasible and user-friendly, with the length + word indicator proving most effective for managing text generation. This work lays the foundation for further research into gesture-based interaction with LLMs on touch devices.
https://dl.acm.org/doi/10.1145/3706598.3713554
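A small sketch of how the two control mappings could be wired to prompts. The call_llm stub, the prompt wording, and the scale-to-word-count mapping are all hypothetical; the word-count target echoes the "length + word indicator" feedback the study found most effective:

```python
def gesture_to_prompt(gesture: str, text: str, scale: float) -> str:
    """scale > 1 for spread (generate/expand), scale < 1 for pinch (shorten)."""
    target_words = max(1, round(len(text.split()) * scale))
    if gesture == "spread":
        return f"Expand the following text to about {target_words} words:\n{text}"
    if gesture == "pinch":
        return f"Shorten the following text to about {target_words} words:\n{text}"
    raise ValueError(f"unsupported gesture: {gesture}")

def on_gesture(gesture: str, selected_text: str, scale: float, call_llm):
    """Turn a touch gesture on selected text into an LLM edit."""
    return call_llm(gesture_to_prompt(gesture, selected_text, scale))

# Demo with a stub in place of a real model call.
print(on_gesture("pinch", "this draft is far too long and rambling",
                 scale=0.5, call_llm=lambda p: f"[LLM <- {p!r}]"))
```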