Large language models (LLMs) have shown exceptional performance in various language-related tasks. However, their application in keyboard decoding, which involves converting input signals (e.g. taps and gestures) into text, remains underexplored. This paper presents a fine-tuned FLAN-T5 model for decoding. It achieves 93.1% top-1 accuracy on user-drawn gestures, outperforming the widely adopted SHARK2 decoder, and 95.4% on real-word tap typing data. In particular, our decoder supports Flexible Typing, allowing users to enter a word with taps, gestures, multi-stroke gestures, and tap-gesture combinations. User study results show that Flexible Typing is beneficial and well-received by participants, where 35.9% of words were entered using word gestures, 29.0% with taps, 6.1% with multi-stroke gestures, and the remaining 29.0% using tap-gestures. Our investigation suggests that the LLM-based decoder improves decoding accuracy over existing word gesture decoders while enabling the Flexible Typing method, which enhances the overall typing experience and accommodates diverse user preferences.
https://dl.acm.org/doi/10.1145/3706598.3714314
The ACM CHI Conference on Human Factors in Computing Systems (https://chi2025.acm.org/)