Despite the rise of mixed reality (MR) and augmented reality (AR) applications, efficient text input in AR/MR environments remains challenging. We propose \textit{FineType}, a text entry system based on tapping gestures composed of finger combinations and postures, performed on any flat surface. Using a wristband with an IMU and an infrared camera, we detect tapping events and employ a multi-task convolutional neural network to predict these gestures, enabling a nearly full keyboard mapping (letters, numbers, and symbols) with one hand. We collected gestures from participants (N=28) covering 10 finger combinations and 3 finger postures for training. Cross-user validation showed accuracies of 98.26\% for combinations, 95.53\% for postures, and 94.19\% for all categories. For 8 newly defined finger combinations and their postures, classification accuracies were 91.27\% and 93.86\%, respectively. Using user-adaptive few-shot learning, we further improved the finger combination accuracy to 97.05\%. These results demonstrate the potential of our approach to map tapping gestures composed of arbitrary finger combinations and the three postures. In a user study (N=10), participants reached an average typing speed of 35.1 WPM with a character error rate of 5.1\% after two hours of practice.
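The multi-task prediction described above can be sketched minimally as a shared feature extractor with two classification heads, one for the 10 finger combinations and one for the 3 postures. The sketch below is illustrative only: the layer sizes, feature dimensionality, and function names are assumptions, not the paper's actual architecture, and random weights stand in for trained ones.

```python
import numpy as np

rng = np.random.default_rng(0)

N_COMBINATIONS = 10  # finger-combination classes (from the abstract)
N_POSTURES = 3       # finger-posture classes (from the abstract)
FEATURE_DIM = 64     # assumed dimensionality of the windowed IMU/IR features

def relu(x):
    return np.maximum(0.0, x)

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# Shared trunk weights (randomly initialized for the sketch).
W_shared = rng.normal(scale=0.1, size=(FEATURE_DIM, 32))
# Task-specific heads: one per prediction target.
W_combo = rng.normal(scale=0.1, size=(32, N_COMBINATIONS))
W_posture = rng.normal(scale=0.1, size=(32, N_POSTURES))

def predict(features):
    """Return (combination_probs, posture_probs) for a batch of feature vectors."""
    h = relu(features @ W_shared)  # shared representation for both tasks
    return softmax(h @ W_combo), softmax(h @ W_posture)

# One simulated tap event -> a batch of one feature vector.
combo_p, posture_p = predict(rng.normal(size=(1, FEATURE_DIM)))
print(combo_p.shape, posture_p.shape)  # (1, 10) (1, 3)
```

Sharing the trunk while splitting the heads is the standard multi-task pattern: both tasks observe the same tap signal, so a common representation can serve both classifiers.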
https://dl.acm.org/doi/10.1145/3706598.3714278
The ACM CHI Conference on Human Factors in Computing Systems (https://chi2025.acm.org/)