Freehand interactions with augmented and virtual reality are growing in popularity, but they lack reliability and robustness. Implicit behavior from users, such as hand or gaze movements, might provide additional signals to improve the reliability of input. In this paper, the primary goal is to improve the detection of a selection gesture in VR during point-and-click interaction. Thus, we propose and investigate the use of information contained within the hand motion dynamics that precede a selection gesture. We built two models that classify whether a user is likely to perform a selection gesture at the current moment in time. We collected data during a pointing-and-selection task from 15 participants and trained two models with different architectures: a logistic regression classifier trained on predefined hand motion features, and a temporal convolutional network (TCN) classifier trained on raw hand motion data. Leave-one-subject-out cross-validation PR-AUCs of 0.36 and 0.90, respectively, were obtained, demonstrating that both models performed well above chance (0.13). The TCN model was found to improve the precision of a noisy selection gesture by 11.2% without sacrificing recall. An initial analysis of the generalizability of the models demonstrated above-chance performance, suggesting that this approach could be scaled to other interaction tasks in the future.
https://doi.org/10.1145/3526113.3545701
The ACM Symposium on User Interface Software and Technology
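The following is a minimal, illustrative sketch of the evaluation protocol described in the abstract: leave-one-subject-out cross-validation of a logistic regression classifier scored with PR-AUC. It is not the authors' implementation; the feature dimensionality, window counts, and the synthetic labels (drawn at the 0.13 chance rate) are assumptions made purely for illustration.

```python
# Sketch of leave-one-subject-out (LOSO) cross-validation with PR-AUC scoring,
# assuming per-window hand motion features and binary "selection imminent" labels.
# All data below is synthetic; real features/labels would come from the study.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import LeaveOneGroupOut
from sklearn.metrics import average_precision_score

rng = np.random.default_rng(0)
n_subjects, windows_per_subject, n_features = 15, 200, 6  # assumed sizes

X = rng.normal(size=(n_subjects * windows_per_subject, n_features))
y = rng.random(n_subjects * windows_per_subject) < 0.13   # ~13% positive windows
groups = np.repeat(np.arange(n_subjects), windows_per_subject)  # subject IDs

scores = []
for train_idx, test_idx in LeaveOneGroupOut().split(X, y, groups):
    clf = LogisticRegression(max_iter=1000).fit(X[train_idx], y[train_idx])
    probs = clf.predict_proba(X[test_idx])[:, 1]
    scores.append(average_precision_score(y[test_idx], probs))

print(f"Mean LOSO PR-AUC: {np.mean(scores):.2f}")  # ~0.13 on random features, i.e. chance
```

On purely random features the mean PR-AUC converges to the positive-class rate (here 0.13), which is the chance baseline the abstract compares the reported 0.36 and 0.90 against.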