GUIs, Gaze, and Gesture-based Interaction

Conference Name
CHI 2023
WebUI: A Dataset for Enhancing Visual UI Understanding with Web Semantics
Abstract

Modeling user interfaces (UIs) from visual information allows systems to make inferences about the functionality and semantics needed to support use cases in accessibility, app automation, and testing. Current datasets for training machine learning models are limited in size due to the costly and time-consuming process of manually collecting and annotating UIs. We crawled the web to construct WebUI, a large dataset of 400,000 rendered web pages associated with automatically extracted metadata. We analyze the composition of WebUI and show that while automatically extracted data is noisy, most examples meet basic criteria for visual UI modeling. We applied several strategies for incorporating the semantics found in web pages to improve visual UI understanding models in the mobile domain, where less labeled data is available, across three tasks: (i) element detection, (ii) screen classification, and (iii) screen similarity.
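
As an illustration of how crawled web metadata can act as weak supervision for visual element detection, the following minimal Python sketch pairs a rendered screenshot with element bounding boxes taken from page metadata. The JSON layout and field names (screenshot, elements, bbox, tag, role) and the filtering rule are hypothetical, chosen only for illustration; they are not the actual WebUI format.

```python
# Minimal sketch (not the authors' pipeline): turning automatically crawled
# web metadata into weak labels for visual element detection.
import json
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class DetectionExample:
    image_path: str
    boxes: List[Tuple[float, float, float, float]]  # (x, y, w, h) in pixels
    labels: List[str]                                # e.g. HTML tag or ARIA role

def load_weakly_labeled_page(metadata_path: str) -> DetectionExample:
    """Pair a rendered screenshot with element boxes extracted from the DOM."""
    with open(metadata_path) as f:
        meta = json.load(f)  # hypothetical per-page metadata file
    boxes, labels = [], []
    for el in meta["elements"]:
        x, y, w, h = el["bbox"]
        if w <= 0 or h <= 0:          # skip invisible / zero-area nodes (noise)
            continue
        boxes.append((x, y, w, h))
        labels.append(el.get("role") or el["tag"])
    return DetectionExample(meta["screenshot"], boxes, labels)
```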

Award
Honorable Mention
Authors
Jason Wu
Carnegie Mellon University, Pittsburgh, Pennsylvania, United States
Siyan Wang
Wellesley College, Wellesley, Massachusetts, United States
Siman Shen
Grinnell College, Grinnell, Iowa, United States
Yi-Hao Peng
Carnegie Mellon University, Pittsburgh, Pennsylvania, United States
Jeffrey Nichols
Snooty Bird LLC, San Diego, California, United States
Jeffrey P. Bigham
Carnegie Mellon University, Pittsburgh, Pennsylvania, United States
Paper URL

https://doi.org/10.1145/3544548.3581158

Video
WordGesture-GAN: Modeling Word-Gesture Movement with Generative Adversarial Network
Abstract

Word-gesture production models that can synthesize word-gestures are critical to the training and evaluation of word-gesture keyboard decoders. We propose WordGesture-GAN, a conditional generative adversarial network that takes arbitrary text as input to generate realistic word-gesture movements in both the spatial (i.e., (x, y) coordinates of touch points) and temporal (i.e., timestamps of touch points) dimensions. WordGesture-GAN introduces a Variational Auto-Encoder to extract and embed variations of user-drawn gestures into a Gaussian distribution, which can be sampled to control variation in generated gestures. Our experiments on a dataset with 38k gesture samples show that WordGesture-GAN outperforms existing gesture production models, including the minimum jerk model [37] and the style-transfer GAN [31,32], in generating realistic gestures. Overall, our research demonstrates that the proposed GAN structure can learn variations in user-drawn gestures, and the resulting WordGesture-GAN can generate word-gesture movement and predict the distribution of gestures. WordGesture-GAN can serve as a valuable tool for designing and evaluating gestural input systems.
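
To make the conditional-generation setup concrete, here is a minimal PyTorch sketch of a generator that consumes a word's key-center prototype path together with a latent vector sampled from a Gaussian; sampling different latent vectors yields different plausible gestures for the same word. The architecture, layer sizes, and names are illustrative assumptions, not the WordGesture-GAN model.

```python
# Illustrative sketch only, not the paper's architecture: a conditional
# generator mapping a word's key-center prototype path plus a latent vector
# z ~ N(0, I) to per-point (x, y, t) gesture coordinates.
import torch
import torch.nn as nn

class GestureGenerator(nn.Module):
    def __init__(self, latent_dim=32, hidden_dim=128):
        super().__init__()
        # Input per time step: prototype (x, y) + latent code broadcast over time.
        self.rnn = nn.GRU(2 + latent_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, 3)  # predicts (x, y, t) per point

    def forward(self, prototype, z):
        # prototype: (batch, seq_len, 2); z: (batch, latent_dim)
        seq_len = prototype.size(1)
        z_seq = z.unsqueeze(1).expand(-1, seq_len, -1)
        h, _ = self.rnn(torch.cat([prototype, z_seq], dim=-1))
        return self.head(h)

# Different z samples produce different gestures for the same word prototype.
gen = GestureGenerator()
proto = torch.rand(1, 50, 2)              # hypothetical key-center path
gesture = gen(proto, torch.randn(1, 32))  # shape (1, 50, 3): x, y, timestamp
```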

Award
Honorable Mention
Authors
Jeremy Chu
Stony Brook University, Stony Brook, New York, United States
Dongsheng An
Stony Brook University, Stony Brook, New York, United States
Yan Ma
Stony Brook University, Stony Brook, New York, United States
Wenzhe Cui
Stony Brook University, Stony Brook, New York, United States
Shumin Zhai
Google, Mountain View, California, United States
Xianfeng David Gu
Stony Brook University, Stony Brook, New York, United States
Xiaojun Bi
Stony Brook University, Stony Brook, New York, United States
Paper URL

https://doi.org/10.1145/3544548.3581279

Video
UEyes: Understanding Visual Saliency across User Interface Types
Abstract

While user interfaces (UIs) display elements such as images and text in a grid-based layout, UI types differ significantly in the number of elements and how they are displayed. For example, webpage designs rely heavily on images and text, whereas desktop UIs tend to feature numerous small images. To examine how such differences affect the way users look at UIs, we collected and analyzed a large eye-tracking-based dataset, UEyes (62 participants and 1,980 UI screenshots), covering four major UI types: webpage, desktop UI, mobile UI, and poster. We analyze differences across UI types in biases related to factors such as color, location, and gaze direction. We also compare state-of-the-art predictive models and propose improvements for better capturing typical tendencies across UI types. Both the dataset and the models are publicly available.
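
As a concrete example of the kind of preprocessing commonly used when comparing saliency models against eye-tracking data, the sketch below converts fixation points on a UI screenshot into a continuous ground-truth saliency map via Gaussian blurring. The function and its parameters are illustrative assumptions, not the UEyes pipeline.

```python
# A common step when evaluating saliency predictors (a sketch, not UEyes'
# exact pipeline): turn discrete fixation points into a blurred saliency map.
import numpy as np
from scipy.ndimage import gaussian_filter

def fixations_to_saliency(fixations, height, width, sigma=25.0):
    """fixations: iterable of (x, y) pixel coordinates of gaze fixations."""
    fmap = np.zeros((height, width), dtype=np.float64)
    for x, y in fixations:
        xi, yi = int(round(x)), int(round(y))
        if 0 <= yi < height and 0 <= xi < width:
            fmap[yi, xi] += 1.0          # accumulate fixation counts per pixel
    smap = gaussian_filter(fmap, sigma=sigma)
    return smap / smap.max() if smap.max() > 0 else smap
```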

Authors
Yue Jiang
Aalto University, Espoo, Finland
Luis A. Leiva
University of Luxembourg, Esch-sur-Alzette, Luxembourg
Hamed Rezazadegan Tavakoli
Nokia Technologies, Espoo, Finland
Paul R. B. Houssel
University of Luxembourg, Esch-sur-Alzette, Luxembourg
Julia Kylmälä
Aalto University, Espoo, Finland
Antti Oulasvirta
Aalto University, Helsinki, Finland
Paper URL

https://doi.org/10.1145/3544548.3581096

Video
Relative Design Acquisition: A Computational Approach for Creating Visual Interfaces to Steer User Choices
Abstract

A central objective in computational design is to find an optimal design that maximizes a performance metric. We explore a different problem class with a computational approach we call relative design acquisition. As a motivational example, consider a user prompted to make a choice using buttons. One button may have a more visually appealing design and is therefore more likely to steer users to click it than the second button. In such a case, a relative design of a certain quality with respect to a reference design is acquired to guide a user decision. After mathematically formalizing this problem, we report the results of three experiments that demonstrate the approach’s efficacy in generating relative designs in a visual interface preference setting. The relative designs are controllable by a quality factor, which affects both comparative ratings and human decision time between the reference and relative designs.
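
To make the notion of acquiring a design of a chosen quality relative to a reference more concrete, here is a toy Python sketch that searches a design space for a candidate whose predicted quality sits at a fraction q of the reference's quality. The scoring function, design space, and random-search procedure are all illustrative assumptions, not the paper's formulation.

```python
# Toy sketch of the relative-design idea; the objective, score function,
# and search procedure are illustrative, not the paper's method.
import random

def acquire_relative_design(reference, score, sample_design, q, n_samples=2000):
    """Search designs so that score(candidate) is approximately q * score(reference)."""
    target = q * score(reference)
    best, best_gap = None, float("inf")
    for _ in range(n_samples):
        candidate = sample_design()
        gap = abs(score(candidate) - target)
        if gap < best_gap:
            best, best_gap = candidate, gap
    return best

# Example with a hypothetical one-parameter "button contrast" design space.
score = lambda d: d["contrast"]                      # stand-in quality model
sample = lambda: {"contrast": random.uniform(0, 1)}
reference = {"contrast": 0.9}
worse_button = acquire_relative_design(reference, score, sample, q=0.5)
```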

Authors
George B. Mo
University of Cambridge, Cambridge, United Kingdom
Per Ola Kristensson
University of Cambridge, Cambridge, United Kingdom
Paper URL

https://doi.org/10.1145/3544548.3581028

Video
Predicting Gaze-based Target Selection in Augmented Reality Headsets based on Eye and Head Endpoint Distributions
Abstract

Target selection is a fundamental task in interactive Augmented Reality (AR) systems. Predicting the intended target of selection in such systems can provide users with a smooth, low-friction interaction experience. Our work aims to predict gaze-based target selection in AR headsets using eye and head endpoint distributions, which describe the probability distributions of eye and head 3D orientations when a user triggers a selection input. We first conducted a user study to collect users’ eye and head behavior in a gaze-based pointing selection task with two confirmation mechanisms (air tap and blinking). Based on the study results, we then built two models: a unimodal model using only eye endpoints and a multimodal model using both eye and head endpoints. Results from a second user study showed that pointing accuracy improved by approximately 32% after integrating our models into gaze-based selection techniques.
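
The sketch below illustrates one way endpoint distributions can drive target prediction: each candidate target is scored by the likelihood of the observed eye and head endpoints under per-target Gaussian endpoint distributions, and the most likely target is selected. The covariances, coordinate conventions, and fusion by multiplication are illustrative assumptions rather than the paper's fitted models.

```python
# Minimal sketch (parameters invented for illustration) of endpoint-
# distribution-based target prediction in a gaze-pointing task.
import numpy as np
from scipy.stats import multivariate_normal

def predict_target(eye_endpoint, head_endpoint, targets,
                   eye_cov=np.diag([1.0, 1.0]), head_cov=np.diag([4.0, 4.0])):
    """Endpoints and targets are 2D angular coordinates (degrees), one target per row."""
    scores = []
    for t in targets:
        p_eye = multivariate_normal(mean=t, cov=eye_cov).pdf(eye_endpoint)
        p_head = multivariate_normal(mean=t, cov=head_cov).pdf(head_endpoint)
        scores.append(p_eye * p_head)   # multimodal fusion; drop p_head for unimodal
    return int(np.argmax(scores))       # index of the most likely target

targets = np.array([[0.0, 0.0], [5.0, 0.0], [0.0, 5.0]])
print(predict_target(np.array([4.4, 0.3]), np.array([3.0, 1.0]), targets))
```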

Authors
Yushi Wei
Xi'an Jiaotong-Liverpool University, Suzhou, China
Rongkai Shi
Xi'an Jiaotong-Liverpool University, Suzhou, Jiangsu, China
Difeng Yu
University of Melbourne, Melbourne, Victoria, Australia
Yihong Wang
Xi'an Jiaotong-Liverpool University, Suzhou, China
Yue Li
Xi'an Jiaotong-Liverpool University, Suzhou, Jiangsu, China
Lingyun Yu
Xi'an Jiaotong-Liverpool University, Suzhou, Jiangsu, China
Hai-Ning Liang
Xi'an Jiaotong-Liverpool University, Suzhou, Jiangsu, China
Paper URL

https://doi.org/10.1145/3544548.3581042

Video
Effective 2D Stroke-based Gesture Augmentation for RNNs
Abstract

Recurrent neural networks (RNNs) require large training datasets from which they learn new class models. This limitation prohibits their use in custom gesture applications where only one or two end-user samples are given per gesture class. One common way to enhance sparse datasets is to use data augmentation to synthesize new samples. Although there are numerous known techniques, they are often treated as standalone approaches when in reality they are often complementary. We show that by intelligently chaining together augmentation techniques that simulate different types of gesture production variability, such as those affecting the temporal and spatial qualities of a gesture, we can significantly increase RNN accuracy without sacrificing training time. Through experimentation on four public 2D gesture datasets, we show that RNNs trained with our data augmentation chaining technique achieve state-of-the-art recognition accuracy in both writer-dependent and writer-independent test scenarios.
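
As an illustration of chaining complementary spatial and temporal augmentations on a 2D stroke gesture, the sketch below composes rotation, scaling, positional jitter, and temporal resampling into a single synthetic sample. The specific transforms and parameter ranges are illustrative assumptions, not the paper's exact chain.

```python
# Sketch of chaining augmentations on a 2D stroke gesture stored as an
# (n, 2) array of x/y points; transforms and ranges are illustrative.
import numpy as np

def rotate(g, max_deg=10):
    a = np.deg2rad(np.random.uniform(-max_deg, max_deg))
    R = np.array([[np.cos(a), -np.sin(a)], [np.sin(a), np.cos(a)]])
    c = g.mean(axis=0)
    return (g - c) @ R.T + c            # rotate about the gesture centroid

def scale(g, lo=0.9, hi=1.1):
    return g * np.random.uniform(lo, hi, size=2)   # per-axis spatial scaling

def jitter(g, sigma=0.01):
    return g + np.random.normal(0, sigma, size=g.shape)  # point-wise noise

def resample_time(g, min_frac=0.8, max_frac=1.2):
    n_new = max(2, int(len(g) * np.random.uniform(min_frac, max_frac)))
    old_t = np.linspace(0, 1, len(g))
    new_t = np.linspace(0, 1, n_new)
    return np.stack([np.interp(new_t, old_t, g[:, d]) for d in range(2)], axis=1)

def augment(gesture, chain=(rotate, scale, jitter, resample_time)):
    for f in chain:                      # apply the whole chain to one sample
        gesture = f(gesture)
    return gesture
```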

Authors
Mykola Maslych
University of Central Florida, Orlando, Florida, United States
Eugene Matthew Taranta
University of Central Florida, Orlando, Florida, United States
Mostafa Aldilati
University of Central Florida, Orlando, Florida, United States
Joseph LaViola
University of Central Florida, Orlando, Florida, United States
Paper URL

https://doi.org/10.1145/3544548.3581358

Video