Computational Physical Interaction

https://doi.org/10.1145/3411764.3445626

To make off-screen interaction without specialized hardware practical, we investigate using deep learning methods to process the common built-in IMU sensor (accelerometers and gyroscopes) on mobile phones into a useful set of one-handed interaction events. We present the design, training, implementation and applications of TapNet, a multi-task network that detects tapping on the smartphone. With phone form factor as auxiliary information, TapNet can jointly learn from data across devices and simultaneously recognize multiple tap properties, including tap direction and tap location. We developed two datasets consisting of over 135K training samples, 38K testing samples, and 32 participants in total. Experimental evaluation demonstrated the effectiveness of the TapNet design and its significant improvement over the state of the art. Along with the datasets, codebase, and extensive experiments, TapNet establishes a new technical foundation for off-screen mobile input.

Google, Mountain View, California, United States

Google Research, Mountain View, California, United States

Google, Mountain View, California, United States

10.1145/3411764.3445626

https://doi.org/10.1145/3411764.3445106

Typing on wearables while situationally impaired, such as while walking, is challenging. However, while HCI research on wearable typing is diverse, existing work focuses on stationary scenarios and fine-grained input that will likely perform poorly when users are on-the-go. To address this issue we explore single-handed wearable typing using inter-hand touches between the thumb and fingers, a modality we argue will be robust to the physical disturbances inherent to input while mobile. We first examine the impact of walking on performance of these touches, noting no significant differences in accuracy or speed, then feed our study data into a multi-objective optimization process in order to design keyboard layouts (for both five and ten keys) capable of supporting rapid, accurate, comfortable, and unambiguous typing. A final study tests these layouts against QWERTY baselines and reports performance improvements of up to 10.45% WPM and 39.44% WER when users type while walking.

UNIST, Ulsan, Korea, Republic of

10.1145/3411764.3445106

https://doi.org/10.1145/3411764.3445631

We present a computational approach to haptic design embedded in everyday tangible interaction with digital fabrication. To generate haptic feedback, the use of permanent magnets as the mechanism potentially contributes to simpleness and robustness; however, it is difficult to manually design how magnets should be embedded in the objects. Our approach enables the inverse design of magnetic force feedback; that is, we computationally solve an inverse problem to obtain an optimal arrangement of permanent magnets that renders the user-specified haptic sensation. To solve the inverse problem in a practical manner, we also present techniques on magnetic simulation and optimization. We demonstrate applications to explore the design possibility of augmenting digital fabrication for everyday use.

National Institute of Advanced Industrial Science and Technology (AIST), Tokyo, Japan

National Institute of Advanced Industrial Science and Technology (AIST), Tsukuba, Japan

10.1145/3411764.3445631

https://doi.org/10.1145/3411764.3445184

Motion correlation interfaces are those that present targets moving in different patterns, which the user can select by matching their motion. In this paper, we re-formulate the task of target selection as a probabilistic inference problem. We demonstrate that previous interaction techniques can be modelled using a Bayesian approach and that how modelling the selection task as transmission of information can help us make explicit the assumptions behind similarity measures. We propose ways of incorporating uncertainty into the decision-making process and demonstrate how the concept of entropy can illuminate the measurement of the quality of a design. We apply these techniques in a case study and suggest guidelines for future work.

University of Melbourne, Melbourne, Victoria, Australia

University of São Paulo, São Paulo, SP, Brazil

10.1145/3411764.3445184

https://doi.org/10.1145/3411764.3445514

We present a novel simulation model of point-and-click behaviour that is applicable both when a target is stationary or moving. To enable more realistic simulation than existing models, the model proposed in this study takes into account key features of the user and the external environment, such as intermittent motor control, click decision-making, visual perception, upper limb kinematics and the effect of input device. The simulated user's point-and-click behaviour is formulated as a Markov decision process (MDP), and the user's policy of action is optimised through deep reinforcement learning. As a result, our model successfully and accurately reproduced the trial completion time, distribution of click endpoints, and cursor trajectories of real users. Through an ablation study, we showed how the simulation results change when the model's sub-modules are individually removed. The implemented model and dataset are publicly available.

KAIST, DAEJEON, Korea, Republic of

KAIST, Daejeon, Korea, Republic of

Yonsei University, Seoul, Korea, Republic of

10.1145/3411764.3445514

https://doi.org/10.1145/3411764.3445671

Typing with ten fingers on a virtual keyboard in virtual or augmented reality exposes a challenging input interpretation problem. There are many sources of noise in this interaction context and these exacerbate the challenge of accurately translating human actions into text. A particularly challenging input noise source arises from the physiology of the hand. Intentional finger movements can produce unintentional coactivations in other fingers. On a physical keyboard, the resistance of the keys alleviates this issue. On a virtual keyboard, coactivations are likely to introduce spurious input events under a naïve solution to input detection. In this paper we examine the features that discriminate intentional activations from coactivations. Based on this analysis, we demonstrate three alternative coactivation detection strategies with high discrimination power. Finally, we integrate coactivation detection into a probabilistic decoder and demonstrate its ability to further reduce uncorrected character error rates by approximately 10% relative and 0.9% absolute.

University of Cambridge, Cambridge, United Kingdom

Facebook Inc, Redmond, Washington, United States

Facebook, Redmond, Washington, United States

University of Cambridge, Cambridge, United Kingdom

10.1145/3411764.3445671

https://doi.org/10.1145/3411764.3445177

Gaze-based selection has received significant academic attention over a number of years. While advances have been made, it is possible that further progress could be made if there were a deeper understanding of the adaptive nature of the mechanisms that guide eye movement and vision. Control of eye movement typically results in a sequence of movements (saccades) and fixations followed by a ‘dwell’ at a target and a selection. To shed light on how these sequences are planned, this paper presents a computational model of the control of eye movements in gaze-based selection. We formulate the model as an optimal sequential planning problem bounded by the limits of the human visual and motor systems and use reinforcement learning to approximate optimal solutions. The model accurately replicates earlier results on the effects of target size and distance and captures a number of other aspects of performance. The model can be used to predict number of fixations and duration required to make a gaze-based selection. The future development of the model is discussed.

Aalto University, Espoo, Finland

Aalto University, Espoo, Espoo, Finland

Aalto University, Espoo, Finland

10.1145/3411764.3445177

https://doi.org/10.1145/3411764.3445621

Touch input is dominantly detected using mutual-capacitance sensing, which measures the proximity of close-by objects that change the electric field between the sensor lines. The exponential drop-off in intensities with growing distance enables software to detect touch events, but does not reveal true contact areas. In this paper, we introduce CapContact, a novel method to precisely infer the contact area between the user's finger and the surface from a single capacitive image. At 8x super-resolution, our convolutional neural network generates refined touch masks from 16-bit capacitive images as input, which can even discriminate adjacent touches that are not distinguishable with existing methods. We trained and evaluated our method using supervised learning on data from 10 participants who performed touch gestures. Our capture apparatus integrates optical touch sensing to obtain ground-truth contact through high-resolution frustrated total internal reflection. We compare our method with a baseline using bicubic upsampling as well as the ground truth from FTIR images. We separately evaluate our method's performance in discriminating adjacent touches. CapContact successfully separated closely adjacent touch contacts in 494 of 570 cases (87%) compared to the baseline's 43 of 570 cases (8%). Importantly, we demonstrate that our method accurately performs even at half of the sensing resolution at twice the grid-line pitch across the same surface area, challenging the current industry-wide standard of a ~4mm sensing pitch. We conclude this paper with implications for capacitive touch sensing in general and for touch-input accuracy in particular.

ETH Zürich, Zürich, Switzerland

10.1145/3411764.3445621

https://doi.org/10.1145/3411764.3445349

Arm discomfort is a common issue in Cross Reality applications involving prolonged mid-air interaction. Solving this problem is difficult because of the lack of tools and guidelines for 3D user interface design. Therefore, we propose a method to make existing ergonomic metrics available to creators during design by estimating the interaction cost at each reachable position in the user's environment. We present XRgonomics, a toolkit to visualize the interaction cost and make it available at runtime, allowing creators to identify UI positions that optimize users' comfort. Two scenarios show how the toolkit can support 3D UI design and dynamic adaptation of UIs based on spatial constraints. We present results from a walkthrough demonstration, which highlight the potential of XRgonomics to make ergonomics metrics accessible during the design and development of 3D UIs. Finally, we discuss how the toolkit may address design goals beyond ergonomics.

Aarhus University, Aarhus, Denmark

ETH Zurich, Zurich, Switzerland

Aarhus University, Aarhus, Denmark

10.1145/3411764.3445349

https://doi.org/10.1145/3411764.3445766

Hand gestures are a natural and expressive input method enabled by modern mixed reality headsets. However, it remains challenging for developers to create custom gestures for their applications. Conventional strategies to bespoke gesture recognition involve either hand-crafting or data-intensive deep-learning. Neither approach is well suited for rapid prototyping of new interactions. This paper introduces a flexible and efficient alternative approach for constructing hand gestures. We present Gesture Knitter: a design tool for creating custom gesture recognizers with minimal training data. Gesture Knitter allows the specification of gesture primitives that can then be combined to create more complex gestures using a visual declarative script. Designers can build custom recognizers by declaring them from scratch or by providing a demonstration that is automatically decoded into its primitive components. Our developer study shows that Gesture Knitter achieves high recognition accuracy despite minimal training data and delivers an expressive and creative design experience.

University of Cambridge, Cambridge, United Kingdom

10.1145/3411764.3445766

https://doi.org/10.1145/3411764.3445138

Millimeter wave (mmWave) Doppler radar is a new and promising sensing approach for human activity recognition, offering signal richness approaching that of microphones and cameras, but without many of the privacy-invading downsides. However, unlike audio and computer vision approaches that can draw from huge libraries of videos for training deep learning models, Doppler radar has no existing large datasets, holding back this otherwise promising sensing modality. In response, we set out to create a software pipeline that converts videos of human activities into realistic, synthetic Doppler radar data. We show how this cross-domain translation can be successful through a series of experimental results. Overall, we believe our approach is an important stepping stone towards significantly reducing the burden of training such as human sensing systems, and could help bootstrap uses in human-computer interaction.

Carnegie Mellon University, Pittsburgh, Pennsylvania, United States

Max Planck Institute for Informatics, Saarbrücken, Germany

Carnegie Mellon University, Pittsburgh, Pennsylvania, United States

10.1145/3411764.3445138