We propose SolePoser, a real-time 3D pose estimation system that relies only on a single pair of insole sensors. Unlike conventional methods that depend on fixed cameras or bulky wearable sensors, our approach requires minimal and natural setup. The system uses pressure and IMU sensors embedded in the insoles to capture the pressure distribution of body weight at the feet along with 6-DoF acceleration, and estimates 3D full-body joint positions with a two-stream transformer network. A novel double-cycle consistency loss and a cross-attention module are further introduced to learn the relationship between 3D foot positions and their pressure distributions. We also introduce two datasets of sports and daily exercises, offering 908k frames across eight different activities. Our experiments show that our method's performance is on par with top-performing approaches that use more IMUs, and even outperforms third-person-view camera-based methods in certain scenarios.
https://doi.org/10.1145/3654777.3676418
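As a rough illustration of the two-stream design described above, the sketch below fuses a pressure stream and an IMU stream with cross-attention before regressing joint positions. All module names, feature dimensions, and the joint count are assumptions for illustration, not SolePoser's actual implementation.

```python
# Hypothetical two-stream encoder with cross-attention (PyTorch).
import torch
import torch.nn as nn

class TwoStreamPoseEncoder(nn.Module):
    def __init__(self, pressure_dim=32, imu_dim=12, d_model=128, n_joints=22):
        super().__init__()
        self.pressure_proj = nn.Linear(pressure_dim, d_model)
        self.imu_proj = nn.Linear(imu_dim, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.pressure_stream = nn.TransformerEncoder(layer, num_layers=2)
        self.imu_stream = nn.TransformerEncoder(layer, num_layers=2)
        # Cross-attention: pressure features attend to IMU features.
        self.cross_attn = nn.MultiheadAttention(d_model, num_heads=4, batch_first=True)
        self.head = nn.Linear(d_model, n_joints * 3)

    def forward(self, pressure, imu):
        # pressure: (B, T, pressure_dim), imu: (B, T, imu_dim)
        p = self.pressure_stream(self.pressure_proj(pressure))
        m = self.imu_stream(self.imu_proj(imu))
        fused, _ = self.cross_attn(query=p, key=m, value=m)
        return self.head(fused)  # (B, T, n_joints * 3) joint positions

model = TwoStreamPoseEncoder()
pred = model(torch.randn(2, 30, 32), torch.randn(2, 30, 12))
print(pred.shape)  # torch.Size([2, 30, 66])
```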
Walking is a cyclic pattern of alternating footstep strikes, with each pair of steps forming a stride and a series of strides forming a gait. We conduct a systematic examination of intentional variations from a normal gait that could serve as input actions without interrupting overall walking progress. A design space of 22 candidate Gait Gestures is generated by adapting previous standing foot input actions and identifying new actions possible in a walking context. A formative study (n=25) examines movement easiness, social acceptability, and walking compatibility, with foot movements logged to calculate temporal and spatial characteristics. Using a categorization of these results, 7 gestures are selected for a Wizard-of-Oz prototype demonstrating an AR interface controlled by Gait Gestures for ordering food and controlling audio playback while walking. As a technical proof-of-concept, a gait gesture recognizer is developed and tested using the formative study data.
https://doi.org/10.1145/3654777.3676342
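To make the stride-segmentation idea concrete, here is a toy sketch of how a logged foot-acceleration trace might be split into strides via peak detection before gesture classification. The signal layout, threshold, and sampling rate are assumptions, not the paper's recognizer.

```python
# Toy stride segmentation from one foot's vertical acceleration.
import numpy as np
from scipy.signal import find_peaks

def segment_strides(vertical_accel, fs=100.0):
    """Split an acceleration trace into strides at heel-strike peaks.

    vertical_accel: 1-D array (m/s^2) sampled at fs Hz.
    Returns (start, end) sample indices, one pair per stride.
    """
    # Heel strikes appear as sharp peaks; require plausible spacing
    # (> 0.4 s) so one stride is not split into several.
    peaks, _ = find_peaks(vertical_accel, height=15.0, distance=int(0.4 * fs))
    return list(zip(peaks[:-1], peaks[1:]))

# Synthetic trace with three heel strikes one second apart.
t = np.arange(0, 3.0, 0.01)
trace = 9.8 + 10.0 * np.exp(-((t % 1.0) - 0.5) ** 2 / 0.001)
print(segment_strides(trace))  # [(50, 150), (150, 250)]
```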
In augmented and virtual reality (AR/VR) experiences, a user’s arms and hands can provide a convenient and tactile surface for touch input. Prior work has shown on-body input to have significant speed, accuracy, and ergonomic benefits over in-air interfaces, which are common today. In this work, we demonstrate high-accuracy, bare-hand skin input (i.e., no special instrumentation of the user) using just an RGB camera, like those already integrated into all modern XR headsets. Our results show this approach is accurate and robust across diverse lighting conditions, skin tones, and body motion (e.g., input while walking). Finally, our pipeline also provides rich input metadata, including touch force, finger identification, angle of attack, and rotation. We believe these are the requisite technical ingredients to more fully unlock on-skin interfaces that have been well motivated in the HCI literature but have lacked robust and practical methods.
https://doi.org/10.1145/3654777.3676455
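To make the "rich input metadata" concrete, the sketch below shows one hypothetical event structure a consumer application might receive. The field names and units are assumptions; the listed attributes (force, finger identity, angle of attack, rotation) come from the abstract.

```python
# Hypothetical per-touch event for an on-skin input pipeline.
from dataclasses import dataclass
from enum import Enum

class Finger(Enum):
    THUMB = 0
    INDEX = 1
    MIDDLE = 2
    RING = 3
    PINKY = 4

@dataclass
class SkinTouchEvent:
    x_mm: float             # touch position on the skin surface
    y_mm: float
    force: float            # estimated touch force, normalized 0..1
    finger: Finger          # which finger made contact
    angle_of_attack: float  # finger pitch relative to the skin, degrees
    rotation: float         # finger yaw relative to the arm axis, degrees

def on_touch(event: SkinTouchEvent) -> None:
    # Example consumer: a light touch scrubs, a firm touch selects.
    action = "select" if event.force > 0.6 else "scrub"
    print(f"{event.finger.name.lower()} {action} at ({event.x_mm:.0f}, {event.y_mm:.0f})")

on_touch(SkinTouchEvent(42.0, 17.0, 0.8, Finger.INDEX, 35.0, -10.0))
```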
There has been a continued trend towards minimizing instrumentation for full-body motion capture, going from specialized rooms and equipment, to arrays of worn sensors, and recently to sparse inertial pose-capture methods. However, as these techniques migrate to lower-fidelity IMUs on ubiquitous commodity devices such as phones, watches, and earbuds, challenges arise, including degraded online performance, poor temporal consistency, and loss of global translation due to sensor noise and drift. Addressing these challenges, we introduce MobilePoser, a real-time system for full-body pose and global translation estimation using any available subset of IMUs already present in these consumer devices. MobilePoser employs a multi-stage deep neural network for kinematic pose estimation followed by a physics-based motion optimizer, achieving state-of-the-art accuracy while remaining lightweight. We conclude with a series of demonstrative applications illustrating the unique potential of MobilePoser across a variety of fields, such as health and wellness, gaming, and indoor navigation.
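One common way to support "any available subset" of devices is to zero-fill missing sensors and pass an availability mask alongside the features; the sketch below illustrates that idea. The device slots, feature sizes, and network shape are assumptions for illustration and do not reproduce MobilePoser's multi-stage design or physics optimizer.

```python
# Hypothetical subset-agnostic IMU pose network (PyTorch).
import torch
import torch.nn as nn

DEVICES = ["phone_pocket", "watch_wrist", "earbud_head"]  # assumed slots
FEAT = 12  # assumed per-device features, e.g. orientation + acceleration

class SubsetPoseNet(nn.Module):
    def __init__(self, n_joints=24, hidden=256):
        super().__init__()
        in_dim = len(DEVICES) * FEAT + len(DEVICES)  # features + availability flags
        self.rnn = nn.LSTM(in_dim, hidden, num_layers=2, batch_first=True)
        self.head = nn.Linear(hidden, n_joints * 6)  # 6D rotation per joint

    def forward(self, feats, mask):
        # feats: (B, T, len(DEVICES), FEAT); mask: (B, T, len(DEVICES)) in {0, 1}
        feats = feats * mask.unsqueeze(-1)               # zero out missing IMUs
        x = torch.cat([feats.flatten(2), mask], dim=-1)  # append availability flags
        out, _ = self.rnn(x)
        return self.head(out)

net = SubsetPoseNet()
feats = torch.randn(1, 60, 3, FEAT)
mask = torch.tensor([1.0, 1.0, 0.0]).expand(1, 60, 3)  # earbud absent
print(net(feats, mask).shape)  # torch.Size([1, 60, 144])
```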
In whiteboard-based remote communication, the seamless integration of drawn content and hand-screen interactions is essential for an immersive user experience. Previous methods either require bulky device setups for capturing hand gestures or fail to accurately track hand poses from capacitive images. In this paper, we present a real-time method for precisely tracking the 3D poses of both hands from capacitive video frames. To this end, we develop a deep neural network to identify hands and infer hand joint positions from capacitive frames, and then recover 3D hand poses from the joint positions via a constrained inverse kinematics solver. Additionally, we design a device setup for capturing high-quality hand-screen interaction data and obtain an accurately synchronized dataset of capacitive video and hand poses. Our method improves the accuracy and stability of 3D hand tracking from capacitive frames while maintaining a compact device setup for remote communication. We validate our design choices, show superior performance in 3D hand pose tracking, and demonstrate the effectiveness of our method in whiteboard-based remote communication.
https://doi.org/10.1145/3654777.3676412
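The constrained inverse kinematics step can be illustrated in miniature: the toy below fits the joint angles of a planar two-bone chain to network-predicted joint positions under anatomical bounds. Real hand IK is 3D with many more degrees of freedom; the bone lengths and bounds here are assumptions.

```python
# Toy constrained IK: fit two joint angles to target joint positions.
import numpy as np
from scipy.optimize import least_squares

L1, L2 = 4.0, 3.0  # assumed bone lengths (cm)

def forward_kinematics(angles):
    a1, a2 = angles
    j1 = np.array([L1 * np.cos(a1), L1 * np.sin(a1)])
    j2 = j1 + np.array([L2 * np.cos(a1 + a2), L2 * np.sin(a1 + a2)])
    return np.concatenate([j1, j2])  # stacked 2D positions of both joints

def solve_ik(target_joints, init=(0.3, 0.3)):
    residual = lambda a: forward_kinematics(a) - target_joints
    # Bounds keep the solution anatomically plausible (no hyperextension).
    return least_squares(residual, init,
                         bounds=([0.0, 0.0], [np.pi / 2, np.pi / 2])).x

# Target: joint positions produced by angles (0.5, 0.7),
# standing in for the network's per-frame predictions.
target = forward_kinematics(np.array([0.5, 0.7]))
print(np.round(solve_ik(target), 3))  # ~[0.5, 0.7]
```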
Seams are areas of overlapping fabric formed by stitching two or more pieces of fabric together in the cut-and-sew apparel manufacturing process. In SeamPose, we repurpose seams as capacitive sensors in a shirt for continuous upper-body pose estimation. Whereas previous all-textile motion-capture garments place electrodes on the clothing surface, our solution leverages existing seams inside a shirt by machine-sewing insulated conductive threads over them. Because the seams are invisible from the outside and already distributed across the garment, the sensing shirt looks and wears like a conventional shirt while providing pose-tracking capabilities. To validate this approach, we implemented a proof-of-concept untethered shirt with 8 capacitive sensing seams. In a 12-participant user study, our customized deep-learning pipeline estimated upper-body 3D joint positions relative to the pelvis with a mean per joint position error (MPJPE) of 6.0 cm. SeamPose represents a step towards the unobtrusive integration of smart clothing for everyday pose estimation.
https://doi.org/10.1145/3654777.3676341
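For reference, the MPJPE figure quoted above is a standard pose-estimation metric. A common definition, computed here relative to the pelvis (assumed to be joint index 0), is sketched below.

```python
# Mean per joint position error (MPJPE), root-relative.
import numpy as np

def mpjpe(pred, gt, root=0):
    """pred, gt: (frames, joints, 3) arrays of joint positions in cm."""
    pred = pred - pred[:, root:root + 1]  # make positions pelvis-relative
    gt = gt - gt[:, root:root + 1]
    # Per-joint Euclidean distance, averaged over joints and frames.
    return np.linalg.norm(pred - gt, axis=-1).mean()

gt = np.random.randn(100, 10, 3)
noisy = gt + np.random.normal(scale=0.5, size=gt.shape)
print(f"MPJPE: {mpjpe(noisy, gt):.2f} cm")
```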