We propose SolePoser, a real-time 3D pose estimation system that relies only on a single pair of insole sensors. Unlike conventional methods that depend on fixed cameras or bulky wearable sensors, our approach requires minimal and natural setup. The system uses pressure and IMU sensors embedded in the insoles to capture the pressure distribution of body weight at the feet along with 6-DoF acceleration, and estimates 3D full-body joint positions with a two-stream transformer network. A novel double-cycle consistency loss and a cross-attention module are further introduced to learn the relationship between 3D foot positions and their pressure distributions. We also introduce two datasets of sports and daily exercises, offering 908k frames across eight different activities. Our experiments show that our method's performance is on par with top-performing approaches that use more IMUs, and even outperforms third-person-view camera-based methods in certain scenarios.
https://doi.org/10.1145/3654777.3676418
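As a rough illustration of the two-stream design described above, the sketch below fuses a pressure stream and an IMU stream with cross-attention before regressing joint positions. All module names, feature dimensions, and the joint count are assumptions for illustration, not SolePoser's actual implementation.

```python
# Hypothetical two-stream encoder with cross-attention (PyTorch).
import torch
import torch.nn as nn

class TwoStreamPoseEncoder(nn.Module):
    def __init__(self, pressure_dim=32, imu_dim=12, d_model=128, n_joints=22):
        super().__init__()
        self.pressure_proj = nn.Linear(pressure_dim, d_model)
        self.imu_proj = nn.Linear(imu_dim, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.pressure_stream = nn.TransformerEncoder(layer, num_layers=2)
        self.imu_stream = nn.TransformerEncoder(layer, num_layers=2)
        # Cross-attention: pressure features attend to IMU features.
        self.cross_attn = nn.MultiheadAttention(d_model, num_heads=4, batch_first=True)
        self.head = nn.Linear(d_model, n_joints * 3)

    def forward(self, pressure, imu):
        # pressure: (B, T, pressure_dim), imu: (B, T, imu_dim)
        p = self.pressure_stream(self.pressure_proj(pressure))
        m = self.imu_stream(self.imu_proj(imu))
        fused, _ = self.cross_attn(query=p, key=m, value=m)
        return self.head(fused)  # (B, T, n_joints * 3) joint positions

model = TwoStreamPoseEncoder()
pred = model(torch.randn(2, 30, 32), torch.randn(2, 30, 12))
print(pred.shape)  # torch.Size([2, 30, 66])
```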
Walking is a cyclic pattern of alternating footstep strikes, with each pair of steps forming a stride and a series of strides forming a gait. We conduct a systematic examination of intentional variations from a normal gait that could serve as input actions without interrupting overall walking progress. A design space of 22 candidate Gait Gestures is generated by adapting previous standing foot input actions and identifying new actions possible in a walking context. A formative study (n=25) examines movement easiness, social acceptability, and walking compatibility, with foot movements logged to calculate temporal and spatial characteristics. Using a categorization of these results, 7 gestures are selected for a Wizard-of-Oz prototype demonstrating an AR interface controlled by Gait Gestures for ordering food and controlling audio playback while walking. As a technical proof-of-concept, a gait gesture recognizer is developed and tested using the formative study data.
https://doi.org/10.1145/3654777.3676342
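To make the stride-segmentation idea concrete, here is a toy sketch of how a logged foot-acceleration trace might be split into strides via peak detection before gesture classification. The signal layout, threshold, and sampling rate are assumptions, not the paper's recognizer.

```python
# Toy stride segmentation from one foot's vertical acceleration.
import numpy as np
from scipy.signal import find_peaks

def segment_strides(vertical_accel, fs=100.0):
    """Split an acceleration trace into strides at heel-strike peaks.

    vertical_accel: 1-D array (m/s^2) sampled at fs Hz.
    Returns (start, end) sample indices, one pair per stride.
    """
    # Heel strikes appear as sharp peaks; require plausible spacing
    # (> 0.4 s) so one stride is not split into several.
    peaks, _ = find_peaks(vertical_accel, height=15.0, distance=int(0.4 * fs))
    return list(zip(peaks[:-1], peaks[1:]))

# Synthetic trace with three heel strikes one second apart.
t = np.arange(0, 3.0, 0.01)
trace = 9.8 + 10.0 * np.exp(-((t % 1.0) - 0.5) ** 2 / 0.001)
print(segment_strides(trace))  # [(50, 150), (150, 250)]
```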
In augmented and virtual reality (AR/VR) experiences, a user’s arms and hands can provide a convenient and tactile surface for touch input. Prior work has shown on-body input to have significant speed, accuracy, and ergonomic benefits over in-air interfaces, which are common today. In this work, we demonstrate high-accuracy, bare-hand skin input (i.e., no special instrumentation of the user) using just an RGB camera, like those already integrated into all modern XR headsets. Our results show this approach is accurate and robust across diverse lighting conditions, skin tones, and body motion (e.g., input while walking). Finally, our pipeline also provides rich input metadata, including touch force, finger identification, angle of attack, and rotation. We believe these are the requisite technical ingredients to more fully unlock on-skin interfaces that have been well motivated in the HCI literature but have lacked robust and practical methods.
https://doi.org/10.1145/3654777.3676455
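To make the "rich input metadata" concrete, the sketch below shows one hypothetical event structure a consumer application might receive. The field names and units are assumptions; the listed attributes (force, finger identity, angle of attack, rotation) come from the abstract.

```python
# Hypothetical per-touch event for an on-skin input pipeline.
from dataclasses import dataclass
from enum import Enum

class Finger(Enum):
    THUMB = 0
    INDEX = 1
    MIDDLE = 2
    RING = 3
    PINKY = 4

@dataclass
class SkinTouchEvent:
    x_mm: float             # touch position on the skin surface
    y_mm: float
    force: float            # estimated touch force, normalized 0..1
    finger: Finger          # which finger made contact
    angle_of_attack: float  # finger pitch relative to the skin, degrees
    rotation: float         # finger yaw relative to the arm axis, degrees

def on_touch(event: SkinTouchEvent) -> None:
    # Example consumer: a light touch scrubs, a firm touch selects.
    action = "select" if event.force > 0.6 else "scrub"
    print(f"{event.finger.name.lower()} {action} at ({event.x_mm:.0f}, {event.y_mm:.0f})")

on_touch(SkinTouchEvent(42.0, 17.0, 0.8, Finger.INDEX, 35.0, -10.0))
```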
There has been a continued trend towards minimizing instrumentation for full-body motion capture, going from specialized rooms and equipment, to arrays of worn sensors, and recently to sparse inertial pose-capture methods. However, as these techniques migrate to lower-fidelity IMUs on ubiquitous commodity devices such as phones, watches, and earbuds, challenges arise, including degraded online performance, poor temporal consistency, and loss of global translation due to sensor noise and drift. Addressing these challenges, we introduce MobilePoser, a real-time system for full-body pose and global translation estimation using any available subset of IMUs already present in these consumer devices. MobilePoser employs a multi-stage deep neural network for kinematic pose estimation followed by a physics-based motion optimizer, achieving state-of-the-art accuracy while remaining lightweight. We conclude with a series of demonstrative applications illustrating the unique potential of MobilePoser across a variety of fields, such as health and wellness, gaming, and indoor navigation.
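One common way to support "any available subset" of devices is to zero-fill missing sensors and pass an availability mask alongside the features; the sketch below illustrates that idea. The device slots, feature sizes, and network shape are assumptions for illustration and do not reproduce MobilePoser's multi-stage design or physics optimizer.

```python
# Hypothetical subset-agnostic IMU pose network (PyTorch).
import torch
import torch.nn as nn

DEVICES = ["phone_pocket", "watch_wrist", "earbud_head"]  # assumed slots
FEAT = 12  # assumed per-device features, e.g. orientation + acceleration

class SubsetPoseNet(nn.Module):
    def __init__(self, n_joints=24, hidden=256):
        super().__init__()
        in_dim = len(DEVICES) * FEAT + len(DEVICES)  # features + availability flags
        self.rnn = nn.LSTM(in_dim, hidden, num_layers=2, batch_first=True)
        self.head = nn.Linear(hidden, n_joints * 6)  # 6D rotation per joint

    def forward(self, feats, mask):
        # feats: (B, T, len(DEVICES), FEAT); mask: (B, T, len(DEVICES)) in {0, 1}
        feats = feats * mask.unsqueeze(-1)               # zero out missing IMUs
        x = torch.cat([feats.flatten(2), mask], dim=-1)  # append availability flags
        out, _ = self.rnn(x)
        return self.head(out)

net = SubsetPoseNet()
feats = torch.randn(1, 60, 3, FEAT)
mask = torch.tensor([1.0, 1.0, 0.0]).expand(1, 60, 3)  # earbud absent
print(net(feats, mask).shape)  # torch.Size([1, 60, 144])
```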
In whiteboard-based remote communication, the seamless integration of drawn content and hand-screen interactions is essential for an immersive user experience. Previous methods either require bulky device setups for capturing hand gestures or fail to accurately track hand poses from capacitive images. In this paper, we present a real-time method for precisely tracking the 3D poses of both hands from capacitive video frames. To this end, we develop a deep neural network to identify hands and infer hand joint positions from capacitive frames, and then recover 3D hand poses from the joint positions via a constrained inverse kinematics solver. Additionally, we design a device setup for capturing high-quality hand-screen interaction data and obtain an accurately synchronized dataset of capacitive video and hand poses. Our method improves the accuracy and stability of 3D hand tracking from capacitive frames while maintaining a compact device setup for remote communication. We validate our design choices, show superior performance in 3D hand pose tracking, and demonstrate the effectiveness of our method in whiteboard-based remote communication.
https://doi.org/10.1145/3654777.3676412
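The constrained inverse kinematics step can be illustrated in miniature: the toy below fits the joint angles of a planar two-bone chain to network-predicted joint positions under anatomical bounds. Real hand IK is 3D with many more degrees of freedom; the bone lengths and bounds here are assumptions.

```python
# Toy constrained IK: fit two joint angles to target joint positions.
import numpy as np
from scipy.optimize import least_squares

L1, L2 = 4.0, 3.0  # assumed bone lengths (cm)

def forward_kinematics(angles):
    a1, a2 = angles
    j1 = np.array([L1 * np.cos(a1), L1 * np.sin(a1)])
    j2 = j1 + np.array([L2 * np.cos(a1 + a2), L2 * np.sin(a1 + a2)])
    return np.concatenate([j1, j2])  # stacked 2D positions of both joints

def solve_ik(target_joints, init=(0.3, 0.3)):
    residual = lambda a: forward_kinematics(a) - target_joints
    # Bounds keep the solution anatomically plausible (no hyperextension).
    return least_squares(residual, init,
                         bounds=([0.0, 0.0], [np.pi / 2, np.pi / 2])).x

# Target: joint positions produced by angles (0.5, 0.7),
# standing in for the network's per-frame predictions.
target = forward_kinematics(np.array([0.5, 0.7]))
print(np.round(solve_ik(target), 3))  # ~[0.5, 0.7]
```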
Seams are areas of overlapping fabric formed by stitching two or more pieces of fabric together in the cut-and-sew apparel manufacturing process. In SeamPose, we repurpose seams as capacitive sensors in a shirt for continuous upper-body pose estimation. Whereas previous all-textile motion-capture garments place electrodes on the clothing surface, our solution leverages existing seams inside a shirt by machine-sewing insulated conductive threads over them. Because the seams are invisible from the outside and already distributed across the garment, the sensing shirt looks and wears like a conventional shirt while providing pose-tracking capabilities. To validate this approach, we implemented a proof-of-concept untethered shirt with 8 capacitive sensing seams. In a 12-participant user study, our customized deep-learning pipeline estimated upper-body 3D joint positions relative to the pelvis with a mean per joint position error (MPJPE) of 6.0 cm. SeamPose represents a step towards the unobtrusive integration of smart clothing for everyday pose estimation.
https://doi.org/10.1145/3654777.3676341
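For reference, the MPJPE figure quoted above is a standard pose-estimation metric. A common definition, computed here relative to the pelvis (assumed to be joint index 0), is sketched below.

```python
# Mean per joint position error (MPJPE), root-relative.
import numpy as np

def mpjpe(pred, gt, root=0):
    """pred, gt: (frames, joints, 3) arrays of joint positions in cm."""
    pred = pred - pred[:, root:root + 1]  # make positions pelvis-relative
    gt = gt - gt[:, root:root + 1]
    # Per-joint Euclidean distance, averaged over joints and frames.
    return np.linalg.norm(pred - gt, axis=-1).mean()

gt = np.random.randn(100, 10, 3)
noisy = gt + np.random.normal(scale=0.5, size=gt.shape)
print(f"MPJPE: {mpjpe(noisy, gt):.2f} cm")
```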