Manipulating their environment is one of the fundamental actions that humans, and actors more generally, perform. Yet today's mixed reality systems enable us to situate virtual content in the physical scene but fall short of extending the visual illusion to believable environment manipulations. In this paper, we present the concept and system of Scene Responsiveness, the visual illusion that virtual actions affect the physical scene. Using co-aligned digital twins for coherence-preserving just-in-time virtualization of physical objects in the environment, Scene Responsiveness allows actors to seemingly manipulate physical objects as if they were virtual. Based on Scene Responsiveness, we propose two general types of end-to-end illusionary experiences that ensure visuotactile consistency through the presented techniques of object elusiveness and object rephysicalization. We demonstrate how our Daydreaming illusion enables virtual characters to enter the scene through a physically closed door and vandalize the physical scene, or users to enchant and summon far-away physical objects. In a user evaluation of our Copperfield illusion, we found that Scene Responsiveness can be rendered so convincingly that it lends itself to magic tricks. We present our system architecture and conclude by discussing the implications of scene-responsive mixed reality for gaming and telepresence.
https://doi.org/10.1145/3586183.3606825
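To make the core mechanism concrete, here is a minimal sketch of the just-in-time virtualization idea: the moment a virtual action needs to manipulate a physical object, the object is visually removed and its co-aligned digital twin takes over; rephysicalization hands control back once the twin re-aligns with the physical object. All class and method names below are hypothetical illustrations, not the paper's actual API.

# Hypothetical sketch of just-in-time virtualization; not the paper's API.
from dataclasses import dataclass

@dataclass
class DigitalTwin:
    mesh: object   # pre-scanned replica of the physical object
    pose: tuple    # co-aligned with the physical object's current pose

class SceneResponsiveObject:
    def __init__(self, twin: DigitalTwin):
        self.twin = twin
        self.virtualized = False

    def virtualize(self, renderer):
        """Visually remove the physical object and show its twin in place."""
        if not self.virtualized:
            renderer.inpaint_region(self.twin.pose)   # diminish the physical object
            renderer.show(self.twin.mesh, self.twin.pose)
            self.virtualized = True

    def apply_virtual_action(self, renderer, new_pose):
        """A virtual character 'moves' the physical object by moving its twin."""
        self.virtualize(renderer)
        self.twin.pose = new_pose
        renderer.show(self.twin.mesh, new_pose)

    def rephysicalize(self, renderer):
        """Once the twin re-aligns with the physical object, hide it again."""
        renderer.hide(self.twin.mesh)
        self.virtualized = False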
Using VR in reclining and lying positions is becoming common, but the upward views caused by such postures must be redirected to be parallel to the ground, as when users are standing. This affects users' locomotion performance in VR due to potential physical restrictions and the visual-vestibular-proprioceptive conflict. This paper is among the first to investigate which locomotion methods suit these conditions and how reclining and lying postures and view redirection affect them. A user-elicitation study was carried out to construct a set of locomotion methods based on users' preferences in different reclining and lying positions. A second study developed user-preferred 'tapping' and 'chair rotating' gestures; by evaluating their performance at various body reclining angles, we measured the general impact of posture and redirection. The results showed that these methods worked effectively but exposed some shortcomings, and that users performed worst at a 45-degree reclining angle. Finally, four upgraded methods were designed and verified to improve locomotion performance.
https://doi.org/10.1145/3586183.3606714
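In the simplest case, the redirection the paper builds on can be modeled as removing a pitch rotation equal to the user's reclining angle, so a reclined user's gaze maps to the standing-forward view. A minimal sketch under that pitch-only assumption (real systems fuse full IMU orientation); the function name and convention are illustrative:

import numpy as np

def redirect_gaze(gaze_world: np.ndarray, recline_deg: float) -> np.ndarray:
    """Counter-rotate a world-space gaze vector about the x-axis by the
    reclining angle, so a reclined user sees the scene as if standing.
    Convention: y is up, -z is forward. Pitch-only model for illustration."""
    theta = np.radians(recline_deg)
    counter_pitch = np.array([
        [1.0, 0.0,             0.0],
        [0.0, np.cos(-theta), -np.sin(-theta)],
        [0.0, np.sin(-theta),  np.cos(-theta)],
    ])
    return counter_pitch @ gaze_world

# A fully supine user (90 degrees) gazing at the ceiling is shown the
# forward horizon instead:
ceiling = np.array([0.0, 1.0, 0.0])
print(redirect_gaze(ceiling, 90.0))   # ~[0, 0, -1], i.e. forward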
In cinematic VR, viewers can only see a limited portion of the scene at any time. As a result, they may miss important events outside their field of view. While many techniques offer spatial guidance (where to look), there has been little work on temporal guidance (when to look). Temporal guidance offers viewers a look-ahead time, allowing them to plan their head motion for important events. This paper introduces spatiotemporal visual guidance and presents a new widget, RadarVR, which shows both spatial and temporal information of regions of interest (ROIs) in a video. Using RadarVR, we conducted a study to investigate the impact of temporal guidance and explore trade-offs between spatiotemporal and spatial-only visual guidance. Results show that spatiotemporal feedback allows users to see a greater percentage of ROIs, with 81% more seen from their initial onset. We discuss design implications for future work in this space.
https://doi.org/10.1145/3586183.3606734
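Per region of interest, a RadarVR-style widget ultimately reduces to two numbers: a signed bearing relative to the current head yaw (where to turn) and a look-ahead countdown to the ROI's onset (when to turn). A small illustrative sketch of that computation; the function and its parameters are assumptions, not the paper's implementation:

def roi_indicator(head_yaw_deg, roi_yaw_deg, roi_onset_s, now_s):
    """Spatiotemporal guidance for one ROI: where to turn and how soon."""
    # Signed bearing in (-180, 180]: negative = turn left, positive = turn right.
    bearing = (roi_yaw_deg - head_yaw_deg + 180) % 360 - 180
    look_ahead = max(0.0, roi_onset_s - now_s)   # seconds until the ROI begins
    return bearing, look_ahead

# ROI at 120 deg azimuth starting at t=68 s; user faces 10 deg at t=60 s:
bearing, t = roi_indicator(10, 120, roi_onset_s=68.0, now_s=60.0)
print(f"turn {bearing:+.0f} deg, event in {t:.0f} s")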
Every day, billions of people use footwear for walking, running, or exercise. Of emerging interest is "smart footwear", which helps users track gait, count steps, or even analyse performance. However, such nascent footwear lacks fine-grained ground surface context awareness, which could allow it to adapt to conditions and enable useful functions and experiences. Hence, this research aims to recognize the walking surface using a radar sensor embedded in a shoe, enabling ground context-awareness. Using data collected from 23 participants in an in-the-wild setting, we developed several classification models. We show that our model can detect five common terrain types with an accuracy of 80.0%, and a further ten terrain types with an accuracy of 66.3%, while the wearer is moving. Importantly, it can detect gait motion types such as 'walking', 'stepping up', 'stepping down', and 'still' with an accuracy of 90%. Finally, we present potential use cases and insights for future work based on such ground-aware smart shoes.
https://doi.org/10.1145/3586183.3606738
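The classification step can be pictured as a standard supervised pipeline over per-window radar features. The feature files, label set, and random-forest choice below are illustrative stand-ins; the paper does not prescribe this particular model here:

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Five common terrain types, as in the abstract; the exact labels are assumed.
TERRAINS = ["concrete", "grass", "gravel", "carpet", "tile"]

# Hypothetical pre-extracted features: one fixed-length vector per radar window.
X = np.load("radar_features.npy")   # shape (n_windows, n_features)
y = np.load("terrain_labels.npy")   # integer indices into TERRAINS

clf = RandomForestClassifier(n_estimators=200, random_state=0)
scores = cross_val_score(clf, X, y, cv=5)
print(f"terrain accuracy: {scores.mean():.1%}")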
Head-Related Transfer Functions (HRTFs) play a crucial role in creating immersive spatial audio experiences. However, HRTFs differ significantly from person to person, and traditional methods for estimating personalized HRTFs are expensive, time-consuming, and require specialized equipment. We imagine a world where your personalized HRTF can be determined by capturing data through earbuds in everyday environments. In this paper, we propose a novel approach for deriving personalized HRTFs that relies only on in-the-wild binaural recordings and head tracking data. By analyzing how sounds change as the user rotates their head through different environments with different noise sources, we can accurately estimate their personalized HRTF. Our results show that our predicted HRTFs closely match ground-truth HRTFs measured in an anechoic chamber. Furthermore, listening studies demonstrate that our personalized HRTFs significantly improve sound localization and reduce front-back confusion in virtual environments. Our approach offers an efficient and accessible method for deriving personalized HRTFs and has the potential to greatly improve spatial audio experiences.
https://doi.org/10.1145/3586183.3606782
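One way to see why in-the-wild binaural recordings plus head tracking can suffice: with a roughly static source, the ratio of left to right ear spectra cancels the unknown source signal and leaves the interaural transfer function at the current head-relative angle, so rotating the head sweeps out directional measurements. The sketch below is a heavily simplified illustration of that intuition (single dominant source, yaw-only binning, directions recovered only up to the fixed source azimuth), not the paper's pipeline:

import numpy as np

def accumulate_itf(left, right, head_yaw_deg, n_fft=512, n_bins=72):
    """Bin per-frame interaural transfer function estimates by head yaw.
    head_yaw_deg: one yaw sample per STFT frame (same hop as below)."""
    hop = n_fft // 2
    window = np.hanning(n_fft)
    bins = [[] for _ in range(n_bins)]
    for i, start in enumerate(range(0, len(left) - n_fft, hop)):
        L = np.fft.rfft(left[start:start + n_fft] * window)
        R = np.fft.rfft(right[start:start + n_fft] * window)
        itf = L / (R + 1e-9)   # the unknown source spectrum cancels out
        b = int(head_yaw_deg[i] % 360 / 360 * n_bins)
        bins[b].append(itf)
    # Average per head-relative direction; sparsely observed bins stay None.
    return [np.mean(v, axis=0) if v else None for v in bins]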
Imagine being able to listen to birds chirping in a park without hearing the chatter from other hikers, or being able to block out traffic noise on a busy street while still hearing emergency sirens and car honks. We introduce semantic hearing, a novel capability for hearable devices that enables them, in real time, to focus on or ignore specific sounds from real-world environments while preserving the spatial cues. To achieve this, we make two technical contributions: 1) we present the first neural network that can achieve binaural target sound extraction in the presence of interfering sounds and background noise, and 2) we design a training methodology that allows our system to generalize to real-world use. Results show that our system can operate with 20 sound classes and that our transformer-based network has a runtime of 6.56 ms on a connected smartphone. An in-the-wild evaluation with participants in previously unseen indoor and outdoor scenarios shows that our proof-of-concept system can extract the target sounds and generalize to preserve the spatial cues in its binaural output. Project page with code: https://semantichearing.cs.washington.edu
https://doi.org/10.1145/3586183.3606779
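At the interface level, binaural target sound extraction takes a stereo mixture plus an identifier of the target class and returns the stereo target estimate, typically via a predicted per-ear mask so interaural cues survive. The toy model below illustrates that contract only; it is a stand-in, not the paper's transformer network (which runs in 6.56 ms on a smartphone):

import torch
import torch.nn as nn

class TargetSoundExtractor(nn.Module):
    """Toy binaural target sound extractor: condition on a class embedding,
    predict a per-ear magnitude mask. Architecture is illustrative only."""
    def __init__(self, n_classes=20, emb_dim=64, n_freq=257):
        super().__init__()
        self.class_emb = nn.Embedding(n_classes, emb_dim)
        self.net = nn.Sequential(
            nn.Linear(2 * n_freq + emb_dim, 512), nn.ReLU(),
            nn.Linear(512, 2 * n_freq), nn.Sigmoid(),
        )

    def forward(self, mix_spec, target_class):
        # mix_spec: (batch, frames, 2 * n_freq) stacked L/R magnitude spectra
        emb = self.class_emb(target_class)                     # (batch, emb_dim)
        emb = emb.unsqueeze(1).expand(-1, mix_spec.shape[1], -1)
        mask = self.net(torch.cat([mix_spec, emb], dim=-1))
        return mix_spec * mask   # per-ear masking preserves interaural cues

model = TargetSoundExtractor()
out = model(torch.rand(1, 100, 2 * 257), torch.tensor([3]))  # e.g. class 3 = "siren"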