Manipulating their environment is one of the fundamental actions that humans, and actors more generally, perform. Yet today's mixed reality systems enable us to situate virtual content in the physical scene but fall short of extending the visual illusion to believable environment manipulations. In this paper, we present the concept and system of Scene Responsiveness, the visual illusion that virtual actions affect the physical scene. Using co-aligned digital twins for coherence-preserving just-in-time virtualization of physical objects in the environment, Scene Responsiveness allows actors to seemingly manipulate physical objects as if they were virtual. Based on Scene Responsiveness, we propose two general types of end-to-end illusionary experiences that ensure visuotactile consistency through the presented techniques of object elusiveness and object rephysicalization. We demonstrate how our Daydreaming illusion enables virtual characters to enter the scene through a physically closed door and vandalize the physical scene, or users to enchant and summon far-away physical objects. In a user evaluation of our Copperfield illusion, we found that Scene Responsiveness can be rendered so convincingly that it lends itself to magic tricks. We present our system architecture and conclude by discussing the implications of scene-responsive mixed reality for gaming and telepresence.
https://doi.org/10.1145/3586183.3606825
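To make the core mechanism concrete, here is a minimal sketch of the just-in-time virtualization idea: the moment a virtual action needs to manipulate a physical object, the object is visually removed and its co-aligned digital twin takes over; rephysicalization hands control back once the twin re-aligns with the physical object. All class and method names below are hypothetical illustrations, not the paper's actual API.

# Hypothetical sketch of just-in-time virtualization; not the paper's API.
from dataclasses import dataclass

@dataclass
class DigitalTwin:
    mesh: object   # pre-scanned replica of the physical object
    pose: tuple    # co-aligned with the physical object's current pose

class SceneResponsiveObject:
    def __init__(self, twin: DigitalTwin):
        self.twin = twin
        self.virtualized = False

    def virtualize(self, renderer):
        """Visually remove the physical object and show its twin in place."""
        if not self.virtualized:
            renderer.inpaint_region(self.twin.pose)   # diminish the physical object
            renderer.show(self.twin.mesh, self.twin.pose)
            self.virtualized = True

    def apply_virtual_action(self, renderer, new_pose):
        """A virtual character 'moves' the physical object by moving its twin."""
        self.virtualize(renderer)
        self.twin.pose = new_pose
        renderer.show(self.twin.mesh, new_pose)

    def rephysicalize(self, renderer):
        """Once the twin re-aligns with the physical object, hide it again."""
        renderer.hide(self.twin.mesh)
        self.virtualized = False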
Using VR in reclining and lying positions is becoming common, but the upward views caused by such postures must be redirected to be parallel to the ground, as when users are standing. This affects users' locomotion performance in VR due to potential physical restrictions and the visual-vestibular-proprioceptive conflict. This paper is among the first to investigate which locomotion methods suit these conditions and how reclining and lying postures and view redirection affect them. A user-elicitation study was carried out to construct a set of locomotion methods based on users' preferences in different reclining and lying positions. A second study developed user-preferred 'tapping' and 'chair rotating' gestures; by evaluating their performance at various body reclining angles, we measured the general impact of posture and redirection. The results showed that these methods worked effectively but exposed some shortcomings, and that users performed worst at a 45-degree reclining angle. Finally, four upgraded methods were designed and verified to improve locomotion performance.
https://doi.org/10.1145/3586183.3606714
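In the simplest case, the redirection the paper builds on can be modeled as removing a pitch rotation equal to the user's reclining angle, so a reclined user's gaze maps to the standing-forward view. A minimal sketch under that pitch-only assumption (real systems fuse full IMU orientation); the function name and convention are illustrative:

import numpy as np

def redirect_gaze(gaze_world: np.ndarray, recline_deg: float) -> np.ndarray:
    """Counter-rotate a world-space gaze vector about the x-axis by the
    reclining angle, so a reclined user sees the scene as if standing.
    Convention: y is up, -z is forward. Pitch-only model for illustration."""
    theta = np.radians(recline_deg)
    counter_pitch = np.array([
        [1.0, 0.0,             0.0],
        [0.0, np.cos(-theta), -np.sin(-theta)],
        [0.0, np.sin(-theta),  np.cos(-theta)],
    ])
    return counter_pitch @ gaze_world

# A fully supine user (90 degrees) gazing at the ceiling is shown the
# forward horizon instead:
ceiling = np.array([0.0, 1.0, 0.0])
print(redirect_gaze(ceiling, 90.0))   # ~[0, 0, -1], i.e. forward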
In cinematic VR, viewers can only see a limited portion of the scene at any time. As a result, they may miss important events outside their field of view. While many techniques offer spatial guidance (where to look), there has been little work on temporal guidance (when to look). Temporal guidance offers viewers a look-ahead time, allowing them to plan their head motion for important events. This paper introduces spatiotemporal visual guidance and presents a new widget, RadarVR, which shows both spatial and temporal information of regions of interest (ROIs) in a video. Using RadarVR, we conducted a study to investigate the impact of temporal guidance and explore trade-offs between spatiotemporal and spatial-only visual guidance. Results show that spatiotemporal feedback allows users to see a greater percentage of ROIs, with 81% more seen from their initial onset. We discuss design implications for future work in this space.
https://doi.org/10.1145/3586183.3606734
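Per region of interest, a RadarVR-style widget ultimately reduces to two numbers: a signed bearing relative to the current head yaw (where to turn) and a look-ahead countdown to the ROI's onset (when to turn). A small illustrative sketch of that computation; the function and its parameters are assumptions, not the paper's implementation:

def roi_indicator(head_yaw_deg, roi_yaw_deg, roi_onset_s, now_s):
    """Spatiotemporal guidance for one ROI: where to turn and how soon."""
    # Signed bearing in (-180, 180]: negative = turn left, positive = turn right.
    bearing = (roi_yaw_deg - head_yaw_deg + 180) % 360 - 180
    look_ahead = max(0.0, roi_onset_s - now_s)   # seconds until the ROI begins
    return bearing, look_ahead

# ROI at 120 deg azimuth starting at t=68 s; user faces 10 deg at t=60 s:
bearing, t = roi_indicator(10, 120, roi_onset_s=68.0, now_s=60.0)
print(f"turn {bearing:+.0f} deg, event in {t:.0f} s")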
Every day, billions of people use footwear for walking, running, or exercise. Of emerging interest is "smart footwear", which helps users track gait, count steps, or even analyse performance. However, such nascent footwear lacks fine-grained ground surface context awareness, which could allow it to adapt to conditions and enable useful functions and experiences. Hence, this research aims to recognize the walking surface using a radar sensor embedded in a shoe, enabling ground context-awareness. Using data collected from 23 participants in an in-the-wild setting, we developed several classification models. We show that our model can detect five common terrain types with an accuracy of 80.0%, and a further ten terrain types with an accuracy of 66.3%, while the wearer is moving. Importantly, it can detect gait motion types such as 'walking', 'stepping up', 'stepping down', and 'still' with an accuracy of 90%. Finally, we present potential use cases and insights for future work based on such ground-aware smart shoes.
https://doi.org/10.1145/3586183.3606738
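The classification step can be pictured as a standard supervised pipeline over per-window radar features. The feature files, label set, and random-forest choice below are illustrative stand-ins; the paper does not prescribe this particular model here:

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Five common terrain types, as in the abstract; the exact labels are assumed.
TERRAINS = ["concrete", "grass", "gravel", "carpet", "tile"]

# Hypothetical pre-extracted features: one fixed-length vector per radar window.
X = np.load("radar_features.npy")   # shape (n_windows, n_features)
y = np.load("terrain_labels.npy")   # integer indices into TERRAINS

clf = RandomForestClassifier(n_estimators=200, random_state=0)
scores = cross_val_score(clf, X, y, cv=5)
print(f"terrain accuracy: {scores.mean():.1%}")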
Head-Related Transfer Functions (HRTFs) play a crucial role in creating immersive spatial audio experiences. However, HRTFs differ significantly from person to person, and traditional methods for estimating personalized HRTFs are expensive, time-consuming, and require specialized equipment. We imagine a world where your personalized HRTF can be determined by capturing data through earbuds in everyday environments. In this paper, we propose a novel approach for deriving personalized HRTFs that relies only on in-the-wild binaural recordings and head tracking data. By analyzing how sounds change as the user rotates their head through different environments with different noise sources, we can accurately estimate their personalized HRTF. Our results show that our predicted HRTFs closely match ground-truth HRTFs measured in an anechoic chamber. Furthermore, listening studies demonstrate that our personalized HRTFs significantly improve sound localization and reduce front-back confusion in virtual environments. Our approach offers an efficient and accessible method for deriving personalized HRTFs and has the potential to greatly improve spatial audio experiences.
https://doi.org/10.1145/3586183.3606782
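One way to see why in-the-wild binaural recordings plus head tracking can suffice: with a roughly static source, the ratio of left to right ear spectra cancels the unknown source signal and leaves the interaural transfer function at the current head-relative angle, so rotating the head sweeps out directional measurements. The sketch below is a heavily simplified illustration of that intuition (single dominant source, yaw-only binning, directions recovered only up to the fixed source azimuth), not the paper's pipeline:

import numpy as np

def accumulate_itf(left, right, head_yaw_deg, n_fft=512, n_bins=72):
    """Bin per-frame interaural transfer function estimates by head yaw.
    head_yaw_deg: one yaw sample per STFT frame (same hop as below)."""
    hop = n_fft // 2
    window = np.hanning(n_fft)
    bins = [[] for _ in range(n_bins)]
    for i, start in enumerate(range(0, len(left) - n_fft, hop)):
        L = np.fft.rfft(left[start:start + n_fft] * window)
        R = np.fft.rfft(right[start:start + n_fft] * window)
        itf = L / (R + 1e-9)   # the unknown source spectrum cancels out
        b = int(head_yaw_deg[i] % 360 / 360 * n_bins)
        bins[b].append(itf)
    # Average per head-relative direction; sparsely observed bins stay None.
    return [np.mean(v, axis=0) if v else None for v in bins]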
Imagine being able to listen to birds chirping in a park without hearing the chatter from other hikers, or being able to block out traffic noise on a busy street while still hearing emergency sirens and car honks. We introduce semantic hearing, a novel capability for hearable devices that enables them, in real time, to focus on or ignore specific sounds from real-world environments while preserving the spatial cues. To achieve this, we make two technical contributions: 1) we present the first neural network that can achieve binaural target sound extraction in the presence of interfering sounds and background noise, and 2) we design a training methodology that allows our system to generalize to real-world use. Results show that our system can operate with 20 sound classes and that our transformer-based network has a runtime of 6.56 ms on a connected smartphone. An in-the-wild evaluation with participants in previously unseen indoor and outdoor scenarios shows that our proof-of-concept system can extract the target sounds and generalize to preserve the spatial cues in its binaural output. Project page with code: https://semantichearing.cs.washington.edu
https://doi.org/10.1145/3586183.3606779
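At the interface level, binaural target sound extraction takes a stereo mixture plus an identifier of the target class and returns the stereo target estimate, typically via a predicted per-ear mask so interaural cues survive. The toy model below illustrates that contract only; it is a stand-in, not the paper's transformer network (which runs in 6.56 ms on a smartphone):

import torch
import torch.nn as nn

class TargetSoundExtractor(nn.Module):
    """Toy binaural target sound extractor: condition on a class embedding,
    predict a per-ear magnitude mask. Architecture is illustrative only."""
    def __init__(self, n_classes=20, emb_dim=64, n_freq=257):
        super().__init__()
        self.class_emb = nn.Embedding(n_classes, emb_dim)
        self.net = nn.Sequential(
            nn.Linear(2 * n_freq + emb_dim, 512), nn.ReLU(),
            nn.Linear(512, 2 * n_freq), nn.Sigmoid(),
        )

    def forward(self, mix_spec, target_class):
        # mix_spec: (batch, frames, 2 * n_freq) stacked L/R magnitude spectra
        emb = self.class_emb(target_class)                     # (batch, emb_dim)
        emb = emb.unsqueeze(1).expand(-1, mix_spec.shape[1], -1)
        mask = self.net(torch.cat([mix_spec, emb], dim=-1))
        return mix_spec * mask   # per-ear masking preserves interaural cues

model = TargetSoundExtractor()
out = model(torch.rand(1, 100, 2 * 257), torch.tensor([3]))  # e.g. class 3 = "siren"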