2. Sound & Music

Conference Name
UIST 2024
SonoHaptics: An Audio-Haptic Cursor for Gaze-Based Object Selection in XR
Abstract

We introduce SonoHaptics, an audio-haptic cursor for gaze-based 3D object selection. SonoHaptics addresses challenges around providing accurate visual feedback during gaze-based selection in Extended Reality (XR), e.g., lack of world-locked displays in no- or limited-display smart glasses and visual inconsistencies. To enable users to distinguish objects without visual feedback, SonoHaptics employs the concept of cross-modal correspondence in human perception to map visual features of objects (color, size, position, material) to audio-haptic properties (pitch, amplitude, direction, timbre). We contribute data-driven models for determining cross-modal mappings of visual features to audio and haptic features, and a computational approach to automatically generate audio-haptic feedback for objects in the user's environment. SonoHaptics provides global feedback that is unique to each object in the scene, and local feedback to amplify differences between nearby objects. Our comparative evaluation shows that SonoHaptics enables accurate object identification and selection in a cluttered scene without visual feedback.
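
As an aside, the following Python sketch illustrates the kind of cross-modal mapping the abstract describes, from visual features (color, size, position, material) to audio-haptic properties (pitch, amplitude, direction, timbre). The specific ranges and mapping functions are illustrative assumptions, not the data-driven mappings from the paper.

```python
# Illustrative sketch (not the authors' implementation): map visual features
# of a scene object to hypothetical audio-haptic cursor parameters.

from dataclasses import dataclass

@dataclass
class SceneObject:
    hue: float          # 0.0 (red) .. 1.0 (violet), normalized
    size: float         # 0.0 (smallest in scene) .. 1.0 (largest)
    azimuth_deg: float  # horizontal angle relative to the user, in degrees
    roughness: float    # 0.0 (smooth material) .. 1.0 (rough material)

def audio_haptic_cursor_params(obj: SceneObject) -> dict:
    """Map visual features to assumed audio-haptic properties."""
    return {
        # color -> pitch: example mapping over roughly two octaves
        "pitch_hz": 220.0 * (2.0 ** (2.0 * obj.hue)),        # 220-880 Hz
        # size -> amplitude: larger objects get stronger feedback
        "amplitude": 0.2 + 0.8 * obj.size,                    # 0.2-1.0
        # position -> direction: pan audio/haptics toward the object
        "pan": max(-1.0, min(1.0, obj.azimuth_deg / 90.0)),   # -1 left .. +1 right
        # material -> timbre: rougher materials get a brighter, noisier timbre
        "timbre_brightness": obj.roughness,
    }

print(audio_haptic_cursor_params(
    SceneObject(hue=0.3, size=0.7, azimuth_deg=-30, roughness=0.5)))
```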

Authors
Hyunsung Cho
Carnegie Mellon University, Pittsburgh, Pennsylvania, United States
Naveen Sendhilnathan
Meta, Seattle, Washington, United States
Michael Nebeling
University of Michigan, Ann Arbor, Michigan, United States
Tianyi Wang
Purdue University, West Lafayette, Indiana, United States
Purnima Padmanabhan
Meta Reality Labs, Burlingame, California, United States
Jonathan Browder
Reality Labs Research, Meta Inc., Redmond, Washington, United States
David Lindlbauer
Carnegie Mellon University, Pittsburgh, Pennsylvania, United States
Tanya R. Jonker
Meta Inc., Redmond, Washington, United States
Kashyap Todi
Reality Labs Research, Redmond, Washington, United States
Paper URL

https://doi.org/10.1145/3654777.3676384

Video
SonifyAR: Context-Aware Sound Generation in Augmented Reality
Abstract

Sound plays a crucial role in enhancing user experience and immersiveness in Augmented Reality (AR). However, current platforms lack support for AR sound authoring due to limited interaction types, challenges in collecting and specifying context information, and difficulty in acquiring matching sound assets. We present SonifyAR, an LLM-based AR sound authoring system that generates context-aware sound effects for AR experiences. SonifyAR expands the current design space of AR sound and implements a Programming by Demonstration (PbD) pipeline to automatically collect contextual information of AR events, including virtual content semantics and real-world context. This context information is then processed by a large language model to acquire sound effects with Recommendation, Retrieval, Generation, and Transfer methods. To evaluate the usability and performance of our system, we conducted a user study with eight participants and created five example applications, including an AR-based science experiment and an assistive application for low-vision AR users.
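
The sketch below illustrates, at a high level, the pipeline the abstract describes: packaging AR event context into an LLM prompt and dispatching on one of the four named acquisition methods. The prompt wording, the `query_llm` placeholder, and the handlers are hypothetical stand-ins, not SonifyAR's actual implementation.

```python
# Minimal sketch (not the SonifyAR implementation): route AR event context
# through an LLM decision into one of four sound acquisition methods.

import json

def query_llm(prompt: str) -> dict:
    # Placeholder: a real system would call an LLM API here.
    return {"method": "Retrieval", "query": "glass object hitting wooden table"}

def sonify_event(event: dict, real_world_context: dict) -> dict:
    prompt = (
        "Suggest a sound effect for this AR event.\n"
        f"Event: {json.dumps(event)}\n"
        f"Context: {json.dumps(real_world_context)}\n"
        "Answer with one of: Recommendation, Retrieval, Generation, Transfer."
    )
    decision = query_llm(prompt)

    # Each handler stands in for a different way of obtaining the sound asset.
    handlers = {
        "Recommendation": lambda q: f"recommended asset for '{q}'",
        "Retrieval": lambda q: f"library sound retrieved for '{q}'",
        "Generation": lambda q: f"generated sound for '{q}'",
        "Transfer": lambda q: f"style-transferred sound for '{q}'",
    }
    return {"sound": handlers[decision["method"]](decision["query"]),
            "method": decision["method"]}

print(sonify_event({"type": "collision", "virtual_object": "glass cup"},
                   {"surface": "wooden table"}))
```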

Authors
Xia Su
University of Washington, Seattle, Washington, United States
Jon E. Froehlich
University of Washington, Seattle, Washington, United States
Eunyee Koh
Adobe Research, San Jose, California, United States
Chang Xiao
Adobe Research, San Jose, California, United States
Paper URL

https://doi.org/10.1145/3654777.3676406

Video
Auptimize: Optimal Placement of Spatial Audio Cues for Extended Reality
Abstract

Spatial audio in Extended Reality (XR) provides users with better awareness of where virtual elements are placed, and efficiently guides them to events such as notifications, system alerts from different windows, or approaching avatars. Humans, however, are inaccurate in localizing sound cues, especially with multiple sources due to limitations in human auditory perception such as angular discrimination error and front-back confusion. This decreases the efficiency of XR interfaces because users misidentify from which XR element a sound is coming. To address this, we propose Auptimize, a novel computational approach for placing XR sound sources, which mitigates such localization errors by utilizing the ventriloquist effect. Auptimize disentangles the sound source locations from the visual elements and relocates the sound sources to optimal positions for unambiguous identification of sound cues, avoiding errors due to inter-source proximity and front-back confusion. Our evaluation shows that Auptimize decreases spatial audio-based source identification errors compared to playing sound cues at the paired visual-sound locations. We demonstrate the applicability of Auptimize for diverse spatial audio-based interactive XR scenarios.
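
The sketch below shows one way such a placement optimization could look: each sound source is kept within an assumed ventriloquism tolerance of its visual element while a brute-force search maximizes angular separation and rejects front-back mirror conflicts. The tolerance, scoring, and search strategy are illustrative assumptions, not Auptimize's actual formulation.

```python
# Illustrative sketch only (not Auptimize itself): choose a sound-source
# azimuth for each visual element (azimuth 0 = front, 90 = right, degrees).

from itertools import product

TOLERANCE_DEG = 30.0   # assumed ventriloquist-effect capture range
STEP_DEG = 10.0

def angular_distance(a, b):
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)

def front_back_conflict(a, b, threshold=15.0):
    # Sources near mirror positions about the interaural axis are confusable.
    return angular_distance(a, (180.0 - b) % 360.0) < threshold

def place_sources(visual_azimuths):
    if len(visual_azimuths) < 2:
        return tuple(visual_azimuths)
    offsets = [k * STEP_DEG for k in range(-int(TOLERANCE_DEG // STEP_DEG),
                                           int(TOLERANCE_DEG // STEP_DEG) + 1)]
    candidates = [[v + off for off in offsets] for v in visual_azimuths]
    best, best_score = None, float("-inf")
    for combo in product(*candidates):
        pairs = [(a, b) for i, a in enumerate(combo) for b in combo[i + 1:]]
        if any(front_back_conflict(a, b) for a, b in pairs):
            continue
        # Score a placement by its worst-case angular separation.
        score = min(angular_distance(a, b) for a, b in pairs)
        if score > best_score:
            best, best_score = combo, score
    return best

print(place_sources([0.0, 20.0, 160.0]))   # visual azimuths in degrees
```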

Authors
Hyunsung Cho
Carnegie Mellon University, Pittsburgh, Pennsylvania, United States
Alexander Wang
Carnegie Mellon University, Pittsburgh, Pennsylvania, United States
Divya Kartik
Carnegie Mellon University, Pittsburgh, Pennsylvania, United States
Emily Liying Xie
Carnegie Mellon University, Pittsburgh, Pennsylvania, United States
Yukang Yan
University of Rochester, Rochester, New York, United States
David Lindlbauer
Carnegie Mellon University, Pittsburgh, Pennsylvania, United States
Paper URL

https://doi.org/10.1145/3654777.3676424

Video
EarHover: Mid-Air Gesture Recognition for Hearables Using Sound Leakage Signals
Abstract

We introduce EarHover, an innovative system that enables mid-air gesture input for hearables. Mid-air gesture input, which eliminates the need to touch the device and thus helps keep both hands and device clean, has been shown in previous surveys to be in high demand. However, existing mid-air gesture input methods for hearables have relied on adding cameras or infrared sensors. By focusing on the sound leakage phenomenon unique to hearables, we realize mid-air gesture recognition using a speaker and an external microphone, both of which are highly compatible with hearables. The signal that leaks outside the device can be measured by an external microphone, which captures differences in reflection characteristics caused by the hand's shape and speed during mid-air gestures. Among 27 candidate gestures, we determined the seven most suitable for EarHover in terms of signal discriminability and user acceptability. We then evaluated the gesture detection and classification performance of two prototype devices (in-ear and open-ear) for real-world application scenarios.
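
As a rough illustration of the sensing idea, the sketch below extracts spectral features from an external-microphone recording of the leaked signal and classifies a gesture with a nearest-centroid rule. The window sizes, features, and classifier are assumptions for illustration, not the paper's pipeline.

```python
# Conceptual sketch, not the EarHover pipeline: spectral features from the
# leaked-signal recording, classified by nearest centroid.

import numpy as np

SAMPLE_RATE = 16_000
FRAME = 512
HOP = 256

def spectrogram_features(signal: np.ndarray) -> np.ndarray:
    """Mean magnitude spectrum across frames of the gesture window."""
    frames = [signal[i:i + FRAME] * np.hanning(FRAME)
              for i in range(0, len(signal) - FRAME, HOP)]
    mags = np.abs(np.fft.rfft(np.stack(frames), axis=1))
    return mags.mean(axis=0)  # reflection changes show up as spectral shifts

def train_centroids(examples: dict) -> dict:
    return {label: np.mean([spectrogram_features(s) for s in sigs], axis=0)
            for label, sigs in examples.items()}

def classify(signal: np.ndarray, centroids: dict) -> str:
    feat = spectrogram_features(signal)
    return min(centroids, key=lambda lbl: np.linalg.norm(feat - centroids[lbl]))

# Toy usage with synthetic data standing in for recorded leakage signals.
rng = np.random.default_rng(0)
fake = {g: [rng.normal(size=SAMPLE_RATE // 2) for _ in range(3)]
        for g in ["swipe", "tap", "hover"]}
centroids = train_centroids(fake)
print(classify(fake["swipe"][0], centroids))
```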

Award
Best Paper
Authors
Shunta Suzuki
Keio University, Yokohama, Japan
Takashi Amesaka
Keio University, Yokohama, Japan
Hiroki Watanabe
Hokkaido University, Sapporo, Japan
Buntarou Shizuki
University of Tsukuba, Tsukuba, Ibaraki, Japan
Yuta Sugiura
Keio University, Yokohama, Japan
Paper URL

https://doi.org/10.1145/3654777.3676367

Video
Towards Music-Aware Virtual Assistants
Abstract

We propose a system for modifying spoken notifications in a manner that is sensitive to the music a user is listening to. Spoken notifications provide convenient access to rich information without the need for a screen. Virtual assistants see prevalent use in hands-free settings such as driving or exercising, activities where users also regularly enjoy listening to music. In such settings, virtual assistants will temporarily mute a user's music to improve intelligibility. However, users may perceive these interruptions as intrusive, negatively impacting their music-listening experience. To address this challenge, we propose the concept of music-aware virtual assistants, where speech notifications are modified to resemble a voice singing in harmony with the user's music. We contribute a system that processes user music and notification text to produce a blended mix, replacing original song lyrics with the notification content. In a user study comparing musical assistants to standard virtual assistants, participants expressed that musical assistants fit better with music, reduced intrusiveness, and provided a more delightful listening experience overall.
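
The sketch below illustrates one piece of such a system: aligning notification text to an extracted melody so it could be sung in place of the original lyrics. The syllable splitter, the one-syllable-per-note rule, and the placeholder melody are illustrative assumptions; melody extraction and singing-voice synthesis are not shown.

```python
# Minimal sketch under strong assumptions (not the authors' system):
# align notification syllables to melody notes.

import re

def rough_syllables(word: str) -> list:
    """Very rough syllable split by vowel groups (illustrative only)."""
    parts = re.findall(r"[^aeiouy]*[aeiouy]+(?:[^aeiouy]*$)?", word.lower())
    return parts or [word]

def align_text_to_melody(text: str, melody: list) -> list:
    """Assign one syllable per melody note; reuse the last note if needed."""
    syllables = [s for w in text.split() for s in rough_syllables(w)]
    aligned = []
    for i, syl in enumerate(syllables):
        note = melody[min(i, len(melody) - 1)]
        aligned.append({"syllable": syl, "pitch": note["pitch"],
                        "start": note["start"], "duration": note["duration"]})
    return aligned

# Placeholder melody, as if extracted from the user's current song.
melody = [{"pitch": p, "start": i * 0.5, "duration": 0.5}
          for i, p in enumerate(["C4", "D4", "E4", "G4", "E4", "D4"])]

for event in align_text_to_melody("New message from Alex", melody):
    print(event)
```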

Authors
Alexander Wang
Carnegie Mellon University, Pittsburgh, Pennsylvania, United States
David Lindlbauer
Carnegie Mellon University, Pittsburgh, Pennsylvania, United States
Chris Donahue
Carnegie Mellon University, Pittsburgh, Pennsylvania, United States
Paper URL

https://doi.org/10.1145/3654777.3676416

Video