Semantic Hearing: Programming Acoustic Scenes with Binaural Hearables

Abstract

Imagine being able to listen to birds chirping in a park without hearing the chatter from other hikers, or to block out traffic noise on a busy street while still hearing emergency sirens and car honks. We introduce semantic hearing, a novel capability for hearable devices that enables them to focus on, or ignore, specific sounds from real-world environments in real time, while preserving spatial cues. To achieve this, we make two technical contributions: 1) we present the first neural network that can achieve binaural target sound extraction in the presence of interfering sounds and background noise, and 2) we design a training methodology that allows our system to generalize to real-world use. Results show that our system can operate with 20 sound classes and that our transformer-based network has a runtime of 6.56 ms on a connected smartphone. An in-the-wild evaluation with participants in previously unseen indoor and outdoor scenarios shows that our proof-of-concept system can extract the target sounds and generalize to preserve the spatial cues in its binaural output. Project page with code: https://semantichearing.cs.washington.edu

Authors
Bandhav Veluri
University of Washington, Seattle, Washington, United States
Malek Itani
University of Washington, Seattle, Washington, United States
Justin Chan
University of Washington, Seattle, Washington, United States
Takuya Yoshioka
Microsoft, Redmond, Washington, United States
Shyamnath Gollakota
University of Washington, Seattle, Washington, United States
Paper URL

https://doi.org/10.1145/3586183.3606779

Video

Conference: UIST 2023

ACM Symposium on User Interface Software and Technology

Session: Sensory Shenanigans: Immersion and Illusions in Mixed Reality

Venetian Room
6 presentations
2023-11-01 18:00–19:20