We introduce M^2Silent, which enables multi-user silent speech interactions in shared spaces using multi-directional speakers. Ensuring privacy during interactions with voice-controlled systems presents significant challenges, particularly in environments with multiple individuals, such as libraries, offices, or vehicles. M^2Silent addresses this by allowing users to communicate silently, without producing audible speech, using acoustic sensing integrated into directional speakers. We leverage frequency-modulated continuous wave (FMCW) signals as audio carriers, simultaneously playing audio and sensing the user's silent speech. To handle the challenge of multiple users interacting simultaneously, we propose time-shifted FMCW signals and blind source separation algorithms, which isolate and accurately recognize the speech features of each user. We also present a deep-learning model for real-time silent speech recognition. M^2Silent achieves a Word Error Rate (WER) of 6.5% and a Sequence Error Rate (SER) of 12.8% in multi-user silent speech recognition while maintaining high audio quality, offering a novel solution for privacy-preserving, multi-user silent interactions in shared spaces.
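To make the carrier design concrete, the following is a minimal sketch of a linear FMCW chirp and of time-shifted copies assigned to different users, so their reflections occupy distinct delay offsets. All parameters (sampling rate, chirp band, duration, shift) are illustrative assumptions, not values taken from the paper.

```python
import numpy as np

def fmcw_chirp(f0, bandwidth, duration, fs):
    """Linear FMCW up-chirp: frequency sweeps from f0 to f0 + bandwidth over `duration` seconds."""
    t = np.arange(int(duration * fs)) / fs
    # Instantaneous phase of a linear chirp: f(t) = f0 + (bandwidth/duration) * t
    phase = 2 * np.pi * (f0 * t + 0.5 * (bandwidth / duration) * t**2)
    return np.cos(phase)

fs = 48_000   # audio sampling rate (assumed)
T = 0.02      # 20 ms chirp period (assumed)

# Near-inaudible band riding on the speaker output (assumed band).
chirp = fmcw_chirp(f0=18_000, bandwidth=4_000, duration=T, fs=fs)

# Time-shifted copies for two users: each directional speaker plays the
# same chirp delayed by a fixed offset, so each user's echoes fall at a
# distinct delay and can be separated at the receiver.
shift = int(0.005 * fs)          # 5 ms offset between users (illustrative)
user1_tx = chirp
user2_tx = np.roll(chirp, shift)
```

Dechirping the received signal against each user's reference then maps that user's reflections to a separate band of beat frequencies, which is what makes the per-user separation tractable.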
https://dl.acm.org/doi/10.1145/3706598.3714174
The ACM CHI Conference on Human Factors in Computing Systems (https://chi2025.acm.org/)