SonicSieve: Bringing Directional Speech Extraction to Smartphones Using Acoustic Microstructures

Abstract

Imagine placing your smartphone on a table in a noisy restaurant and clearly capturing the voices of friends seated around you, or recording a lecturer's voice with clarity in a reverberant auditorium. We introduce SonicSieve, the first intelligent directional speech extraction system for smartphones using a bio-inspired acoustic microstructure. Our passive design embeds directional cues onto incoming speech without any additional electronics: it attaches to the in-line microphone of low-cost wired earphones that plug into smartphones. We present an end-to-end neural network that processes raw audio mixtures in real time on mobile devices. Our results show that SonicSieve achieves a signal quality improvement of 5.0 dB when focusing on a 30° angular region. Additionally, the performance of our system with only two microphones exceeds that of conventional 5-microphone arrays.

Authors
Kuang Yuan
Carnegie Mellon University, Pittsburgh, Pennsylvania, United States
Yifeng Wang
Carnegie Mellon University, Pittsburgh, Pennsylvania, United States
Xiyuxing Zhang
Tsinghua University, Beijing, China
Chengyi Shen
Carnegie Mellon University, Pittsburgh, Pennsylvania, United States
Swarun Kumar
Carnegie Mellon University, Pittsburgh, Pennsylvania, United States
Justin Chan
Carnegie Mellon University, Pittsburgh, Pennsylvania, United States

Conference: CHI 2026

ACM CHI Conference on Human Factors in Computing Systems

Session: Sensing and Novel Fabrication

P1 - Room 133
7 presentations
2026-04-17, 20:15–21:45