Soundr: Head Position and Orientation Prediction Using a Microphone Array

Abstract

Although state-of-the-art smart speakers can hear a user's speech, unlike a human assistant these devices cannot figure out users' verbal references based on their head location and orientation. Soundr presents a novel interaction technique that leverages the built-in microphone array found in most smart speakers to infer the user's spatial location and head orientation using only their voice. With that extra information, Soundr can resolve users' references to objects, people, and locations based on the speaker's gaze, and can also provide relative directions. To provide training data for our neural network, we collected 751 minutes of data (50x that of the best prior work) from human speakers, leveraging a virtual reality headset to accurately provide head-tracking ground truth. Our results achieve an average positional error of 0.31 m and an orientation angle accuracy of 34.3° per voice command. A user study evaluating user preferences for controlling IoT appliances by talking at them found this new approach to be fast and easy to use.
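The abstract does not describe Soundr's actual network architecture, so the following is only an illustrative sketch: a small PyTorch CNN, assumed here, that regresses head position (x, y, z) and planar head orientation from multi-channel microphone-array spectrograms. The class name, microphone count, and layer sizes are hypothetical and are not taken from the paper.

# Hypothetical sketch (not the paper's architecture): regress head position and
# yaw orientation from a multi-channel microphone-array spectrogram.
import torch
import torch.nn as nn

class HeadPoseFromAudio(nn.Module):
    def __init__(self, num_mics: int = 6):
        super().__init__()
        # Treat each microphone channel as an input channel of a 2D CNN
        # over (frequency, time) spectrogram bins.
        self.features = nn.Sequential(
            nn.Conv2d(num_mics, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.position_head = nn.Linear(64, 3)     # (x, y, z) in meters
        self.orientation_head = nn.Linear(64, 2)  # (cos yaw, sin yaw)

    def forward(self, spectrograms: torch.Tensor):
        # spectrograms: (batch, num_mics, freq_bins, time_frames)
        h = self.features(spectrograms).flatten(1)
        position = self.position_head(h)
        # Normalize so the orientation output is a unit vector in the plane.
        orientation = nn.functional.normalize(self.orientation_head(h), dim=1)
        return position, orientation

# Training such a model would minimize, e.g., L2 position error plus an angular
# (cosine) loss on orientation, the same quantities reported in the abstract
# (mean positional error and orientation angle error per voice command).
model = HeadPoseFromAudio(num_mics=6)
pos, ori = model(torch.randn(4, 6, 128, 64))  # dummy batch of spectrograms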

Keywords
Smart speakers
Internet of Things
machine learning
acoustic source localization
Authors
Jackie (Junrui) Yang
Stanford University, Stanford, CA, USA
Gaurab Banerjee
Stanford University, Stanford, CA, USA
Vishesh Gupta
Stanford University, Stanford, CA, USA
Monica S. Lam
Stanford University, Stanford, CA, USA
James A. Landay
Stanford University, Stanford, CA, USA
DOI

10.1145/3313831.3376427

Paper URL

https://doi.org/10.1145/3313831.3376427


Conference: CHI 2020

The ACM CHI Conference on Human Factors in Computing Systems (https://chi2020.acm.org/)

Session: Use your head & run

Paper session
314 LANA'I
5 presentations
2020-04-30 01:00:00 – 2020-04-30 02:15:00