Voice interactive devices often use keyword spotting for device activation. However, this approach suffers from keyword misrecognition and can respond to keywords not intended to call the device (e.g., "You can ask Alexa about it."), causing accidental activations. We propose a method that leverages prosodic features to differentiate calling from not-calling voices (F1 score: 0.869), allowing devices to respond only when actually being called and thus avoid misactivation. As a proof of concept, we built a prototype smart speaker called Aware that lets users control device activation by speaking the keyword with specific prosody patterns. These patterns were chosen to reflect people's natural calling and not-calling voices, which we uncovered in a study that collected such voices and examined their prosodic differences. A user study comparing Aware with Amazon Echo shows that Aware activates more accurately (F1 score 0.93 vs. 0.56) and is easy to learn and use.
https://dl.acm.org/doi/abs/10.1145/3491102.3517687
The ACM CHI Conference on Human Factors in Computing Systems (https://chi2022.acm.org/)
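The abstract describes the prosody-based calling/not-calling classification only at a high level. A minimal sketch of the general idea might look like the following; it assumes simple pitch, energy, and duration statistics as features and logistic regression as the classifier, none of which are confirmed by the abstract, and the file names and labels are purely hypothetical.

```python
"""
Minimal sketch of a prosody-based calling / not-calling classifier.
NOTE: the paper's exact feature set and model are not given in the abstract;
this example assumes coarse pitch, energy, and duration statistics fed to
logistic regression. All file paths and labels are hypothetical.
"""
import numpy as np
import librosa
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def prosodic_features(path: str) -> np.ndarray:
    """Extract coarse prosodic statistics from one keyword utterance."""
    y, sr = librosa.load(path, sr=16000)

    # Fundamental frequency (F0) contour; unvoiced frames come back as NaN.
    f0, _, _ = librosa.pyin(
        y, fmin=librosa.note_to_hz("C2"), fmax=librosa.note_to_hz("C7"), sr=sr
    )
    voiced = f0[~np.isnan(f0)]
    f0 = voiced if voiced.size > 0 else np.array([0.0])

    # Short-time energy contour.
    rms = librosa.feature.rms(y=y)[0]

    return np.array([
        f0.mean(), f0.std(), f0.max() - f0.min(),  # pitch level, variability, range
        rms.mean(), rms.std(),                     # loudness level and variability
        len(y) / sr,                               # utterance duration in seconds
    ])


# Hypothetical training data: recordings of the keyword spoken while calling
# the device (label 1) vs. merely mentioning it (label 0).
train_paths = ["calling_01.wav", "mention_01.wav"]  # placeholder file names
train_labels = [1, 0]

X = np.stack([prosodic_features(p) for p in train_paths])
clf = make_pipeline(StandardScaler(), LogisticRegression())
clf.fit(X, train_labels)

# At runtime, the device would activate only when the classifier predicts
# a "calling" prosody for the detected keyword.
print(clf.predict(prosodic_features("new_utterance.wav").reshape(1, -1)))
```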