Aware: Intuitive Device Activation Using Prosody for Natural Voice Interactions

Abstract

Voice-interactive devices often use keyword spotting for device activation. However, this approach suffers from misrecognition of keywords and can respond to keywords not intended to call the device (e.g., "You can ask Alexa about it."), causing accidental device activations. We propose a method that leverages prosodic features to differentiate calling from not-calling voices (F1 score: 0.869), allowing devices to respond only when called upon and thus avoid misactivation. As a proof of concept, we built a prototype smart speaker called Aware that lets users control device activation by speaking the keyword in specific prosody patterns. These patterns were chosen to represent people's natural calling/not-calling voices, which we identified in a study that collected such voices and investigated their prosodic differences. A user study comparing Aware with Amazon Echo shows that Aware activates more correctly (F1 score 0.93 vs. 0.56) and is easy to learn and use.

Authors
Xinlei Zhang
The University of Tokyo, Tokyo, Japan
Zixiong Su
The University of Tokyo, Tokyo, Japan
Jun Rekimoto
The University of Tokyo, Tokyo, Japan
Paper URL

https://dl.acm.org/doi/abs/10.1145/3491102.3517687

Conference: CHI 2022

The ACM CHI Conference on Human Factors in Computing Systems (https://chi2022.acm.org/)

Session: Multimodality

286–287
5 presentations
2022-05-04 18:00:00
2022-05-04 19:15:00