Eye and Face

Conference Name
CHI 2024
EyeEcho: Continuous and Low-power Facial Expression Tracking on Glasses
Abstract

In this paper, we introduce EyeEcho, a minimally obtrusive acoustic sensing system designed to enable glasses to continuously monitor facial expressions. It utilizes two pairs of speakers and microphones mounted on glasses to emit encoded inaudible acoustic signals directed towards the face, capturing subtle skin deformations associated with facial expressions. The reflected signals are processed through a customized machine-learning pipeline to estimate full facial movements. EyeEcho samples at 83.3 Hz with a relatively low power consumption of 167 mW. Our user study involving 12 participants demonstrates that, with just four minutes of training data, EyeEcho achieves highly accurate tracking performance across different real-world scenarios, including sitting, walking, and after remounting the devices. Additionally, a semi-in-the-wild study involving 10 participants further validates EyeEcho's performance in naturalistic scenarios while participants engage in various daily activities. Finally, we showcase EyeEcho's potential to be deployed on a commercial off-the-shelf (COTS) smartphone, offering real-time facial expression tracking.
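
As a rough illustration of the kind of signal processing such a pipeline involves, the sketch below correlates received microphone frames with a transmitted inaudible chirp to form an echo profile whose frame-to-frame changes would reflect skin deformation. The chirp parameters, frame length, and function names are illustrative assumptions, not EyeEcho's actual encoding or machine-learning pipeline.

```python
import numpy as np

def echo_profile(tx_chirp: np.ndarray, rx_frames: np.ndarray) -> np.ndarray:
    """Correlate each received frame with the transmitted chirp.

    tx_chirp:  (L,) encoded inaudible excitation signal (illustrative).
    rx_frames: (T, N) microphone samples, one row per frame.
    Returns an echo profile of shape (T, N - L + 1).
    """
    return np.stack([np.correlate(frame, tx_chirp, mode="valid")
                     for frame in rx_frames])

# Illustrative usage: a synthetic 18-20 kHz chirp at 50 kHz sampling with
# 600-sample (12 ms) frames, i.e. roughly the 83.3 Hz frame rate cited above.
fs, chirp_len, frame_len = 50_000, 256, 600
n = np.arange(chirp_len)
f0, f1 = 18_000, 20_000
tx = np.sin(2 * np.pi * (f0 * n + (f1 - f0) * n**2 / (2 * chirp_len)) / fs)
rx = np.random.randn(10, frame_len)   # stand-in for real microphone data
print(echo_profile(tx, rx).shape)     # (10, 345)
```

A learned regression model, such as the customized pipeline the abstract describes, would then map features of this kind to full facial movements.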

Authors
Ke Li
Cornell University, Ithaca, New York, United States
Ruidong Zhang
Cornell University, Ithaca, New York, United States
Siyuan Chen
Cornell University, Ithaca, New York, United States
Boao Chen
Cornell University, Ithaca, New York, United States
Mose Sakashita
Cornell University, Ithaca, New York, United States
Francois Guimbretiere
Cornell University, Ithaca, New York, United States
Cheng Zhang
Cornell University, Ithaca, New York, United States
Paper URL

doi.org/10.1145/3613904.3642613

Video
Uncovering and Addressing Blink-Related Challenges in Using Eye Tracking for Interactive Systems
Abstract

Currently, interactive systems use physiological sensing to enable advanced functionalities. While eye tracking is a promising means to understand the user, eye tracking data inherently suffers from missing data due to blinks, which may result in reduced system performance. We conducted a literature review to understand how researchers deal with this issue. We found that researchers often implement use-case-specific pipelines to overcome it, ranging from ignoring missing data to artificial interpolation. With these first insights, we ran a large-scale analysis on 11 publicly available datasets to understand the impact of the various approaches on data quality and accuracy. In doing so, we highlight the pitfalls in data processing and identify which methods work best. Based on our results, we provide guidelines for handling eye tracking data for interactive systems. Further, we propose a standard data processing pipeline that allows researchers and practitioners to pre-process and standardize their data efficiently.
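
As one concrete example of the kind of pre-processing choice the review surveys, the sketch below interpolates only short, blink-length gaps in a gaze signal and leaves longer dropouts as missing. The 0.5-second threshold, sampling rate, and function name are illustrative assumptions rather than the guidelines' recommended values.

```python
import numpy as np
import pandas as pd

def fill_blink_gaps(gaze: pd.Series, fs: float, max_gap_s: float = 0.5) -> pd.Series:
    """Linearly interpolate NaN runs no longer than max_gap_s seconds.

    Longer dropouts (e.g., track loss rather than blinks) stay as NaN.
    """
    max_gap = int(round(max_gap_s * fs))
    is_nan = gaze.isna()
    run_id = (is_nan != is_nan.shift()).cumsum()        # label contiguous runs
    run_len = is_nan.groupby(run_id).transform("sum")   # length of each NaN run
    fillable = is_nan & (run_len <= max_gap)
    filled = gaze.interpolate(method="linear", limit_direction="both")
    return gaze.where(~fillable, filled)

# Illustrative usage: a 120 Hz horizontal gaze coordinate with a short blink gap.
x = pd.Series([0.40, 0.41, np.nan, np.nan, np.nan, 0.44, 0.45])
print(fill_blink_gaps(x, fs=120).round(3).tolist())
```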

Authors
Jesse W. Grootjen
LMU Munich, Munich, Germany
Henrike Weingärtner
LMU Munich, Munich, Germany
Sven Mayer
LMU Munich, Munich, Germany
Paper URL

doi.org/10.1145/3613904.3642086

Video
MELDER: The Design and Evaluation of a Real-time Silent Speech Recognizer for Mobile Devices
Abstract

Silent speech is unaffected by ambient noise, increases accessibility, and enhances privacy and security. Yet current silent speech recognizers operate in a phrase-in/phrase-out manner and are thus slow, error-prone, and impractical for mobile devices. We present MELDER, a Mobile Lip Reader that operates in real time by splitting the input video into smaller temporal segments and processing them individually. An experiment revealed that this substantially improves computation time, making it suitable for mobile devices. We further optimize the model for everyday use by exploiting knowledge from a high-resource vocabulary with a transfer-learning model. We then compare MELDER in both stationary and mobile settings with two state-of-the-art silent speech recognizers, where MELDER demonstrated superior overall performance. Finally, we compare two visual feedback methods of MELDER with the visual feedback method of Google Assistant. The outcomes shed light on how these proposed feedback methods influence users' perceptions of the model's performance.
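
The sketch below illustrates the general idea of decoding overlapping temporal segments as they become available instead of waiting for a full phrase; the segment length, stride, and recognizer interface are hypothetical placeholders, not MELDER's actual segmentation policy or model.

```python
from typing import Callable, Iterator, Sequence

def temporal_segments(frames: Sequence, seg_len: int = 20,
                      stride: int = 10) -> Iterator[Sequence]:
    """Yield overlapping windows of lip frames for incremental decoding."""
    for start in range(0, max(len(frames) - seg_len + 1, 1), stride):
        yield frames[start:start + seg_len]

def decode_streaming(frames: Sequence, recognizer: Callable) -> str:
    """Run a (hypothetical) segment-level recognizer on each window
    instead of a single pass over the whole utterance."""
    partials = [recognizer(segment) for segment in temporal_segments(frames)]
    return " ".join(p for p in partials if p)

# Illustrative usage with a dummy recognizer standing in for the lip-reading model.
dummy_frames = list(range(55))
print(decode_streaming(dummy_frames, lambda seg: f"<{len(seg)} frames>"))
```

Because each window is processed as soon as it is filled, partial transcripts can be surfaced while the user is still speaking, which is the property that makes the approach attractive on mobile hardware.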

Authors
Laxmi Pandey
University of California, Merced, Merced, California, United States
Ahmed Sabbir Arif
University of California, Merced, Merced, California, United States
Paper URL

doi.org/10.1145/3613904.3642348

Video
ReHEarSSE: Recognizing Hidden-in-the-Ear Silently Spelled Expressions
Abstract

Silent speech interaction (SSI) allows users to discreetly input text without using their hands. Existing wearable SSI systems typically require custom devices and support only a small lexicon, restricting their utility to a small set of command words. This work proposes ReHEarSSE, an earbud-based ultrasonic SSI system capable of generalizing to words that do not appear in its training dataset, providing support for nearly an entire dictionary's worth of words. As a user silently spells words, ReHEarSSE uses autoregressive features to identify subtle changes in ear canal shape. ReHEarSSE infers words using a deep learning model trained to optimize connectionist temporal classification (CTC) loss with an intermediate embedding that accounts for different letters and transitions between them. We find that ReHEarSSE recognizes 100 unseen words with an accuracy of 89.3%.
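
The sketch below shows a minimal letter-level CTC training setup of the kind the abstract describes, using a generic recurrent backbone over pre-extracted ultrasonic features. The feature dimensions and architecture are illustrative assumptions, and the paper's intermediate embedding for letters and their transitions is not reproduced here.

```python
import torch
import torch.nn as nn

NUM_LETTERS = 26   # a-z; spelled words are sequences of letter indices 1..26
BLANK = 0          # CTC blank index

class SpellingModel(nn.Module):
    """Toy per-frame letter classifier over ultrasonic features (illustrative)."""
    def __init__(self, feat_dim: int = 64, hidden: int = 128):
        super().__init__()
        self.rnn = nn.GRU(feat_dim, hidden, batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden, NUM_LETTERS + 1)

    def forward(self, x):                        # x: (batch, time, feat_dim)
        h, _ = self.rnn(x)
        return self.head(h).log_softmax(dim=-1)  # (batch, time, classes)

model = SpellingModel()
ctc = nn.CTCLoss(blank=BLANK, zero_infinity=True)

# Dummy batch: 2 recordings, 120 feature frames each, spelling 5-letter words.
feats = torch.randn(2, 120, 64)
targets = torch.randint(1, NUM_LETTERS + 1, (2, 5))
log_probs = model(feats).permute(1, 0, 2)        # CTC expects (time, batch, classes)
loss = ctc(log_probs, targets,
           input_lengths=torch.full((2,), 120),
           target_lengths=torch.full((2,), 5))
loss.backward()
print(float(loss))
```

Because CTC marginalizes over alignments between frames and letters, a decoder trained this way can emit letter sequences it never saw as whole words, which is what enables the open-vocabulary spelling behaviour described above.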

Authors
Xuefu Dong
The University of Tokyo, Tokyo, Japan
Yifei Chen
Tsinghua University, Beijing, China
Yuuki Nishiyama
The University of Tokyo, Tokyo, Japan
Kaoru Sezaki
The University of Tokyo, Tokyo, Japan
Yuntao Wang
Tsinghua University, Beijing, China
Ken Christofferson
University of Toronto, Toronto, Ontario, Canada
Alex Mariakakis
University of Toronto, Toronto, Ontario, Canada
Paper URL

doi.org/10.1145/3613904.3642095

Video
Watch Your Mouth: Silent Speech Recognition with Depth Sensing
Abstract

Silent speech recognition is a promising technology that decodes human speech without requiring audio signals, enabling private human-computer interactions. In this paper, we propose Watch Your Mouth, a novel method that leverages depth sensing to enable accurate silent speech recognition. By leveraging depth information, our method provides unique resilience against environmental factors such as variations in lighting and device orientations, while further addressing privacy concerns by eliminating the need for sensitive RGB data. We started by building a deep-learning model that locates lips using depth data. We then designed a deep-learning pipeline to efficiently learn from point clouds and translate lip movements into commands and sentences. We evaluated our technique and found it effective across diverse sensor locations: On-Head, On-Wrist, and In-Environment. Watch Your Mouth outperformed the state-of-the-art RGB-based method, demonstrating its potential as an accurate and reliable input technique.
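
For context, the sketch below back-projects a depth patch into the kind of 3D point cloud such a pipeline consumes, using a standard pinhole camera model. The intrinsics and patch size are illustrative assumptions, and the paper's lip-localization and recognition models are not reproduced here.

```python
import numpy as np

def depth_to_point_cloud(depth: np.ndarray, fx: float, fy: float,
                         cx: float, cy: float) -> np.ndarray:
    """Back-project a depth frame (in meters) to an (N, 3) point cloud."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    points = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    return points[points[:, 2] > 0]   # drop invalid zero-depth pixels

# Illustrative usage: a synthetic 8x8 depth patch around the lips, ~35 cm away.
patch = np.full((8, 8), 0.35)
cloud = depth_to_point_cloud(patch, fx=365.0, fy=365.0, cx=4.0, cy=4.0)
print(cloud.shape)                    # (64, 3)
```

No RGB data is involved at any stage, which is the privacy property the abstract highlights.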

Award
Honorable Mention
Authors
Xue Wang
University of California, Los Angeles, Los Angeles, California, United States
Zixiong Su
The University of Tokyo, Tokyo, Japan
Jun Rekimoto
The University of Tokyo, Tokyo, Japan
Yang Zhang
University of California, Los Angeles, Los Angeles, California, United States
Paper URL

doi.org/10.1145/3613904.3642092

Video