Wearable, Audio and Novel Interactive Devices

Conference Name
CHI 2026
VueBuds: Visual Intelligence with Wireless Earbuds
Abstract

Despite their ubiquity, wireless earbuds remain audio-centric due to size and power constraints. We present VueBuds, the first camera-integrated wireless earbuds for egocentric vision, capable of operating within stringent power and form-factor limits. Each VueBud embeds a camera into a Sony WF-1000XM3 to stream visual data over Bluetooth to a host device for on-device vision language model (VLM) processing. We show analytically and empirically that while each camera's field of view is partially occluded by the face, the combined binocular perspective provides comprehensive forward coverage. By integrating VueBuds with VLMs, we build an end-to-end system for real-time scene understanding, translation, visual reasoning, and text reading, all from low-resolution monochrome cameras drawing under 5 mW through on-demand activation. Through online and in-person user studies with 90 participants, we compare VueBuds against smart glasses across 17 visual question-answering tasks, and show that our system achieves response quality on par with Ray-Ban Meta. Our work establishes low-power camera-equipped earbuds as a compelling platform for visual intelligence, bringing rapidly advancing VLM capabilities to one of the most ubiquitous wearable form factors.

Award
Honorable Mention
Authors
Maruchi Kim
University of Washington, Seattle, Washington, United States
Rasya Fawwaz
University of Washington, Seattle, Washington, United States
Zhi Yang Lim
University of Washington, Seattle, Washington, United States
Brinda Moudgalya
University of Washington, Seattle, Washington, United States
Hexi Wang
University of Washington, Seattle, Washington, United States
Yuanhao Zeng
University of Washington, Seattle, Washington, United States
Shyamnath Gollakota
University of Washington, Seattle, Washington, United States
Auditorily Embodied Conversational Agents: Effects of Spatialization and Situated Audio Cues on Presence and Social Perception
Abstract

Embodiment can enhance conversational agents, such as increasing their perceived presence. This is typically achieved through visual representations of a virtual body; however, visual modalities are not always available, such as when users interact with agents using headphones or display-less glasses. In this work, we explore auditory embodiment. By introducing auditory cues of bodily presence — through spatially localized voice and situated Foley audio from environmental interactions — we investigate how audio alone can convey embodiment and influence perceptions of a conversational agent. We conducted a 2 (spatialization: monaural vs. spatialized) × 2 (Foley: none vs. Foley) within-subjects study, where participants (n=24) engaged in conversations with agents. Our results show that spatialization and Foley increase co-presence, but reduce users’ perceptions of the agent’s attention and other social attributes.

Authors
Yi Fei Cheng
Carnegie Mellon University, Pittsburgh, Pennsylvania, United States
Jarod Bloch
Carnegie Mellon University, Pittsburgh, Pennsylvania, United States
Alexander Wang
Carnegie Mellon University, Pittsburgh, Pennsylvania, United States
Andrea Bianchi
KAIST, Daejeon, Korea, Republic of
Anusha Withana
The University of Sydney, Sydney, NSW, Australia
Anhong Guo
University of Michigan, Ann Arbor, Michigan, United States
Laurie M. Heller
Carnegie Mellon University, Pittsburgh, Pennsylvania, United States
David Lindlbauer
Carnegie Mellon University, Pittsburgh, Pennsylvania, United States
SoundBubble: Finger-Bound Virtual Microphone using Headset/Glasses Beamforming
Abstract

Hands are the chief appendage with which we manipulate the world around us, creating sounds as they go. As such, they are a rich source of information that computers can leverage for input and context sensing. Indeed, many prior works in HCI have explored this idea by instrumenting users' hands with a microphone, often integrated into a ring, wristband, or watch. In this work, we explore an alternative bare-hands approach: by using a microphone array integrated into a user's headset/glasses, we can use beamforming to create a virtual microphone that tracks with the user's fingers in 3D space. We show this method can capture even the subtle noise of a finger translating across surfaces, including skin-to-skin contact for micro-gestures, as well as passive widget interactions.

Authors
Daehwa Kim
Carnegie Mellon University, Pittsburgh, Pennsylvania, United States
Chris Harrison
Carnegie Mellon University, Pittsburgh, Pennsylvania, United States
StreamFog: Using ultrasonic streaming for controlling aerosols in fog-based displays
Abstract

Fog screens are a projection medium for mid-air graphics; they are see-through and reach-through. Fog screens are traditionally generated by laminar flows coming out of linear outlets and thus are usually static. Here, we show that ultrasonic beams create a streaming field of constrained airflow that directs aerosols in a fast and controllable way. We characterize various aerosols under streaming and optical diffusion, and evaluate control methods for obtaining different aerosol shapes. Aerosol columns are raised, moved, and pushed in a few hundred milliseconds to serve as a projection medium. Hands and other objects do not disturb the fog columns significantly, allowing for direct interaction. We showcase applications by projecting on multiple columns that can be moved continuously within the display volume, reach-through interactions, and mixed reality for tabletop games.

Authors
Unai Javier Fernández
Universidad Pública de Navarra, Pamplona, Spain
Ivan Fernández
Universidad Pública de Navarra, Pamplona, Navarra, Spain
Josu Irisarri
Universidad Pública de Navarra, Pamplona, Navarra, Spain
Iñigo Ezcurdia
Public University of Navarra, Pamplona, Spain
Asier Marzo
Universidad Publica de Navarra, Pamplona, Navarre, Spain
NasoVoce: A Nose-Mounted Low-Audibility Speech Interface for Always-Available Speech Interaction
Abstract

Silent and whispered speech offer promise for always-available voice interaction with AI, yet existing methods struggle to balance vocabulary size, wearability, silence, and noise robustness. We present NasoVoce, a nose-bridge–mounted interface that integrates a microphone and a vibration sensor. Positioned at the nasal pads of smart glasses, it unobtrusively captures both acoustic and vibration signals. The nasal bridge, close to the mouth, allows access to bone- and skin-conducted speech and enables reliable capture of low-volume utterances such as whispered speech. While the microphone captures high-quality audio, it is highly sensitive to environmental noise. Conversely, the vibration sensor is robust to noise but yields lower signal quality. By fusing these complementary inputs, NasoVoce generates high-quality speech robust against interference. Evaluation with Whisper Large-v2, PESQ, STOI, and MUSHRA ratings confirms improved recognition and quality. NasoVoce demonstrates the feasibility of a practical interface for always-available, continuous, and discreet AI voice conversations.

Authors
Jun Rekimoto
Sony Computer Science Laboratories, Kyoto, Japan
Yu Nishimura
Sony Computer Science Laboratories, Tokyo, Japan
Bojian Yang
Sony Computer Science Laboratories, Tokyo, Japan
Privacy & Safety Challenges of On-Body Interaction Techniques
Abstract

On-body computing systems offer new forms of interaction, but while they are increasingly integrated into everyday contexts, their unique privacy and safety challenges remain understudied. This paper examines these challenges through a two-round interview study with N=15 experts in human-computer interaction, and privacy and safety, using speculative scenarios and adversarial roleplaying to elicit insights. Our findings reveal risks specific to on-body interactions, including over-collection of sensitive data, unwanted inferences, harm to bystanders, and threats to bodily autonomy and psychological well-being. Importantly, in the on-body context, privacy and safety concerns are deeply interconnected and cannot be addressed in isolation. We contribute an empirically grounded characterization of these entangled challenges and derive eight actionable design guidelines to support safer, more privacy-aware, on-body systems. This work informs future research and design in ubiquitous computing by highlighting the need for proactive and integrated approaches to privacy and safety in trustworthy on-body computing.

Authors
Dañiel Gerhardt
CISPA Helmholtz Center for Information Security, Saarbrücken, Germany
Divyanshu Bhardwaj
CISPA Helmholtz Center for Information Security, Saarbrücken, Germany
Ashwin Ram
Saarland Informatics Campus, Saarbrücken, Germany
André Zenner
Saarland University, Saarland Informatics Campus, Saarbrücken, Germany
Jürgen Steimle
Saarland University, Saarland Informatics Campus, Saarbrücken, Germany
Katharina Krombholz
CISPA Helmholtz Center for Information Security, Saarbrücken, Germany
Mind the Gap: Mapping Wearer–Bystander Privacy Tensions and Context-Adaptive Pathways for Camera Glasses
Abstract

Camera glasses create fundamental privacy tensions between wearers seeking recording functionality and bystanders concerned about unauthorized surveillance. We present a systematic multi-stakeholder evaluation of privacy mechanisms through surveys (N=525) and paired interviews (N=20) in China. Study 1 quantifies expectation-willingness gaps: bystanders consistently demand stronger information transparency and protective measures than wearers will provide, with disparities intensifying in sensitive contexts where 65–90% of bystanders would take defensive action. Study 2 evaluates twelve privacy-enhancing technologies, revealing four fundamental trade-offs that undermine current approaches: visibility versus disruption, empowerment versus burden, protection versus agency, and accountability versus exposure. These gaps reflect structural incompatibilities rather than inadequate goodwill, with context emerging as the primary determinant of privacy acceptability. We propose context-adaptive pathways that dynamically adjust protection strategies: minimal-friction visibility in public spaces, structured negotiation in semi-public environments, and automatic protection in sensitive contexts. Our findings contribute a diagnostic framework for evaluating privacy mechanisms and implications for context-aware design in ubiquitous sensing.

Authors
Xueyang Wang
Tsinghua University, Beijing, China
Kewen Peng
University of Utah, Salt Lake City, Utah, United States
Xin Yi
Tsinghua University, Beijing, China
Hewu Li
Tsinghua University, Beijing, China