Vocal training is difficult because the muscles that control pitch, resonance, and phonation are internal and invisible to learners. This paper investigates how Electromyography (EMG) and ultrasonic imaging (UI) can make these muscles observable for training purposes. We report three studies.
First, we analyze EMG and UI data from 16 singers (beginner, experienced, and professional), revealing differences in muscle control proficiency among the three groups. Second, we use the collected data to create a system that visualizes an expert's muscle activity as a reference. This system is tested in a user study with 12 novices, showing that EMG highlighted nuances of muscle activation, while UI provided insights into vocal cord length and dynamics. Third, to compare our approach to traditional methods (audio analysis and coach instructions), we conducted a focus group study with 15 experienced singers. Our results suggest that EMG is promising for improving vocal skill development and enhancing feedback systems. We conclude the paper with a detailed comparison of the analyzed modalities (EMG, UI, and traditional methods), resulting in recommendations for improving vocal muscle training systems.
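The abstract does not specify how expert muscle activity is rendered as a reference. One plausible reading is that a smoothed EMG activation envelope from the learner is compared against an expert recording. The sketch below illustrates that idea only; the sampling rate, window length, and peak normalization are assumptions, not the authors' published pipeline.

```python
# Illustrative sketch (assumptions, not the authors' method): compare a
# learner's EMG activation envelope against an expert reference envelope.
import numpy as np

def emg_envelope(raw, fs=1000, win_ms=100):
    """Rectify the EMG signal and smooth it with a moving-RMS window."""
    rectified = np.abs(raw - np.mean(raw))        # remove DC offset, rectify
    win = max(1, int(fs * win_ms / 1000))
    kernel = np.ones(win) / win
    return np.sqrt(np.convolve(rectified ** 2, kernel, mode="same"))

def envelope_deviation(learner, expert):
    """Per-sample deviation after normalizing each envelope to its own peak
    (a crude stand-in for %MVC normalization)."""
    l = learner / (np.max(learner) + 1e-9)
    e = expert / (np.max(expert) + 1e-9)
    return l - e

# Example with synthetic signals standing in for recorded channels
t = np.linspace(0, 2, 2000)
expert_raw = np.sin(2 * np.pi * 5 * t) * np.random.randn(2000) * 0.5
learner_raw = np.sin(2 * np.pi * 5 * t) * np.random.randn(2000) * 0.8
deviation = envelope_deviation(emg_envelope(learner_raw), emg_envelope(expert_raw))
```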
Traditional pianos are inherently non-portable, restricting everyday accessibility and on-demand creativity. Existing portable alternatives, largely vision-based with external cameras, suffer from limited range, occlusion, and unreliable contact detection. We present PianoBand, a wrist-worn system integrating an IMU, a miniature under-wrist RGB camera, and a printed keyboard sheet augmented with fiducial markers for reliable key mapping on any flat surface. Powered by a lightweight real-time IMU–vision pipeline, PianoBand enables high-fidelity piano interaction, supporting single notes, multi-finger chords, flexible fingering, dynamic velocity, and preliminary articulation techniques. Technical evaluation showed robust tap detection (over 99% accuracy) and accurate fingertip localization (8.90-pixel error), enabling precise note mapping. A comparative user study (N=15) further evaluated system performance, reporting high note accuracy comparable to roll-up pianos and outperforming an XR piano, along with high ratings for portability, expressivity, and extensibility. Expert interviews highlighted broad application opportunities for piano-based experiences and music creation, suggesting future design directions.
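The abstract names fiducial markers and fingertip localization but not the mapping between them. A common way to realize such a mapping is a plane homography from the detected sheet corners to sheet coordinates, then binning the projected fingertip into a key. The sketch below is a minimal illustration under assumed sheet dimensions and key count; fiducial and fingertip detection are treated as given (e.g., from an ArUco-style detector and the under-wrist camera), and nothing here is PianoBand's published pipeline.

```python
# Illustrative sketch (assumptions, not PianoBand's pipeline): map a fingertip
# pixel to a printed-key index via a homography from four corner fiducials.
import numpy as np
import cv2

SHEET_W_MM, SHEET_H_MM = 280.0, 100.0   # assumed printed-sheet dimensions
NUM_KEYS = 24                            # assumed number of printed keys

def sheet_homography(marker_px):
    """marker_px: 4x2 image coords of the sheet's corner fiducials,
    ordered top-left, top-right, bottom-right, bottom-left."""
    sheet_mm = np.float32([[0, 0], [SHEET_W_MM, 0],
                           [SHEET_W_MM, SHEET_H_MM], [0, SHEET_H_MM]])
    return cv2.getPerspectiveTransform(np.float32(marker_px), sheet_mm)

def fingertip_to_key(fingertip_px, H):
    """Project a fingertip pixel onto the sheet plane and bin it into a key."""
    pt = np.float32([[fingertip_px]])               # shape (1, 1, 2)
    x_mm, _ = cv2.perspectiveTransform(pt, H)[0, 0]
    return int(np.clip(x_mm / (SHEET_W_MM / NUM_KEYS), 0, NUM_KEYS - 1))

# Example with hand-picked marker corners and one detected fingertip
H = sheet_homography([[40, 60], [600, 55], [610, 300], [35, 310]])
print(fingertip_to_key((320, 180), H))   # -> key index on the printed sheet
```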
Learning to play music is an embodied process, but traditional tools like scores and recordings overlook the role of gesture. While video tutorials offer visual cues, they remain detached from the instrument. We present ReTouche, an interactive system that projects synchronized notes and hand gestures directly onto the actuated keys of a player piano. The system includes a pipeline for adapting publicly available overhead-view YouTube videos and supports interactions for video-based learning, such as sectional practice, layered guidance, and self-recording. We evaluate ReTouche through a structured observation study comparing it with YouTube-based self-learning (n=18), a two-week autoethnography study (n=3), and a focus group with professional piano teachers (n=4). Our findings show that embodied representations can ground self-guided piano learning by anchoring gesture, sound, and action within the instrument. Learners appropriated these representations to develop strategies and sustain motivation, while teachers saw potential for integrating ReTouche as a complement to conventional pedagogy.
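The abstract does not detail how projected note markers are aligned with physical keys. As one hypothetical ingredient of such a pipeline, a MIDI note number can be mapped to an approximate horizontal projector coordinate once the keyboard span has been measured in projector pixels. The sketch below is only an approximation (it spaces semitones uniformly across the span, which real keyboards do not), and it is not ReTouche's implementation.

```python
# Illustrative sketch (assumed geometry, not ReTouche's method): approximate
# horizontal projector position of a key on an 88-key piano from a MIDI note.
WHITE_SEMITONES = {0, 2, 4, 5, 7, 9, 11}   # C D E F G A B within an octave

def key_x_position(midi_note, keyboard_left_px, keyboard_right_px,
                   lowest_note=21, highest_note=108):
    """Space the 88 semitones uniformly across the measured keyboard span
    (a rough approximation of true key geometry)."""
    frac = (midi_note - lowest_note) / (highest_note - lowest_note)
    return keyboard_left_px + frac * (keyboard_right_px - keyboard_left_px)

def is_white_key(midi_note):
    return midi_note % 12 in WHITE_SEMITONES

# Example: middle C (MIDI 60) on a keyboard spanning pixels 100..1800
print(key_x_position(60, 100, 1800), is_white_key(60))
```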
Adolescence is marked by strong creative impulses but limited strategies for structured expression, often leading to frustration or disengagement. While generative AI lowers technical barriers and delivers efficient outputs, its role in fostering adolescents’ expressive growth has been overlooked. We propose MusicScaffold, an adolescent-centered framework that transforms classical AI roles from broad conceptualizations into stage-specific, actionable developmental scaffolds that make expressive strategies transparent and learnable, supporting adolescents in mastering creative expression. In a four-week study with middle school students (ages 12–14), MusicScaffold enhanced cognitive specificity, behavioral regulation, and affective autonomy in music creation. By reframing generative AI as a scaffold rather than a generator, this work bridges the machine efficiency of generative systems with human growth in adolescent creativity education.
Music shapes the tone of videos, yet creators find it difficult to choose soundtracks that match their video's mood and narrative. Recent text-to-music models let creators generate music from text prompts, but our formative study (N=8) shows that creators struggle to construct diverse prompts, quickly review and compare tracks, and understand their impact on the video. We present VidTune, a system that supports soundtrack creation by generating diverse music options from a creator’s prompt and producing contextual thumbnails for rapid review. VidTune extracts representative video subjects to ground thumbnails in context, maps each track’s valence and energy onto visual cues like color and brightness, and depicts prominent genres and instruments. Creators can refine tracks with natural-language edits, which VidTune expands into new generations. In a controlled user study (N=12) and an exploratory case study (N=6), participants found VidTune helpful for efficiently reviewing and comparing music options and described the process as playful and enriching.
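The valence-and-energy-to-visual-cue mapping can be made concrete with a small example. The sketch below shows one possible encoding, with valence driving hue (cool to warm) and energy driving brightness; the specific ranges and constants are assumptions and not VidTune's exact design.

```python
# Illustrative sketch (assumed mapping, not VidTune's design): tint a thumbnail
# by a track's valence (hue: cool blue -> warm orange) and energy (brightness).
import colorsys

def mood_tint(valence, energy):
    """valence, energy in [0, 1]; returns an (R, G, B) tuple in 0..255."""
    hue = 0.6 - 0.5 * valence        # 0.6 (blue) for low valence, ~0.1 (orange) for high
    saturation = 0.7
    value = 0.35 + 0.6 * energy      # dim for calm tracks, bright for energetic ones
    r, g, b = colorsys.hsv_to_rgb(hue, saturation, value)
    return tuple(round(c * 255) for c in (r, g, b))

# Example: a happy, high-energy track vs. a sad, low-energy one
print(mood_tint(0.9, 0.8))   # warm and bright
print(mood_tint(0.2, 0.2))   # cool and dark
```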
Live music provides a uniquely rich setting for studying creativity and interaction due to its spontaneous nature. The pursuit of live music agents---intelligent systems supporting real-time music performance and interaction---has captivated researchers across HCI, AI, and computer music for decades, and recent advancements in AI suggest unprecedented opportunities to evolve their design. However, the interdisciplinary nature of music has led to fragmented development across research communities, hindering effective communication and collaborative progress. In this work, we bring together perspectives from these diverse fields to map the current landscape of live music agents. Based on our analysis of 184 systems drawn from both the academic literature and videos, we develop a comprehensive design space that categorizes dimensions spanning usage contexts, interactions, technologies, and ecosystems. By highlighting trends and gaps in live music agents, our design space offers researchers, designers, and musicians a structured lens to understand existing systems and shape future directions in real-time human-AI music co-creation. We release our annotated systems as a living artifact at https://live-music-agents.github.io.
Rhythm and articulation are essential for expressive guitar performance. Existing tools provide basic beat cues, but beginners often struggle to align with these cues when playing complex techniques such as strumming and muting. Informed by a formative study with five instructors and grounded in embodied learning theories, we present FretFlow, a haptic vest-based tool that simulates common instructional practices to guide learners through physical interactions like tapping. The key to FretFlow is its design space, which maps rhythmic and articulation patterns in various playing techniques to distinct haptic patterns, enabling the authoring of haptic scores. FretFlow further adapts haptic intensity dynamically based on learners' real-time performance accuracy, accompanied by multimodal guidance across haptic, visual, and audio channels. We iteratively refined the haptic designs across two rounds with 46 participants, followed by a two-week user study with 20 beginners. Results show that FretFlow improves learners’ rhythmic accuracy and expressive performance.
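The adaptive-intensity idea can be sketched concretely: derive a timing-error score from the learner's note onsets against the target rhythm and map it onto a vibration amplitude, so cues fade as accuracy improves and strengthen when it degrades. The thresholds and the linear mapping below are assumptions for illustration, not FretFlow's published algorithm.

```python
# Illustrative sketch (assumptions, not FretFlow's algorithm): scale vibration
# intensity from the learner's recent onset-timing error.
def timing_error_ms(played_onsets, target_onsets):
    """Mean absolute deviation between played and target onsets (both in ms),
    assuming the two lists are already matched one-to-one."""
    errs = [abs(p - t) for p, t in zip(played_onsets, target_onsets)]
    return sum(errs) / len(errs) if errs else 0.0

def haptic_intensity(error_ms, good_ms=30.0, poor_ms=120.0,
                     min_level=0.2, max_level=1.0):
    """Map timing error onto a vibration amplitude in [min_level, max_level]."""
    frac = (error_ms - good_ms) / (poor_ms - good_ms)
    frac = min(max(frac, 0.0), 1.0)
    return min_level + frac * (max_level - min_level)

# Example: a learner drifting behind the beat on a strumming pattern
err = timing_error_ms([0, 540, 1070, 1610], [0, 500, 1000, 1500])
print(haptic_intensity(err))   # stronger cue for larger drift
```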