Designing an Affective Mobile Probe to Measure Smile Dynamics in Depression
Description

Depression is a complex disorder for which there is growing interest in identifying objective behavioral markers that track specific symptoms, such as anhedonia and blunted emotional reactivity. This study explores the feasibility of using smile and smirk expression dynamics, captured through our novel stimulus-based mobile affective probe, as candidate digital biomarkers of depression severity within a large-scale mobile health intervention trial, BeWell. Data from 684 BeWell participants (2,702 observations) were analyzed longitudinally over 16 weeks, comparing their PHQ-8 survey scores with their facial responses to short videos intended to elicit smiles. Mixed-effects models reveal that higher maximum Duchenne smile intensity in reaction to liked stimuli is associated with lower depression scores over time at both the within- and between-person levels. We additionally share insights from deploying our tool, including ease of use, perceptions of the stimuli, and technical challenges, which offer considerations for the future development of stimulus-based affect probes in real-world settings.
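Mixed-effects analyses of repeated measures typically separate each participant's average level from their session-to-session fluctuations before modelling. A minimal sketch of that within-/between-person decomposition (person-mean centering) is below; the variable names and data are illustrative, not from the BeWell dataset:

```python
from statistics import mean

def center_within_between(observations):
    """Split each participant's repeated measurements into a between-person
    component (the participant's mean) and within-person deviations from
    that mean, a common preprocessing step before fitting a mixed-effects
    model with both levels of a predictor."""
    decomposed = {}
    for pid, values in observations.items():
        person_mean = mean(values)                       # between-person component
        deviations = [v - person_mean for v in values]   # within-person component
        decomposed[pid] = {"between": person_mean, "within": deviations}
    return decomposed

# Hypothetical maximum Duchenne smile intensities over repeated sessions
data = {"p01": [0.5, 1.0, 0.75], "p02": [0.25, 0.75]}
result = center_within_between(data)
```

The "between" values would enter the model as person-level predictors and the "within" deviations as occasion-level predictors, letting the two associations with depression scores be estimated separately.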

WristPP: A Wrist-Worn System for Hand Pose and Pressure Estimation
Description

Accurate 3D hand pose and pressure sensing is essential for immersive human-computer interaction, yet simultaneously achieving both in mobile scenarios remains challenging. We present WristPP, a camera-based wrist-worn system that estimates 3D hand pose and per-vertex pressure from a single wide-FOV RGB frame in real time. A ViT (Vision Transformer) backbone with joint-aligned tokens predicts hand VQ-VAE codebook indices for mesh recovery, while an extrinsics-conditioned branch jointly estimates per-vertex pressure. On a self-collected dataset of 133,000 frames (20 subjects; 48 on-plane and 28 mid-air gestures), WristPP attains an MPJPE (Mean Per-Joint Position Error) of 2.9 mm, a Contact IoU of 0.712, a Vol. IoU of 0.618, and a foreground pressure MAE of 10.4 g. Across three user studies, WristPP delivers touchpad-level efficiency in mid-air pointing and robust multi-finger pressure control on an uninstrumented desktop. In a real-world large-display Whac-A-Mole task, WristPP also achieves a higher success rate and lower arm fatigue than head-mounted camera-based baselines. These results position WristPP as an effective, mobile solution for versatile pose- and pressure-based interaction.
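Contact IoU scores how well the predicted set of in-contact mesh vertices overlaps the ground-truth set. A minimal sketch of one plausible reading of that metric follows; the paper's exact definition (e.g. thresholding, per-frame averaging) may differ:

```python
def contact_iou(pred_contacts, gt_contacts):
    """Intersection-over-union between the predicted and ground-truth sets
    of mesh vertices flagged as being in contact with a surface."""
    pred, gt = set(pred_contacts), set(gt_contacts)
    union = pred | gt
    if not union:
        return 1.0  # neither predicted nor actual contact: count as perfect
    return len(pred & gt) / len(union)

# Hypothetical vertex indices flagged as in-contact
print(contact_iou([1, 2, 3, 4], [2, 3, 4, 5]))  # 3 shared / 5 in union = 0.6
```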

Mental Workload Prediction Using Physiological Signals: Balancing Performance and Interpretability
Description

Mental workload critically affects well-being and performance in safety-critical systems. While machine learning models for mental workload prediction often leverage physiological indicators, interpretability and error analysis are frequently overlooked. This study develops robust models for workload prediction that emphasize interpretability and analyzes common misclassifications to elucidate key mechanisms. Respiratory and cardiac signals from 30 participants, as well as oculomotor signals from 17 participants, captured under varying task demands, were utilized. Five models of varying interpretability were validated with optimized hyperparameters and preprocessing. A logistic regression model and a decision tree were selected to distinguish between two and three workload levels, respectively. On unseen test data, they achieved F1-scores of 90.5% (accuracy: 92.2%) and 72.0% (accuracy: 72.3%). Performance varied across scenarios and individuals. Findings show that transparent, efficient models combined with appropriate preprocessing can compete with black-box approaches, with implications for safety-critical applications where interpretability, trust, and computational efficiency are essential.
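The two headline metrics, F1-score and accuracy, are both derived from confusion-matrix counts. A minimal sketch for the binary (two-level) case, with illustrative counts rather than the study's actual results:

```python
def f1_and_accuracy(tp, fp, fn, tn):
    """Binary F1-score and accuracy from confusion-matrix counts:
    tp/fp/fn/tn = true/false positives and false/true negatives."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return f1, accuracy

# Hypothetical counts for a two-level workload classifier
f1, acc = f1_and_accuracy(8, 2, 2, 8)
```

F1 balances precision and recall on the positive (high-workload) class, which is why the paper can report it alongside, and slightly below, plain accuracy.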

Signals of Success and Struggle: Early Prediction and Physiological Signatures of Human Performance across Task Complexity
Description

User performance is crucial in interactive systems, capturing how effectively users engage with task execution. Prospectively predicting performance enables the timely identification of users struggling with task demands. While ocular and cardiac signals are widely used to characterise performance-relevant visual behaviour and physiological activation, their potential for early prediction, and for revealing the physiological mechanisms underlying performance differences, remains underexplored. We conducted a within-subject experiment in a game environment with naturally unfolding complexity, using early ocular and cardiac signals to predict later performance and to examine physiological and self-reported group differences. Results show that the ocular–cardiac fusion model achieves a balanced accuracy of 0.86, and the ocular-only model shows comparable predictive power. High performers exhibited targeted gaze, adapted their visual sampling, and sustained more stable cardiac activation as demands intensified, alongside a more positive affective experience. These findings demonstrate the feasibility of cross-session prediction from early physiology, providing interpretable insights into performance variation and facilitating future proactive intervention.
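Balanced accuracy, the metric reported for the fusion model, is the unweighted mean of per-class recalls, so it is robust when high and low performers are unevenly represented. A minimal sketch with illustrative labels:

```python
def balanced_accuracy(y_true, y_pred):
    """Unweighted mean of per-class recalls: each class contributes equally
    regardless of how many samples it has."""
    classes = sorted(set(y_true))
    recalls = []
    for c in classes:
        idx = [i for i, t in enumerate(y_true) if t == c]      # samples of class c
        correct = sum(1 for i in idx if y_pred[i] == c)        # hits on class c
        recalls.append(correct / len(idx))
    return sum(recalls) / len(recalls)

# 4 low performers (0), 2 high performers (1): plain accuracy would be 5/6,
# but balanced accuracy averages the 1.0 and 0.5 per-class recalls to 0.75.
score = balanced_accuracy([0, 0, 0, 0, 1, 1], [0, 0, 0, 0, 1, 0])
```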

AutoChainer: Automatic Data Augmentation for Stroke-based Input
Description

Training deep learning classifiers for stroke-based applications requires collecting large numbers of samples, which is often expensive and time-consuming. Data augmentation (DA) techniques can mitigate this issue by artificially increasing the number of training samples, ultimately improving model performance and robustness. Since the effectiveness of DA techniques depends heavily on the task and dataset, researchers have proposed automatic DA methods, mostly for computer vision tasks; unfortunately, stroke-based data remain underexplored. To address this research gap, we propose AutoChainer, an automatic DA technique suitable for stroke-based data, which applies randomly sampled chains of augmentation transformations. We perform classification tasks on a variety of datasets (including gestures, letters, and signatures) and models, showing that AutoChainer achieves state-of-the-art results. It also has the potential to enhance the visual quality of augmented samples, making them more interpretable, and offers easy customization to task-specific requirements, such as balancing classification accuracy and execution time.
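The core idea, composing a randomly sampled chain of stroke transformations, can be sketched in a few lines. The transform set (jitter, scale, rotate) and their parameter ranges below are illustrative stand-ins, not the paper's actual search space:

```python
import math
import random

def jitter(stroke, sigma=0.01):
    """Add small Gaussian noise to each (x, y) point."""
    return [(x + random.gauss(0, sigma), y + random.gauss(0, sigma)) for x, y in stroke]

def scale(stroke, low=0.9, high=1.1):
    """Uniformly rescale the whole stroke about the origin."""
    s = random.uniform(low, high)
    return [(x * s, y * s) for x, y in stroke]

def rotate(stroke, max_deg=10.0):
    """Rotate the stroke about the origin by a small random angle."""
    a = math.radians(random.uniform(-max_deg, max_deg))
    c, s = math.cos(a), math.sin(a)
    return [(x * c - y * s, x * s + y * c) for x, y in stroke]

def random_chain(stroke, transforms, chain_len):
    """Apply a randomly sampled chain of augmentation transformations,
    in the spirit of AutoChainer, to produce one synthetic sample."""
    for t in random.choices(transforms, k=chain_len):
        stroke = t(stroke)
    return stroke

random.seed(0)
stroke = [(0.0, 0.0), (0.5, 0.5), (1.0, 1.0)]
augmented = random_chain(stroke, [jitter, scale, rotate], chain_len=3)
```

Each call yields a different chain, so repeatedly invoking `random_chain` on the same training stroke generates a diverse set of synthetic samples.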

Do You (Dis)agree With Me? Modelling Implicit User Disagreement in Human–AI Interaction Using Gaze Data
Description

The widespread use of generative AI has led to increased focus on human–AI interaction. However, AI systems can generate unexpected outputs, leading to disagreement or human–AI conflict. This paper focuses on modelling user disagreement using machine learning (ML) by observing users' implicit viewing behaviour. We conducted a controlled study with 30 participants evaluating captions from a simulated ML image-captioning system. Participants indicated agreement or disagreement with each caption while we recorded their gaze and facial-expression data, which we used to predict (dis)agreement. We show that unimodal gaze-based personalised modelling (0.684 average balanced accuracy) outperforms generalised modelling (0.570), whereas multimodal approaches did not improve performance. Our exploratory post hoc gaze-based analysis highlights the importance of feature selection and temporal dynamics, which can guide system design and future work. We release the dataset to support reproducibility and further work. Given the nature of this research, we also discuss the potential ethical and privacy implications of continuous passive gaze and facial monitoring.
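The personalised-versus-generalised comparison hinges on how training data are split per user. A minimal sketch of one common split scheme follows; it is illustrative and not necessarily the paper's exact evaluation protocol:

```python
def make_splits(samples, target_pid):
    """Build training sets for a generalised model (trained on all other
    users' data) versus a personalised model (trained only on the target
    user's earlier trials). `samples` is a list of
    (participant_id, features, label) tuples in trial order."""
    personal = [s for s in samples if s[0] == target_pid]
    generalised_train = [s for s in samples if s[0] != target_pid]
    # hold out the target user's last trial as the shared test set
    personalised_train, test = personal[:-1], personal[-1:]
    return generalised_train, personalised_train, test
```

Evaluating both models on the same held-out trials of the target user makes the accuracy figures directly comparable.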

DraftMarks: Enhancing Transparency in Human-AI Co-Writing Through Interactive Skeuomorphic Process Traces
Description

As generative AI becomes part of everyday writing, questions of transparency and productive human effort are increasingly important. Educators, reviewers, and readers want to understand how AI shaped the process. Where was human effort focused? What role did AI play in the creation of the work? How did the interaction unfold? Existing approaches often reduce these dynamics to summary metrics or simplified provenance. We introduce DraftMarks, an augmented reading tool that helps readers interpret how a text was constructed with AI through familiar physical metaphors. DraftMarks employs skeuomorphic encodings such as eraser crumbs to convey the intensity of revision, and masking tape or smudges to mark AI-generated content, surfacing the writing process within the final written artifact. Using data from writer–AI interactions, DraftMarks' algorithm computes various collaboration metrics and writing traces. Through a formative study, we identified computational logic suited to different readerships, and we evaluated DraftMarks' effectiveness in assessing AI co-authored writing through a study on Prolific.
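One of the simplest collaboration metrics such a tool could derive from provenance data is the share of the final text that originated from AI. The sketch below assumes AI contributions arrive as (start, end) character spans; this is an illustrative stand-in, not DraftMarks' actual algorithm:

```python
def ai_share(final_text, ai_spans):
    """Fraction of characters in the final text that came from AI, given
    half-open (start, end) provenance spans. Overlapping spans are counted
    once, and spans are clipped to the text length."""
    ai_chars = set()
    for start, end in ai_spans:
        ai_chars.update(range(start, min(end, len(final_text))))
    return len(ai_chars) / len(final_text) if final_text else 0.0

# "AI part" occupies characters 16-22 of this 23-character string
print(ai_share("human text plus AI part", [(16, 23)]))  # 7 / 23
```

Per-region variants of the same counting logic could drive visual encodings, e.g. denser masking tape where the AI share of a paragraph is higher.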
