Ensuring timely takeover in conditionally autonomous vehicles presents a significant challenge, especially when drivers are distracted by non-driving-related tasks or are in suboptimal emotional states. Existing driver monitoring systems face a trade-off between practicality and reliability: physiological sensors are intrusive, vision-based methods are sensitive to occlusions and variable lighting, and current multimodal learning approaches often rely on simple fusion strategies that fail to reconcile heterogeneous data. We introduce MUST (Multimodal Unified Smartwatch-based Takeover), a framework that predicts driver state and takeover performance using unobtrusive smartwatch signals. MUST employs an asymmetric causal fusion mechanism to model the interplay between driver behavior and emotion. We validated the architecture in diverse simulator environments reflecting real-world driving conditions, demonstrating robust driver state estimation and takeover prediction. This work establishes the smartwatch as a practical tool for adaptive takeover support, enabling reliable readiness assessment without intrusive hardware or fragile vision systems.
ACM CHI Conference on Human Factors in Computing Systems