Following along with how-to videos requires alternating focus between understanding procedural video instructions and performing them, yet how to support these continuous context switches for the user remains largely unexplored. In this paper, we describe a user study in which thirty participants performed an hour-long cooking task while interacting with a Wizard-of-Oz, hands-free interactive system that is aware of both their cooking progress and environmental context. Through analysis of the session transcripts, we identify a dichotomy between the variety of participants' queries and the similarity of their workflow alignment, under-studied interactions that require AI functionality beyond video navigation alone, and queries that call for multimodal sensing of the user's environment. By examining the assistant experience through the participants' interactions, we identify design implications for a smart assistant that can discern a user's task-completion flow and personal characteristics, accommodate requests both within and external to the task domain, and support non-voice queries.
https://doi.org/10.1145/3544548.3581006
ACM CHI Conference on Human Factors in Computing Systems (CHI 2023, https://chi2023.acm.org/)