Short video streaming systems such as TikTok have reached billions of active users worldwide. At the core of such systems are (proprietary) algorithms that recommend a personalized sequence of videos to each user. We aim to understand the interplay between these recommendations and users. While past work has studied recommendation algorithms using textual data (e.g., hashtags) and user studies, we add a third modality of analysis: automated analysis of the videos themselves. We develop a new HCI measurement approach built on VCA (Video Content Analysis), our new tool that leverages recent advances in Vision Language Models. We apply VCA to a trifecta of HCI methodologies: real user studies, interviews, and data donation. This allows us to understand temporal aspects of how well TikTok’s recommendation algorithm is perceived by users, is affected by user interactions, and aligns with user history; how sensitive users are to the order of recommended videos; and how the algorithm’s effectiveness itself may be predictable in the future. Our new findings indicate behavioral aspects that the TikTok user community can benefit from.
ACM CHI Conference on Human Factors in Computing Systems