As video has become the dominant mode of content on platforms such as YouTube, TikTok, and Instagram, captioning has emerged as a critical factor for accessibility, engagement, and visibility. Prior studies have examined particular types of social media video captions or how specific communities use captions, but no systematic synthesis has been undertaken. Without one, proposed interventions risk overlooking core platform constraints or missing critical accessibility needs. This paper reviews 36 peer-reviewed papers published between 2015 and 2025 across fields including Human-Computer Interaction (HCI), accessibility, media studies, education, and language learning. We find that captions operate as collective infrastructure co-produced by viewers, creators, and platforms. Deaf and Hard of Hearing (DHH), neurodivergent, and multilingual viewers depend on captions and increasingly expect mechanisms for giving feedback on them, while creators face inadequate tool support. Building on these insights, we propose a framework of Participatory Captioning, derive design implications from it, and outline future directions for research on social media video captions.
ACM CHI Conference on Human Factors in Computing Systems