Music is intrinsically connected to human experience, yet the sheer number of available choices often makes finding the right piece difficult, especially when search terms are ambiguous. This study investigates the viability of using visual data, specifically images, as queries for music search, aiming to better align search results with users' moods and situational contexts. We designed and evaluated three prototype systems for music search: TTTune (text-based), VisTune (image-based), and VTTune (hybrid), comparatively assessing user experience and system usability. In a user study involving 236 participants, each participant interacted with one of the systems and then completed post-experimental surveys. A subset of participants also took part in in-depth interviews to further elucidate the potential and advantages of image-based music retrieval (IMR) systems. Our findings reveal a marked preference for the user experience and usability of the IMR approach over the traditional text-based method, underscoring the potential of images as effective search queries. Based on these findings, we discuss interface design guidelines tailored for IMR systems and factors affecting system performance, contributing to the evolving landscape of music search methods.
https://doi.org/10.1145/3613904.3642126
The ACM CHI Conference on Human Factors in Computing Systems (https://chi2024.acm.org/)