We present the design and creation of a disability-first dataset, “BIV-Priv,” which contains 728 images and 728 videos of 14 private categories captured by 26 blind participants to support downstream development of artificial intelligence (AI) models. While best practices in dataset creation typically attempt to eliminate private content, some applications require such content for model development. We describe our approach to creating this dataset with private content in an ethical way, including using props rather than participants’ own private objects and balancing multi-disciplinary perspectives (e.g., accessibility, privacy, computer vision) to meet tangible metrics (e.g., diversity, category coverage, amount of content) that support AI innovation. We observed challenges that our participants encountered during data collection, including accessibility issues (e.g., understanding foreground versus background object placement) and issues arising from the sensitive nature of the content (e.g., discomfort capturing some props, such as condoms, around family members).
https://doi.org/10.1145/3544548.3580922
The ACM CHI Conference on Human Factors in Computing Systems (https://chi2023.acm.org/)