Child sexual abuse material (CSAM) presents a critical challenge for online safety, yet the verification procedures that determine which items are classified as CSAM remain poorly understood. Triple verification (requiring three reviewers to agree) is promoted as a safeguard, but little is known about how it is implemented, how it is perceived by experts, and how voting conditions affect reliability. We address this gap through a mixed-methods study. We interviewed 14 experts from seven organizations (e.g., law enforcement, hotlines, etc.) to map current verification practices, then ran an inter-reliability experiment with Dutch National Police experts who reviewed 2,031 images and videos under different voting conditions (blind vs. non-blind, varied order). Finally, we held a focus group to explore the reasons behind disagreements. We find that practices vary widely, perceptions of triple verification reflect both safeguards and burdens, and expert agreement depends on voting conditions and content type.
ACM CHI Conference on Human Factors in Computing Systems