Red Teaming LLMs as Socio-Technical Practice: From Exploration and Data Creation to Evaluation

Abstract

Red teaming, with roots in security, has recently become a key evaluative approach for ensuring the safety and reliability of generative artificial intelligence. However, most existing work emphasizes technical benchmarks and attack success rates, leaving under-examined the socio-technical practices through which red-teaming datasets are defined, created, and evaluated. Drawing on 22 interviews with practitioners who design and evaluate red-teaming datasets, we examine the data practices and standards that underpin this work. Because adversarial datasets determine the scope and accuracy of model evaluations, they are critical artifacts for assessing potential harms from large language models. Our contributions are, first, empirical evidence of how practitioners conceptualize red teaming and develop and evaluate red-teaming datasets; second, a reflection on how practitioners' conceptualization of risk leads them to overlook context, interaction type, and user specificity. We conclude with three opportunities for HCI researchers to expand the conceptualization and data practices of red teaming.

Award
Honorable Mention
Authors
Adriana Alvarado Garcia
IBM Research, Yorktown Heights, New York, United States
Ruyuan Wan
Pennsylvania State University, University Park, Pennsylvania, United States
Ozioma Collins Oguine
University of Notre Dame, Notre Dame, Indiana, United States
Karla Badillo-Urquiola
University of Notre Dame, South Bend, Indiana, United States

Conference: CHI 2026

ACM CHI Conference on Human Factors in Computing Systems

Session: Consent, Risk, and Everyday Ethics

P1 - Room 114
7 presentations
2026-04-14, 20:15–21:45