Quantitative vulnerability assessment is central to security management, guiding how risks are prioritized and mitigated. Yet severity scoring relies on human judgment and is therefore subject to differences in experience, interpretation, and diligence; prior work has documented disagreement even among experts. We examine an NLP-based assistive tool that visualizes keyword cues during assessment. In a controlled survey of 389 participants recruited via Amazon MTurk and Prolific, we statistically analyze how participant skills and demographics, vulnerability characteristics, and tool support affect assessment outcomes. The results show that the tool does not consistently improve assessment accuracy across expertise levels, but it can help for specific vulnerability types (e.g., CWE-787, Out-of-bounds Write) and CVSS metrics (Attack Complexity, Privileges Required, and Scope), and it can increase user confidence. Beyond immediate performance, the tool can support training for manual assessment tasks that are hard to automate, as learning effects yield significant improvements on subsequent tasks. This work informs the design of cybersecurity decision-support tools and motivates future research on security training and human-centered security.