Characterizing Practices, Limitations, and Opportunities Related to Text Information Extraction Workflows: A Human-in-the-loop Perspective

Abstract

Information extraction (IE) approaches often play a pivotal role in text analysis and require significant human intervention. Therefore, a deeper understanding of existing IE practices and related challenges from a human-in-the-loop perspective is warranted. In this work, we conducted semi-structured interviews in an industrial environment and analyzed the reported IE approaches and limitations. We observed that data science workers often follow an iterative task model consisting of information foraging and sensemaking loops across all the phases of an IE workflow. The task model is generalizable and captures diverse goals across these phases (e.g., data preparation, modeling, and evaluation). We found several limitations in both foraging (e.g., data exploration) and sensemaking (e.g., qualitative debugging) loops stemming from a lack of adherence to existing cognitive engineering principles. Moreover, we identified that, due to the iterative nature of an IE workflow, the requirement of provenance is often implied but rarely supported by existing systems. Based on these findings, we discuss design implications for supporting IE workflows and future research directions.

Authors
Sajjadur Rahman
Megagon Labs, Mountain View, California, United States
Eser Kandogan
Megagon Labs, Mountain View, California, United States
Paper URL

https://dl.acm.org/doi/abs/10.1145/3491102.3502068

Conference: CHI 2022

The ACM CHI Conference on Human Factors in Computing Systems (https://chi2022.acm.org/)

Session: Reasoning and Sensemaking

393
5 presentations
2022-05-05 18:00:00
2022-05-05 19:15:00