2. AI Explanations and Decision Support in Healthcare

Metacognitive Demands and Strategies While Using Off-The-Shelf AI Conversational Agents for Health Information Seeking
Description

As Artificial Intelligence (AI) conversational agents become widespread, people are increasingly using them for health information seeking. The use of off-the-shelf conversational agents for health information seeking can place high metacognitive demands (the need for extensive monitoring and control of one's own thought process) on individuals, which could compromise their experience of seeking health information. However, the specific demands that arise while using conversational agents for health information seeking, and the strategies people use to cope with those demands, remain unknown. To address these gaps, we conducted a think-aloud study with 15 participants as they sought health information using an off-the-shelf AI conversational agent. We identified the metacognitive demands such systems impose and the strategies people adopt in response, and we propose considerations for designing beyond off-the-shelf interfaces to reduce these demands and better support the user experience of health information seeking.

Intelligent Reasoning Cues: A Framework and Case Study of the Roles of AI Information in Complex Decisions
Description

Artificial intelligence (AI)-based decision support systems can be highly accurate yet still fail to support users or improve decisions. Existing theories of AI-assisted decision-making focus on calibrating reliance on AI advice, leaving it unclear how different system designs might influence the underlying reasoning processes. We address this gap by reconsidering AI interfaces as collections of intelligent reasoning cues: discrete pieces of AI information that can individually influence decision-making. We then explore the roles of eight types of reasoning cues in a high-stakes clinical decision (treating patients with sepsis in intensive care). Through contextual inquiries with six teams and a think-aloud study with 25 physicians, we find that reasoning cues have distinct patterns of influence that can directly inform design. Our results also suggest that reasoning cues should prioritize tasks with high variability and discretion, adapt to remain compatible with evolving decision needs, and provide complementary, rigorous insights on complex cases.

Design and Multi-level Evaluation of MAP-X: a Medically Aligned, Patient-Centered AI Explanation System
Description

Health artificial intelligence (AI) is often developed in high-stakes, data-scarce contexts, where both clinical validity and patient comprehension are critical; however, rigorous, multi-level evaluation of explanations in real-world patient-facing settings remains challenging. To enhance patient understanding and trust, we propose a practical blueprint for designing and evaluating medically aligned, patient-centered explanations, instantiated in MAP-X, a system that employs a large language model (LLM) with retrieval-augmented generation (RAG) to translate clinical assessments into an understandable patient-facing interface. We conducted a three-phase evaluation following a multi-level validation framework: a functional evaluation of faithfulness, a clinician evaluation of workflow suitability, and a patient evaluation of perceived understanding and trust. Our findings suggest that MAP-X may support clinical adoption. In the patient study, MAP-X showed higher reported trust and a positive trend in explanation satisfaction, and interviews suggested a clearer understanding of assessment results. Overall, MAP-X produced clinically relevant explanations with reasonable faithfulness and usability, though clinician oversight remains necessary.

MIND: Empowering Mental Health Clinicians with Multimodal Data Insights through a Narrative Dashboard
Description

Advances in data collection enable the capture of rich patient-generated data, from passive sensing (e.g., wearables and smartphones) to active self-reports (e.g., cross-sectional surveys and ecological momentary assessments). Although prior research has demonstrated the utility of patient-generated data in mental healthcare, significant challenges remain in effectively presenting these data streams alongside clinical data (e.g., clinical notes) for clinical decision-making. Through co-design sessions with five clinicians, we propose MIND, a large language model-powered dashboard designed to present clinically relevant multimodal data insights for mental healthcare. MIND presents multimodal insights through narrative text, complemented by charts communicating the underlying data. Our user study (N=16) demonstrates that clinicians perceive MIND as a significant improvement over baseline methods, reporting an improved ability to reveal hidden, clinically relevant data insights (p<.001) and to support their decision-making (p=.004). Grounded in the study results, we discuss future research opportunities for integrating data narratives into broader clinical practices.

Augmenting Clinical Decision-Making with an Interactive and Interpretable AI Copilot: A Real-World User Study with Clinicians in Nephrology and Obstetrics
Description

Clinician skepticism toward opaque AI hinders adoption in high-stakes healthcare. We present AICare, an interactive and interpretable AI copilot for collaborative clinical decision-making. By analyzing longitudinal electronic health records, AICare grounds dynamic risk predictions in scrutable visualizations and LLM-driven diagnostic recommendations. Through a within-subjects, counterbalanced study with 16 clinicians across nephrology and obstetrics, we comprehensively evaluated AICare using objective measures (task completion time and error rate), subjective assessments (NASA-TLX, SUS, and confidence ratings), and semi-structured interviews. Our findings indicate that AICare reduced clinicians' cognitive workload. Beyond performance metrics, qualitative analysis reveals that trust is actively constructed through verification, with interaction strategies diverging by expertise: junior clinicians used the system as cognitive scaffolding to structure their analysis, while experts engaged in adversarial verification to challenge the AI's logic. This work offers design implications for creating AI systems that function as transparent partners, accommodating diverse reasoning styles to augment rather than replace clinical judgment.

Building Benchmarks from the Ground Up: Community-Centered Evaluation of LLMs in Healthcare Chatbot Settings
Description

Large Language Models (LLMs) are typically evaluated through general or domain-specific benchmarks that test capabilities but often lack grounding in the lived realities of end users. Critical domains such as healthcare require evaluations that extend beyond artificial or simulated tasks to reflect the everyday needs, cultural practices, and nuanced contexts of communities. We propose Samiksha, a community-driven evaluation pipeline co-created with civil-society organizations (CSOs) and community members. Our approach enables scalable, automated benchmarking through a culturally aware, community-driven pipeline in which community feedback informs what to evaluate, how the benchmark is built, and how outputs are scored. We demonstrate this approach in the health domain in India. Our analysis highlights how current multilingual LLMs address nuanced community health queries, while also offering a scalable pathway for contextually grounded and inclusive LLM evaluation.

Constructing Everyday Well-Being: Insights from God-Saeng (God生) for Personal Informatics
Description

While Personal Informatics (PI) systems support behavior change, everyday well-being involves more than achieving individual target behaviors; it is also shaped by cultural narratives that give actions meaning. In South Korea, the God-Saeng (God生) phenomenon, encompassing disciplined, collective, and publicly documented self-improvement practices, offers a lens into how well-being is negotiated in daily life. We conducted a 10-day probe study (N=24) with bite-sized missions to examine how young adults engaged in God-Saeng. Participants relied on planning practices, accountability infrastructures, and datafication to stabilize their routines, yet these same routines also intensified pressures toward self-monitoring and performance. They navigated tensions between consistency and flexibility, authenticity and visibility, and productivity and broader values such as relationships, and they reinterpreted ordinary activities through sociocultural contexts.

These insights suggest design opportunities for PI systems that move beyond tracking, toward digital instruments that help users negotiate tensions, make meaning, and reflexively understand how technologies participate in their culturally and existentially situated well-being.
