Making Multimodal LLMs Reliable Chart Data Extractors: A Benchmark and Training Framework

Abstract

Chart data extraction, which reverse-engineers data tables from chart images, is essential for reproducibility, analysis, retrieval, and redesign. Existing interactive tools are reliable but tedious, and mixed-initiative systems, while more efficient, lack generalizability. Recent multimodal large language models (MLLMs) offer a unified interface for chart interpretation, yet their ability to extract accurate data tables, especially from charts without visible data labels, remains unclear. To evaluate this capability, we build a benchmark featuring diverse real-world charts without data labels. Results show that, while current MLLMs reliably reconstruct table structures, they struggle with precise value recovery. To address this, we revisit chart data extraction from a human-centered perspective and argue that extraction should follow a progressive learning process similar to how people read charts. Our training framework substantially improves numerical accuracy, achieving state-of-the-art performance with a 7B-parameter model. A user study further shows that our model effectively supports mixed-initiative workflows for reliable chart data extraction.
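To make the evaluation target concrete, below is a minimal sketch of one way "precise value recovery" could be scored: each extracted table cell is compared with ground truth under a relative-error tolerance. The function name, metric, and tolerance here are illustrative assumptions, not the benchmark's actual scoring protocol.

```python
# Hypothetical value-recovery metric: the fraction of ground-truth cells
# that a model recovers within a relative-error tolerance. Illustrative
# only; the paper's benchmark may score tables differently.

def value_accuracy(pred: list[list[float]],
                   truth: list[list[float]],
                   tol: float = 0.05) -> float:
    """Fraction of ground-truth cells recovered within `tol` relative error."""
    total, correct = 0, 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            total += 1
            # Guard against division by zero for zero-valued cells.
            denom = abs(t) if t != 0 else 1.0
            if abs(p - t) / denom <= tol:
                correct += 1
    return correct / total if total else 0.0

# Example: a model estimates bar heights from a chart with no data labels.
truth = [[12.0, 30.0, 18.0]]
pred = [[11.8, 31.0, 25.0]]   # third bar misread
print(value_accuracy(pred, truth))  # 0.666...
```

A relative (rather than absolute) tolerance is a natural choice for this task, since the achievable precision when reading values off an axis scales with the magnitude of the plotted values.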

Authors
Yuchen He
Zhejiang University, Hangzhou, China
Peizhi Ying
Zhejiang University, Hangzhou, China
Liqi Cheng
Zhejiang University, Hangzhou, China
Kuilin Peng
Guangdong University of Technology, Guangzhou, China
Yuan Tian
Zhejiang University, Hangzhou, China
Dazhen Deng
Zhejiang University, Ningbo, China
Yingcai Wu
Zhejiang University, Hangzhou, China

Conference: CHI 2026

ACM CHI Conference on Human Factors in Computing Systems

Session: AI & Data Visualization

M2 - Room M211/212
6 presentations
April 15, 2026, 18:00–19:30