Can Multimodal Large Language Models (MLLMs), with capabilities in perception, recognition, understanding, and reasoning, act as independent assistants in art evaluation dialogues? Current MLLM evaluation methods, which rely on subjective human scoring or costly interviews, lack comprehensive coverage of realistic scenarios. This paper proposes a process-oriented Human-Computer Interaction (HCI) space design that provides more accurate and detailed feedback for MLLM assessment and development. The approach helps teachers evaluate art efficiently while recording their interactions for MLLM capability assessment. We introduce ArtMentor, a comprehensive space that integrates a dataset and three systems to optimize MLLM evaluation. The dataset comprises 380 sessions conducted by five art teachers across nine critical dimensions. The modular system includes agents for entity recognition, review generation, and suggestion generation, enabling iterative upgrades. Machine learning and natural language processing techniques ensure the reliability of evaluations. Results confirm GPT-4o’s effectiveness in assisting teachers in art evaluation dialogues. Our contributions are available at https://artmentor.github.io/.
Paper (ACM Digital Library): https://dl.acm.org/doi/10.1145/3706598.3713274
The ACM CHI Conference on Human Factors in Computing Systems (https://chi2025.acm.org/)
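To make the modular agent design concrete, here is a minimal sketch of a three-agent pipeline (entity recognition → review generation → suggestion generation) driving GPT-4o through the OpenAI chat completions API. The function names, prompts, and single-model routing are illustrative assumptions, not ArtMentor's actual implementation; the point is that each agent is an independently replaceable module, which is what enables iterative upgrades.

```python
# Hypothetical sketch of ArtMentor-style modular agents; prompts and
# function names are assumptions, not the authors' code.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def _ask(system_prompt: str, user_content) -> str:
    """Send one chat turn to GPT-4o and return the text of its reply."""
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_content},
        ],
    )
    return response.choices[0].message.content


def recognize_entities(image_url: str) -> str:
    """Agent 1: name the salient visual entities in the student artwork."""
    return _ask(
        "List the salient entities (subjects, colors, composition) in this artwork.",
        [{"type": "image_url", "image_url": {"url": image_url}}],
    )


def generate_review(entities: str, dimension: str) -> str:
    """Agent 2: write a short review along one evaluation dimension."""
    return _ask(
        f"Write a concise art review focused on the dimension: {dimension}.",
        f"Recognized entities:\n{entities}",
    )


def generate_suggestions(review: str) -> str:
    """Agent 3: turn the review into actionable suggestions for the student."""
    return _ask(
        "Give two or three concrete, encouraging improvement suggestions.",
        f"Review:\n{review}",
    )


if __name__ == "__main__":
    # Hypothetical artwork URL for illustration only.
    entities = recognize_entities("https://example.com/student_artwork.png")
    review = generate_review(entities, dimension="use of color")
    print(generate_suggestions(review))
```

Because each agent only exchanges plain text with its neighbors, any one of them could be swapped for a different model or prompt without touching the others, matching the paper's emphasis on iterative, per-module upgrades.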