AesExpert

Abstract

The highly abstract nature of image aesthetics perception (IAP) poses a significant challenge for current multimodal large language models (MLLMs). The lack of human-annotated multi-modality aesthetic data further exacerbates this problem, leaving MLLMs with limited aesthetic perception capabilities. To address this challenge, we first introduce a comprehensively annotated Aesthetic Multi-Modality Instruction Tuning (AesMMIT) dataset, which serves as the cornerstone for building multi-modality aesthetics foundation models. Specifically, to align MLLMs with human aesthetics perception, we construct a corpus-rich aesthetic critique database with 21,904 images from diverse sources and 88K pieces of human natural-language feedback, collected via progressive questions that range from coarse-grained aesthetic grades to fine-grained aesthetic descriptions. To ensure that MLLMs can handle diverse queries, we further prompt GPT to refine the aesthetic critiques and assemble the large-scale aesthetic instruction tuning dataset, i.e., AesMMIT, which consists of 409K multi-typed instructions to activate stronger aesthetic capabilities. Based on the AesMMIT dataset, we fine-tune open-source general foundation models to obtain multi-modality Aesthetic Expert models, dubbed AesExpert. Extensive experiments demonstrate that the proposed AesExpert models deliver significantly better aesthetic perception performance than state-of-the-art MLLMs, including the most advanced GPT-4V and Gemini-Pro-Vision.
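As a rough illustration of the coarse-to-fine collection described above, the sketch below shows how one human critique might be turned into instruction-response pairs. It is a minimal sketch under stated assumptions, not the released AesMMIT format: the `AestheticCritique` schema, the `to_instructions` helper, the question wording, the grade label, and the sample values are all hypothetical, and the GPT refinement step is not modeled.

```python
from dataclasses import dataclass


@dataclass
class AestheticCritique:
    """One human feedback entry for an image (hypothetical schema)."""
    image_id: str
    grade: str        # coarse-grained answer; the label set is an assumption
    description: str  # fine-grained free-text aesthetic description


def to_instructions(critique: AestheticCritique) -> list[dict]:
    """Turn one critique into instruction-response pairs, ordered coarse to fine."""
    return [
        {
            "image": critique.image_id,
            "instruction": "How would you rate the aesthetics of this image?",
            "response": critique.grade,
        },
        {
            "image": critique.image_id,
            "instruction": "Describe the aesthetic qualities of this image.",
            "response": critique.description,
        },
    ]


# Illustrative values only; not taken from the dataset.
example = AestheticCritique(
    image_id="img_0001",
    grade="good",
    description="Balanced composition with warm, soft lighting.",
)
pairs = to_instructions(example)
```

Each pair keeps the image identifier alongside the question and answer, so coarse and fine responses for the same image can be grouped again during fine-tuning.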

Pipeline

Experiment

Demo

Aesthetic description

Aesthetic interpretation

Enhancement suggestion

Composition and emotion

BibTeX

@article{AesExpert,
  title={AesExpert: Towards Multi-modality Foundation Model for Image Aesthetics Perception},
  author={Yipo Huang and Xiangfei Sheng and Zhichao Yang and Quan Yuan and Zhichao Duan and Pengfei Chen and Leida Li and Weisi Lin and Guangming Shi},
  journal={arXiv:2404.09624},
  year={2024}
}

AesExpert: Towards Multi-modality Foundation Model for Image Aesthetics Perception

ACMMM 2024