The First Multi-Agent Framework for Holistic Graphic Design Evaluation and Actionable Feedback Generation
Evaluating graphic designs requires assessing multiple facets such as alignment, composition, aesthetics, and color choices, and a holistic evaluation aggregates feedback from individual expert reviewers. To this end, we propose an Agentic Design Review System (Agentic-DRS), in which multiple agents, orchestrated by a meta-agent, collaboratively analyze a design.
A novel in-context exemplar selection approach based on graph matching, together with a unique prompt-expansion method, plays a central role in making each agent design-aware. To evaluate this framework, we propose the DRS-Bench benchmark.
Thorough experimental evaluation against state-of-the-art baselines, backed by critical ablation experiments, demonstrates the efficacy of Agentic-DRS in evaluating graphic designs and generating actionable feedback.
Multi-agent system with static and dynamic agents for comprehensive design evaluation
Novel in-context exemplar selection using Wasserstein and Gromov-Wasserstein distances
Anchoring MLLM responses with visually grounded textual descriptions
Comprehensive benchmark with 4 datasets, 15 attributes, and novel evaluation metrics
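The exemplar-selection idea above can be sketched in code. This is a minimal illustration, not the paper's implementation: it assumes each design graph is summarized by a 1-D node-feature sample plus an intra-graph distance matrix, uses the closed-form 1-D Wasserstein-1 distance, and substitutes a cheap distance-spectrum comparison for the true Gromov-Wasserstein distance (the names `wasserstein_1d`, `gromov_proxy`, `select_exemplars`, and the blend weight `alpha` are all hypothetical).

```python
import numpy as np

def wasserstein_1d(a, b):
    """Closed-form 1-D Wasserstein-1 distance between empirical samples:
    mean absolute difference of sorted, size-matched quantiles."""
    a, b = np.sort(np.asarray(a, float)), np.sort(np.asarray(b, float))
    n = max(len(a), len(b))
    grid = np.linspace(0, 1, n)
    qa = np.interp(grid, np.linspace(0, 1, len(a)), a)
    qb = np.interp(grid, np.linspace(0, 1, len(b)), b)
    return float(np.mean(np.abs(qa - qb)))

def gromov_proxy(C1, C2):
    """Cheap stand-in for Gromov-Wasserstein: compare the distributions of
    intra-graph pairwise distances, which (like GW) depends only on each
    graph's internal geometry, not on a shared feature space."""
    return wasserstein_1d(C1[np.triu_indices_from(C1, k=1)],
                          C2[np.triu_indices_from(C2, k=1)])

def select_exemplars(query, candidates, k=2, alpha=0.5):
    """Rank candidates by a blend of feature distance (W) and structural
    distance (GW proxy); return indices of the k nearest exemplars.
    `query` and each candidate are (node_features, distance_matrix) pairs."""
    qf, qC = query
    scores = [alpha * wasserstein_1d(qf, f) + (1 - alpha) * gromov_proxy(qC, C)
              for f, C in candidates]
    return list(np.argsort(scores)[:k])
```

The selected exemplars would then be serialized into the agent's prompt as in-context examples; a production version would use a proper GW solver (e.g. from an optimal-transport library) rather than the spectrum proxy.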
Our framework consists of three key components that work together to enable comprehensive design evaluation:
Unlike conventional approaches that rely on global CLIP features, GRAD constructs graph representations that encode semantic, spatial, and structural relationships between design elements.
Generate textual descriptions containing element descriptions and hierarchical structure with optional bounding box coordinates.
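A description-generation step like this can be sketched as a simple serializer. The element schema below (`type`, `text`, `depth`, `bbox`) and the function name `describe_design` are hypothetical stand-ins; the paper's actual description format may differ.

```python
def describe_design(elements, include_boxes=True):
    """Render a flat element list into an indented, hierarchical textual
    description, optionally annotated with bounding-box coordinates."""
    lines = []
    for el in sorted(elements, key=lambda e: e["depth"]):
        indent = "  " * el["depth"]
        line = f'{indent}- {el["type"]}: {el["text"]}'
        if include_boxes:
            x, y, w, h = el["bbox"]
            line += f" [x={x}, y={y}, w={w}, h={h}]"
        lines.append(line)
    return "\n".join(lines)

elements = [
    {"type": "heading", "text": "Summer Sale", "depth": 0, "bbox": (40, 30, 520, 80)},
    {"type": "image", "text": "beach photo", "depth": 1, "bbox": (40, 130, 520, 300)},
]
print(describe_design(elements))
```

Grounding the MLLM on such a description alongside the raw image gives it explicit spatial facts (positions, sizes, hierarchy) that are hard to read off pixels alone.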
A multi-agent framework inspired by conference peer-review with three phases:
1. Meta-agent assigns static and dynamic reviewers based on the design context
2. Agents evaluate design attributes and generate individual feedback
3. Meta-agent consolidates scores and produces unified, actionable feedback
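The three phases can be sketched as a small orchestration loop. This is a hedged illustration only: the reviewer functions below are stubs with hard-coded outputs standing in for MLLM agent calls, and all names (`meta_assign`, `meta_consolidate`, the per-attribute agents) are hypothetical.

```python
from dataclasses import dataclass
from statistics import mean

@dataclass
class Review:
    attribute: str
    score: float      # e.g. on a 1-10 scale
    feedback: str

# Stub reviewers; a real system would prompt an MLLM per attribute.
def alignment_agent(design):  return Review("alignment", 7.0, "Left-align the caption with the heading.")
def color_agent(design):      return Review("color", 6.5, "Reduce background saturation.")
def typography_agent(design): return Review("typography", 8.0, "Increase body-text size.")

def meta_assign(design_context):
    """Phase 1: static reviewers always run; dynamic reviewers are
    added based on the design context."""
    static = [alignment_agent, color_agent]
    dynamic = [typography_agent] if design_context.get("has_text") else []
    return static + dynamic

def meta_consolidate(reviews):
    """Phase 3: aggregate per-attribute scores into one verdict and
    merge individual feedback into a single actionable list."""
    overall = mean(r.score for r in reviews)
    actions = [f"[{r.attribute}] {r.feedback}" for r in reviews]
    return overall, actions

design = {"has_text": True}
reviews = [agent(design) for agent in meta_assign(design)]   # Phase 2
overall, actions = meta_consolidate(reviews)
```

The averaging consolidation here is the simplest possible choice; the meta-agent in the actual system reasons over the individual reviews rather than merely averaging them.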
Watch how Agentic-DRS analyzes a design step-by-step through its multi-agent review process.
| Method | Afixa Acc ↑ | Afixa Sens ↑ | Afixa Spec ↑ | Infographic Acc ↑ | Infographic Sens ↑ | Infographic Spec ↑ | GDE Align ↑ | GDE Overlap ↑ | GDE Whitespace ↑ |
|---|---|---|---|---|---|---|---|---|---|
| GPT-4o | 62.91 | 65.42 | 64.26 | 58.26 | 61.92 | 56.74 | 0.597 | 0.782 | 0.665 |
| GPT-4o + GRAD | 64.57 | 68.65 | 65.18 | 60.41 | 63.57 | 59.66 | 0.639 | 0.796 | 0.688 |
| GPT-4o + GRAD + SDD | 67.33 | 69.60 | 68.21 | 64.95 | 66.21 | 62.12 | 0.677 | 0.809 | 0.703 |
| Agentic-DRS (GPT-4o) | 75.29 | 77.65 | 72.53 | 69.53 | 75.37 | 71.94 | 0.722 | 0.834 | 0.748 |
Agentic-DRS generates detailed, actionable feedback covering multiple design attributes including color harmony, spacing, typography, grouping, alignment, composition, and style. The feedback quality is validated by both automated metrics (AIM) and human ratings.
A comprehensive benchmark for measuring design effectiveness, enabling fair comparisons and fostering improvements in automated design tools.
Accuracy, Sensitivity, Specificity for multi-label classification
Pearson correlation with human ratings
Actionable Insights Metric via LLM and semantic similarity
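The first two metric families can be sketched directly; only the AIM metric, which relies on an LLM judge and semantic similarity, is omitted here. A minimal sketch (function names are hypothetical; the benchmark's exact averaging convention may differ):

```python
def multilabel_metrics(y_true, y_pred):
    """Micro-averaged accuracy, sensitivity (true-positive rate), and
    specificity (true-negative rate) over binary label matrices."""
    tp = fp = tn = fn = 0
    for t_row, p_row in zip(y_true, y_pred):
        for t, p in zip(t_row, p_row):
            if t and p:           tp += 1
            elif t and not p:     fn += 1
            elif not t and p:     fp += 1
            else:                 tn += 1
    acc = (tp + tn) / (tp + tn + fp + fn)
    sens = tp / (tp + fn) if (tp + fn) else 0.0
    spec = tn / (tn + fp) if (tn + fp) else 0.0
    return acc, sens, spec

def pearson(x, y):
    """Pearson correlation between predicted scores and human ratings."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)
```

Here sensitivity measures how many true design defects are caught, while specificity measures how often clean attributes are left unflagged, which is why both are reported alongside accuracy.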
@inproceedings{nag2026agentic,
title={Agentic Design Review System},
author={Nag, Sayan and Joseph, K J and Goswami, Koustava
and Morariu, Vlad I and Srinivasan, Balaji Vasan},
booktitle={Proceedings of the AAAI Conference on
Artificial Intelligence},
year={2026}
}