AAAI 2026

Agentic Design Review System

The First Multi-Agent Framework for Holistic Graphic Design Evaluation and Actionable Feedback Generation

Adobe Research
Agentic Design Review System Overview

Inspired by conference peer-review: Just as expert reviewers evaluate papers across multiple dimensions, our Agentic-DRS employs specialized AI agents to analyze graphic designs from multiple perspectives—typography, color harmony, alignment, spacing, and more—orchestrated by a meta-agent to generate holistic scores and actionable feedback.

Abstract

Evaluating graphic designs involves assessing them from multiple facets like alignment, composition, aesthetics, and color choices. Evaluating designs in a holistic way involves aggregating feedback from individual expert reviewers. Towards this, we propose an Agentic Design Review System (Agentic-DRS), where multiple agents collaboratively analyze a design, orchestrated by a meta-agent.

A novel in-context exemplar selection approach based on graph matching, together with a unique prompt expansion method, plays a central role in making each agent design-aware. To evaluate this framework, we propose the DRS-Bench benchmark.

Thorough experimental evaluation against state-of-the-art baselines, backed up by critical ablation experiments, demonstrates the efficacy of Agentic-DRS in evaluating graphic designs and generating actionable feedback.

Key Contributions

First Agentic Framework

Multi-agent system with static and dynamic agents for comprehensive design evaluation

Graph-based GRAD

Novel in-context exemplar selection using Wasserstein and Gromov-Wasserstein distances

Structured Design Description

Anchoring MLLM responses with visually grounded textual descriptions

DRS-Bench

Comprehensive benchmark with 4 datasets, 15 attributes, and novel evaluation metrics

Method Overview

Agentic-DRS Pipeline

Our framework consists of three key components that work together to enable comprehensive design evaluation:

01

GRAD: Graph-based Design Exemplar Selection

Unlike conventional approaches that rely on global CLIP features, GRAD constructs graph representations that encode semantic, spatial, and structural relationships between design elements.

  • Node Matching: Wasserstein Distance for semantic feature alignment
  • Edge Matching: Gromov-Wasserstein Distance for structural topology preservation
  • Result: Structure-aware retrieval of in-context examples
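The retrieval idea above can be sketched in code. This is a minimal, self-contained approximation, not the paper's exact GRAD: the Wasserstein node term is replaced by an optimal-assignment (Hungarian) cost under uniform weights, and the Gromov-Wasserstein edge term is approximated by comparing intra-graph pairwise distance structures under that same matching. All feature dimensions and the `alpha` mixing weight are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist

def grad_distance(feats_a, pos_a, feats_b, pos_b, alpha=0.5):
    """Simplified GRAD-style distance between two design graphs.

    feats_*: (n, d) semantic node features; pos_*: (n, 2) element centers.
    Node term: optimal-assignment cost on semantic features (a stand-in
    for the Wasserstein distance). Edge term: discrepancy between
    intra-graph pairwise spatial structures under the same matching
    (a stand-in for the Gromov-Wasserstein distance).
    """
    # Node matching: assignment on the semantic cost matrix
    M = cdist(feats_a, feats_b)
    row, col = linear_sum_assignment(M)
    node_cost = M[row, col].mean()

    # Edge matching: compare pairwise spatial structure under the matching
    Ca = cdist(pos_a, pos_a)[np.ix_(row, row)]   # structure of graph A
    Cb = cdist(pos_b, pos_b)[np.ix_(col, col)]   # structure of graph B, reordered
    edge_cost = np.abs(Ca - Cb).mean()

    return alpha * node_cost + (1 - alpha) * edge_cost

# Toy query and candidate graphs (hypothetical 4-element designs)
rng = np.random.default_rng(0)
feats = rng.normal(size=(4, 8))
pos = rng.uniform(0.0, 1.0, size=(4, 2))
d_same = grad_distance(feats, pos, feats, pos)           # identical design
d_diff = grad_distance(feats, pos, feats + 1.0, pos[::-1])  # perturbed design
```

In a retrieval setting, this distance would be computed between the query design's graph and every exemplar in the pool, keeping the closest matches as in-context examples.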
02

SDD: Structured Design Description

SDD generates textual descriptions of design elements and their hierarchical structure, optionally annotated with bounding-box coordinates.

  • Combines visual perception with explicit structural details
  • Improves design attribute understanding and anomaly detection
  • Reduces hallucinations and enables actionable feedback
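As a rough illustration, a structured design description can be modeled as an element tree serialized to text. The schema below (`kind`, `text`, `bbox`, `children`) is a hypothetical assumption for illustration, not the paper's actual SDD format.

```python
from dataclasses import dataclass, field
import json

@dataclass
class Element:
    """One design element; bbox is (x, y, w, h) in pixels (optional in SDD)."""
    kind: str
    text: str
    bbox: tuple
    children: list = field(default_factory=list)

def to_sdd(el):
    """Serialize an element tree into a structured, visually grounded description."""
    node = {"kind": el.kind, "text": el.text, "bbox": list(el.bbox)}
    if el.children:
        node["children"] = [to_sdd(c) for c in el.children]
    return node

# Hypothetical two-element infographic
title = Element("heading", "Futurology", (40, 32, 520, 64))
body = Element("paragraph", "Five trends to watch...", (40, 120, 520, 300))
canvas = Element("canvas", "", (0, 0, 600, 800), [title, body])
sdd = to_sdd(canvas)
sdd_text = json.dumps(sdd, indent=2)  # fed to the MLLM alongside the image
```

Anchoring the MLLM on such a description gives it explicit element identities and coordinates to reference, which is what makes feedback like "move the heading 20px down" possible.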
03

Agentic Design Review System

A multi-agent framework inspired by conference peer-review with three phases:

Planning

Meta-agent assigns static and dynamic reviewers based on design context

Reviewing

Agents evaluate design attributes and generate individual feedback

Summarization

Meta-agent consolidates scores and produces unified actionable feedback
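The three phases above can be sketched as a simple orchestration loop. Agent behavior is stubbed with fixed outputs; the reviewer names, the dynamic-assignment rule, and the averaging scheme are illustrative assumptions, not the paper's implementation.

```python
STATIC_REVIEWERS = ["typography", "color_harmony", "alignment"]

def plan(design):
    # Planning: the meta-agent adds dynamic reviewers based on design context
    dynamic = ["infographic_layout"] if design.get("type") == "infographic" else []
    return STATIC_REVIEWERS + dynamic

def review(agent, design):
    # Reviewing: each agent scores one attribute and drafts feedback (stubbed;
    # in practice this is an MLLM call conditioned on GRAD exemplars and the SDD)
    return {"agent": agent, "score": 7, "feedback": f"{agent}: looks consistent."}

def summarize(reviews):
    # Summarization: the meta-agent consolidates per-agent scores and feedback
    score = sum(r["score"] for r in reviews) / len(reviews)
    return {"score": score, "feedback": [r["feedback"] for r in reviews]}

design = {"type": "infographic"}
reviews = [review(agent, design) for agent in plan(design)]
result = summarize(reviews)
```

The key structural point is that reviewer assignment is data-dependent (the dynamic agents), while aggregation happens in a single summarization pass by the meta-agent.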

Demo

Watch how Agentic-DRS analyzes a design step-by-step through its multi-agent review process.

Input Design: Futurology Infographics
Agentic-DRS Feedback

Results

  • +12.4% accuracy improvement over the GPT-4o baseline on Afixa
  • 76.8% best accuracy on the Internal Design Dataset (IDD)
  • 0.834 overlap correlation with human ratings on GDE

Performance on DRS-Bench

Method               | Afixa (Acc / Sens / Spec) | Infographic (Acc / Sens / Spec) | GDE Corr. (Align / Overlap / Whitespace)
GPT-4o               | 62.91 / 65.42 / 64.26     | 58.26 / 61.92 / 56.74           | 0.597 / 0.782 / 0.665
GPT-4o + GRAD        | 64.57 / 68.65 / 65.18     | 60.41 / 63.57 / 59.66           | 0.639 / 0.796 / 0.688
GPT-4o + GRAD + SDD  | 67.33 / 69.60 / 68.21     | 64.95 / 66.21 / 62.12           | 0.677 / 0.809 / 0.703
Agentic-DRS (GPT-4o) | 75.29 / 77.65 / 72.53     | 69.53 / 75.37 / 71.94           | 0.722 / 0.834 / 0.748

Qualitative Example

Qualitative Results

Agentic-DRS generates detailed, actionable feedback covering multiple design attributes including color harmony, spacing, typography, grouping, alignment, composition, and style. The feedback quality is validated by both automated metrics (AIM) and human ratings.

DRS-Bench: Design Review System Benchmark

A comprehensive benchmark for measuring design effectiveness, enabling fair comparisons and fostering improvements in automated design tools.

4 Datasets

  • GDE: 700 designs with alignment, overlap, whitespace ratings (1-10 scale)
  • Afixa: 71 designs with 5 binary attributes
  • Infographic: 55 designs with layout metadata and 15 attributes
  • IDD: 137 professionally curated designs with 15 attributes

15 Design Attributes

Text-rendering, Typography, Color harmony, Alignment, Spacing, Composition, Overlap, Grouping, Style, Aesthetics, Image-text alignment, Color palette, Bad images, Font variety, Word count

Evaluation Metrics

Discrete Evaluation

Accuracy, Sensitivity, Specificity for multi-label classification

Continuous Evaluation

Pearson correlation with human ratings

Feedback Evaluation (AIM)

Actionable Insights Metric via LLM and semantic similarity
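The discrete and continuous metrics are standard and can be computed directly. This is a minimal sketch for a single binary attribute; the per-attribute aggregation across the 15 attributes and the AIM feedback metric are not shown.

```python
import numpy as np

def discrete_metrics(y_true, y_pred):
    """Accuracy, sensitivity, specificity for one binary design attribute."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_true == 1) & (y_pred == 1))
    tn = np.sum((y_true == 0) & (y_pred == 0))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn),   # true-positive rate
        "specificity": tn / (tn + fp),   # true-negative rate
    }

def pearson(pred_scores, human_ratings):
    """Pearson correlation between predicted scores and human ratings."""
    return float(np.corrcoef(pred_scores, human_ratings)[0, 1])

m = discrete_metrics([1, 1, 0, 0], [1, 0, 0, 0])
r = pearson([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
```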

Citation

@inproceedings{nag2026agentic,
    title={Agentic Design Review System},
    author={Nag, Sayan and Joseph, K J and Goswami, Koustava 
            and Morariu, Vlad I and Srinivasan, Balaji Vasan},
    booktitle={Proceedings of the AAAI Conference on 
               Artificial Intelligence},
    year={2026}
}