The First Multi-Agent Framework for Holistic Graphic Design Evaluation and Actionable Feedback Generation
Evaluating graphic designs requires assessing multiple facets such as alignment, composition, aesthetics, and color choices, and a holistic evaluation aggregates feedback from individual expert reviewers. To this end, we propose an Agentic Design Review System (Agentic-DRS), in which multiple agents, orchestrated by a meta-agent, collaboratively analyze a design.
A novel in-context exemplar selection approach based on graph matching, together with a unique prompt-expansion method, plays a central role in making each agent design-aware. To evaluate this framework, we propose the DRS-Bench benchmark.
Thorough experimental evaluation against state-of-the-art baselines, backed by critical ablation experiments, demonstrates the efficacy of Agentic-DRS in evaluating graphic designs and generating actionable feedback.
Multi-agent system with static and dynamic agents for comprehensive design evaluation
Novel in-context exemplar selection using Wasserstein and Gromov-Wasserstein distances
Anchoring MLLM responses with visually grounded textual descriptions
Comprehensive benchmark with 4 datasets, 15 attributes, and novel evaluation metrics
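The exemplar-selection idea above can be sketched in code. This is a minimal illustration, not the paper's implementation: it assumes each design graph is summarized by a 1-D node-feature sample plus an intra-graph distance matrix, uses the closed-form 1-D Wasserstein-1 distance, and substitutes a cheap distance-spectrum comparison for the true Gromov-Wasserstein distance (the names `wasserstein_1d`, `gromov_proxy`, `select_exemplars`, and the blend weight `alpha` are all hypothetical).

```python
import numpy as np

def wasserstein_1d(a, b):
    """Closed-form 1-D Wasserstein-1 distance between empirical samples:
    mean absolute difference of sorted, size-matched quantiles."""
    a, b = np.sort(np.asarray(a, float)), np.sort(np.asarray(b, float))
    n = max(len(a), len(b))
    grid = np.linspace(0, 1, n)
    qa = np.interp(grid, np.linspace(0, 1, len(a)), a)
    qb = np.interp(grid, np.linspace(0, 1, len(b)), b)
    return float(np.mean(np.abs(qa - qb)))

def gromov_proxy(C1, C2):
    """Cheap stand-in for Gromov-Wasserstein: compare the distributions of
    intra-graph pairwise distances, which (like GW) depends only on each
    graph's internal geometry, not on a shared feature space."""
    return wasserstein_1d(C1[np.triu_indices_from(C1, k=1)],
                          C2[np.triu_indices_from(C2, k=1)])

def select_exemplars(query, candidates, k=2, alpha=0.5):
    """Rank candidates by a blend of feature distance (W) and structural
    distance (GW proxy); return indices of the k nearest exemplars.
    `query` and each candidate are (node_features, distance_matrix) pairs."""
    qf, qC = query
    scores = [alpha * wasserstein_1d(qf, f) + (1 - alpha) * gromov_proxy(qC, C)
              for f, C in candidates]
    return list(np.argsort(scores)[:k])
```

The selected exemplars would then be serialized into the agent's prompt as in-context examples; a production version would use a proper GW solver (e.g. from an optimal-transport library) rather than the spectrum proxy.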
Our framework consists of three key components that work together to enable comprehensive design evaluation:
Unlike conventional approaches that rely on global CLIP features, GRAD constructs graph representations that encode semantic, spatial, and structural relationships between design elements.
Generate textual descriptions containing element descriptions and hierarchical structure with optional bounding box coordinates.
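A description-generation step like this can be sketched as a simple serializer. The element schema below (`type`, `text`, `depth`, `bbox`) and the function name `describe_design` are hypothetical stand-ins; the paper's actual description format may differ.

```python
def describe_design(elements, include_boxes=True):
    """Render a flat element list into an indented, hierarchical textual
    description, optionally annotated with bounding-box coordinates."""
    lines = []
    for el in sorted(elements, key=lambda e: e["depth"]):
        indent = "  " * el["depth"]
        line = f'{indent}- {el["type"]}: {el["text"]}'
        if include_boxes:
            x, y, w, h = el["bbox"]
            line += f" [x={x}, y={y}, w={w}, h={h}]"
        lines.append(line)
    return "\n".join(lines)

elements = [
    {"type": "heading", "text": "Summer Sale", "depth": 0, "bbox": (40, 30, 520, 80)},
    {"type": "image", "text": "beach photo", "depth": 1, "bbox": (40, 130, 520, 300)},
]
print(describe_design(elements))
```

Grounding the MLLM on such a description alongside the raw image gives it explicit spatial facts (positions, sizes, hierarchy) that are hard to read off pixels alone.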
A multi-agent framework inspired by conference peer-review with three phases:
1. Meta-agent assigns static and dynamic reviewers based on the design context
2. Agents evaluate design attributes and generate individual feedback
3. Meta-agent consolidates scores and produces unified, actionable feedback
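The three phases can be sketched as a small orchestration loop. This is a hedged illustration only: the reviewer functions below are stubs with hard-coded outputs standing in for MLLM agent calls, and all names (`meta_assign`, `meta_consolidate`, the per-attribute agents) are hypothetical.

```python
from dataclasses import dataclass
from statistics import mean

@dataclass
class Review:
    attribute: str
    score: float      # e.g. on a 1-10 scale
    feedback: str

# Stub reviewers; a real system would prompt an MLLM per attribute.
def alignment_agent(design):  return Review("alignment", 7.0, "Left-align the caption with the heading.")
def color_agent(design):      return Review("color", 6.5, "Reduce background saturation.")
def typography_agent(design): return Review("typography", 8.0, "Increase body-text size.")

def meta_assign(design_context):
    """Phase 1: static reviewers always run; dynamic reviewers are
    added based on the design context."""
    static = [alignment_agent, color_agent]
    dynamic = [typography_agent] if design_context.get("has_text") else []
    return static + dynamic

def meta_consolidate(reviews):
    """Phase 3: aggregate per-attribute scores into one verdict and
    merge individual feedback into a single actionable list."""
    overall = mean(r.score for r in reviews)
    actions = [f"[{r.attribute}] {r.feedback}" for r in reviews]
    return overall, actions

design = {"has_text": True}
reviews = [agent(design) for agent in meta_assign(design)]   # Phase 2
overall, actions = meta_consolidate(reviews)
```

The averaging consolidation here is the simplest possible choice; the meta-agent in the actual system reasons over the individual reviews rather than merely averaging them.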
Watch how Agentic-DRS analyzes a design step-by-step through its multi-agent review process.
| Method | Afixa Acc ↑ | Afixa Sens ↑ | Afixa Spec ↑ | Infographic Acc ↑ | Infographic Sens ↑ | Infographic Spec ↑ | GDE Align ↑ | GDE Overlap ↑ | GDE Whitespace ↑ |
|---|---|---|---|---|---|---|---|---|---|
| GPT-4o | 62.91 | 65.42 | 64.26 | 58.26 | 61.92 | 56.74 | 0.597 | 0.782 | 0.665 |
| GPT-4o + GRAD | 64.57 | 68.65 | 65.18 | 60.41 | 63.57 | 59.66 | 0.639 | 0.796 | 0.688 |
| GPT-4o + GRAD + SDD | 67.33 | 69.60 | 68.21 | 64.95 | 66.21 | 62.12 | 0.677 | 0.809 | 0.703 |
| Agentic-DRS (GPT-4o) | 75.29 | 77.65 | 72.53 | 69.53 | 75.37 | 71.94 | 0.722 | 0.834 | 0.748 |
Agentic-DRS generates detailed, actionable feedback covering multiple design attributes including color harmony, spacing, typography, grouping, alignment, composition, and style. The feedback quality is validated by both automated metrics (AIM) and human ratings.
A comprehensive benchmark for measuring design effectiveness, enabling fair comparisons and fostering improvements in automated design tools.
Accuracy, Sensitivity, Specificity for multi-label classification
Pearson correlation with human ratings
Actionable Insights Metric via LLM and semantic similarity
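The first two metric families can be sketched directly; only the AIM metric, which relies on an LLM judge and semantic similarity, is omitted here. A minimal sketch (function names are hypothetical; the benchmark's exact averaging convention may differ):

```python
def multilabel_metrics(y_true, y_pred):
    """Micro-averaged accuracy, sensitivity (true-positive rate), and
    specificity (true-negative rate) over binary label matrices."""
    tp = fp = tn = fn = 0
    for t_row, p_row in zip(y_true, y_pred):
        for t, p in zip(t_row, p_row):
            if t and p:           tp += 1
            elif t and not p:     fn += 1
            elif not t and p:     fp += 1
            else:                 tn += 1
    acc = (tp + tn) / (tp + tn + fp + fn)
    sens = tp / (tp + fn) if (tp + fn) else 0.0
    spec = tn / (tn + fp) if (tn + fp) else 0.0
    return acc, sens, spec

def pearson(x, y):
    """Pearson correlation between predicted scores and human ratings."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)
```

Here sensitivity measures how many true design defects are caught, while specificity measures how often clean attributes are left unflagged, which is why both are reported alongside accuracy.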
@inproceedings{nag2026agentic,
title={Agentic Design Review System},
author={Nag, Sayan and Joseph, K J and Goswami, Koustava
and Morariu, Vlad I and Srinivasan, Balaji Vasan},
booktitle={Proceedings of the AAAI Conference on
Artificial Intelligence},
year={2026}
}