
GEPA: Prompt Optimization Beyond Blind Reward

LLMs are sensitive to how you prompt them. Prompt optimization automates the search for better prompts — but today's methods rely on reinforcement learning, which collapses each attempt into a single scalar reward. GEPA reads the full execution trace instead, and evolves prompts through genetic search.

The Problem
Blind Optimization
Current methods run the LLM thousands of times, score each attempt with a single number, and nudge the prompt toward higher scores. All the rich detail of what happened — errors, reasoning, partial progress — gets collapsed into one scalar. The optimizer never sees why something failed.
error_msg · trace_log · reasoning · tool_output · partial_result → r = 0.34
5,000–25,000 · LLM calls needed
Scalar · Signal type
✗ Blind to failure modes ✗ Massively wasteful
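The blind loop above can be sketched in a few lines. This is a toy hill-climber, not any system's actual implementation: `evaluate`, `mutate`, and the keyword-counting task are hypothetical stand-ins for an LLM run and its scalar score — the point is that everything except one number is thrown away.

```python
import random

def blind_optimize(prompt, mutate, evaluate, n_iters=200):
    # Hill-climb on a single scalar: mutate, keep only if the number goes up.
    # Errors, reasoning chains, and partial progress are invisible here.
    best, best_score = prompt, evaluate(prompt)
    for _ in range(n_iters):
        candidate = mutate(best)
        s = evaluate(candidate)      # one scalar — all trace detail discarded
        if s > best_score:
            best, best_score = candidate, s
    return best, best_score

# Toy stand-in for a task: score = fraction of target keywords the prompt contains.
KEYWORDS = ["step", "check", "cite"]

def evaluate(prompt):
    return sum(k in prompt for k in KEYWORDS) / len(KEYWORDS)

def mutate(prompt):
    # Random edit with no guidance from what actually went wrong.
    return prompt + " " + random.choice(KEYWORDS)

random.seed(0)
best, score = blind_optimize("Answer the question.", mutate, evaluate)
```

Even this toy version shows the cost: the optimizer needs many random mutations to stumble onto improvements it could have been told about directly.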
What if the optimizer could read?
The Insight
Read the Traces
Instead of a single score, keep the full execution log — errors, reasoning chains, tool outputs — and feed it back as natural-language feedback. Think of it like "text gradients": structured hints that tell the optimizer which direction to push the prompt.
RL sees: 0.34
GEPA sees:
step 3: tool timeout
cause: malformed query
trace: parse → fail @ arg2
fix: add type check
reflect: prompt too vague
Full traces · Signal quality
Text gradients · Feedback type
✓ Sees why things fail ✓ Actionable feedback
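A minimal sketch of the idea: instead of collapsing a run to one number, serialize its trace into text the optimizer can read. The `trace` schema and `trace_to_feedback` helper here are hypothetical illustrations, not GEPA's actual format.

```python
def trace_to_feedback(trace):
    """Render an execution trace as natural-language feedback
    (a 'text gradient') instead of a single scalar reward."""
    lines = []
    for step in trace:
        if step["status"] == "error":
            lines.append(f"step {step['i']}: {step['tool']} failed: {step['detail']}")
        else:
            lines.append(f"step {step['i']}: {step['tool']} ok")
    return "\n".join(lines)

# Hypothetical trace from one task run.
trace = [
    {"i": 1, "tool": "parse",  "status": "ok",    "detail": ""},
    {"i": 2, "tool": "search", "status": "error", "detail": "malformed query (arg2)"},
]
feedback = trace_to_feedback(trace)
```

The rendered feedback ("step 2: search failed: malformed query (arg2)") tells a reflection model *which direction* to push the prompt — e.g. add instructions about query formatting — something the scalar 0.34 could never convey.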
Traces inform mutations — but how to search the prompt space?
Evolution Loop
Genetic-Pareto Search
Two models divide the work: a cheap, fast model ("Student") runs tasks, while a stronger model ("Reflection") reads the traces and rewrites the prompt. Instead of converging on one best prompt, GEPA keeps a Pareto frontier — a set of diverse, non-dominated candidates that trade off different quality dimensions.
Pareto Frontier
best-so-far prompts
Student LM: Execute
fast & cheap, 95%
Reflection LM: Analyze
strong & rare, 5%
Mutate Prompt
Update Frontier
↺ repeat
95% / 5% · Student-to-Reflection ratio
~150 · LLM calls to converge
✓ Cheap execution ✓ Smart reflection ✓ Diverse solutions
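The frontier update is the easiest piece to make concrete. Below is a minimal sketch of Pareto dominance over two hypothetical quality metrics; the function names and the two-metric scoring are illustrative assumptions, not the paper's implementation (GEPA scores candidates across tasks, not two fixed axes).

```python
def dominates(a, b):
    # a dominates b if it is at least as good on every metric
    # and strictly better on at least one.
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def update_frontier(frontier, prompt, scores):
    """Keep only non-dominated (prompt, scores) pairs."""
    if any(dominates(s, scores) for _, s in frontier):
        return frontier                                   # candidate is dominated
    kept = [(p, s) for p, s in frontier if not dominates(scores, s)]
    kept.append((prompt, scores))
    return kept

# Hypothetical candidates scored on (accuracy, brevity).
frontier = []
frontier = update_frontier(frontier, "A", (0.9, 0.2))  # kept
frontier = update_frontier(frontier, "B", (0.3, 0.8))  # kept: different trade-off
frontier = update_frontier(frontier, "C", (0.2, 0.1))  # dropped: dominated by A
```

Keeping the whole non-dominated set, rather than a single best prompt, is what preserves the diverse candidates the mutation step can later recombine.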
Pareto diversity + trace-guided mutation =
Results
35x More Efficient
GEPA matches or exceeds all baselines while using a fraction of the LLM calls.
GRPO (RL-based): 25,000
MIPROv2 (search-based): ~15,000
GEPA (this paper): ~700
35x · Fewer LLM calls
+13% · Over MIPROv2
9.2x · Shorter prompts
✓ ICLR 2026 Oral ✓ State of the art
Paper
GEPA: Genetic-Pareto Prompt Optimization
ICLR 2026 Oral
Replaces RL's scalar rewards with full execution trace reflection, achieving 35x efficiency gains and +13% over SOTA with 9.2x shorter prompts.