
GEPA: Prompt Optimization Beyond Blind Reward

LLMs are sensitive to how you prompt them. Prompt optimization automates the search for better prompts — but today's methods rely on reinforcement learning, which collapses each attempt into a single scalar reward. GEPA reads the full execution trace instead, and evolves prompts through genetic search.

The Problem
Blind Optimization
Current methods run the LLM thousands of times, score each attempt with a single number, and nudge the prompt toward higher scores. All the rich detail of what happened — errors, reasoning, partial progress — gets collapsed into one scalar. The optimizer never sees why something failed.
error_msg · trace_log · reasoning · tool_output · partial_result → r = 0.34
5,000–25,000 · LLM calls needed
Scalar · Signal type
✗ Blind to failure modes ✗ Massively wasteful
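The blind loop above can be sketched in a few lines. This is a toy hill-climber, not any system's actual implementation: `evaluate`, `mutate`, and the keyword-counting task are hypothetical stand-ins for an LLM run and its scalar score — the point is that everything except one number is thrown away.

```python
import random

def blind_optimize(prompt, mutate, evaluate, n_iters=200):
    # Hill-climb on a single scalar: mutate, keep only if the number goes up.
    # Errors, reasoning chains, and partial progress are invisible here.
    best, best_score = prompt, evaluate(prompt)
    for _ in range(n_iters):
        candidate = mutate(best)
        s = evaluate(candidate)      # one scalar — all trace detail discarded
        if s > best_score:
            best, best_score = candidate, s
    return best, best_score

# Toy stand-in for a task: score = fraction of target keywords the prompt contains.
KEYWORDS = ["step", "check", "cite"]

def evaluate(prompt):
    return sum(k in prompt for k in KEYWORDS) / len(KEYWORDS)

def mutate(prompt):
    # Random edit with no guidance from what actually went wrong.
    return prompt + " " + random.choice(KEYWORDS)

random.seed(0)
best, score = blind_optimize("Answer the question.", mutate, evaluate)
```

Even this toy version shows the cost: the optimizer needs many random mutations to stumble onto improvements it could have been told about directly.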
What if the optimizer could read?
The Insight
Read the Traces
Instead of a single score, keep the full execution log — errors, reasoning chains, tool outputs — and feed it back as natural-language feedback. Think of it like "text gradients": structured hints that tell the optimizer which direction to push the prompt.
RL sees: 0.34
GEPA sees:
step 3: tool timeout
cause: malformed query
trace: parse → fail @ arg2
fix: add type check
reflect: prompt too vague
Full traces · Signal quality
Text gradients · Feedback type
✓ Sees why things fail ✓ Actionable feedback
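A minimal sketch of the idea: instead of collapsing a run to one number, serialize its trace into text the optimizer can read. The `trace` schema and `trace_to_feedback` helper here are hypothetical illustrations, not GEPA's actual format.

```python
def trace_to_feedback(trace):
    """Render an execution trace as natural-language feedback
    (a 'text gradient') instead of a single scalar reward."""
    lines = []
    for step in trace:
        if step["status"] == "error":
            lines.append(f"step {step['i']}: {step['tool']} failed: {step['detail']}")
        else:
            lines.append(f"step {step['i']}: {step['tool']} ok")
    return "\n".join(lines)

# Hypothetical trace from one task run.
trace = [
    {"i": 1, "tool": "parse",  "status": "ok",    "detail": ""},
    {"i": 2, "tool": "search", "status": "error", "detail": "malformed query (arg2)"},
]
feedback = trace_to_feedback(trace)
```

The rendered feedback ("step 2: search failed: malformed query (arg2)") tells a reflection model *which direction* to push the prompt — e.g. add instructions about query formatting — something the scalar 0.34 could never convey.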
Traces inform mutations — but how to search the prompt space?
Evolution Loop
Genetic-Pareto Search
Two models divide the work: a cheap, fast model ("Student") runs tasks, while a stronger model ("Reflection") reads the traces and rewrites the prompt. Instead of converging on one best prompt, GEPA keeps a Pareto frontier — a set of diverse, non-dominated candidates that trade off different quality dimensions.
Pareto Frontier
best-so-far prompts
Student LM: Execute
fast & cheap, 95%
Reflection LM: Analyze
strong & rare, 5%
Mutate Prompt
Update Frontier
↺ repeat
95% / 5% · Student-to-Reflection ratio
~150 · LLM calls to converge
✓ Cheap execution ✓ Smart reflection ✓ Diverse solutions
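The frontier update is the easiest piece to make concrete. Below is a minimal sketch of Pareto dominance over two hypothetical quality metrics; the function names and the two-metric scoring are illustrative assumptions, not the paper's implementation (GEPA scores candidates across tasks, not two fixed axes).

```python
def dominates(a, b):
    # a dominates b if it is at least as good on every metric
    # and strictly better on at least one.
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def update_frontier(frontier, prompt, scores):
    """Keep only non-dominated (prompt, scores) pairs."""
    if any(dominates(s, scores) for _, s in frontier):
        return frontier                                   # candidate is dominated
    kept = [(p, s) for p, s in frontier if not dominates(scores, s)]
    kept.append((prompt, scores))
    return kept

# Hypothetical candidates scored on (accuracy, brevity).
frontier = []
frontier = update_frontier(frontier, "A", (0.9, 0.2))  # kept
frontier = update_frontier(frontier, "B", (0.3, 0.8))  # kept: different trade-off
frontier = update_frontier(frontier, "C", (0.2, 0.1))  # dropped: dominated by A
```

Keeping the whole non-dominated set, rather than a single best prompt, is what preserves the diverse candidates the mutation step can later recombine.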
Pareto diversity + trace-guided mutation =
Results
35x More Efficient
GEPA matches or exceeds all baselines while using a fraction of the LLM calls.
GRPO (RL-based): 25,000
MIPROv2 (search-based): ~15,000
GEPA (this paper): ~700
35x · Fewer LLM calls
+13% · Over MIPROv2
9.2x · Shorter prompts
✓ ICLR 2026 Oral ✓ State of the art
Paper
GEPA: Genetic-Pareto Prompt Optimization
ICLR 2026 Oral
Replaces RL's scalar rewards with full execution trace reflection, achieving 35x efficiency gains and +13% over SOTA with 9.2x shorter prompts.