v0.2.0 · Now with SkillOpt
Open Source CLI · MIT

CodexOpt

Bring Microsoft Research's SkillOpt to Codex. Optimize AGENTS.md and SKILL.md with execution feedback.

Bounded edits, validation-gated acceptance, and Codex rollouts — turning intuition-driven prompt tweaks into measurable, reproducible gains.

$ pip install codexopt==0.2.0

New in v0.2.0

Microsoft SkillOpt, now for Codex

SkillOpt frames natural-language skill documents as optimizable external state for frozen models. An optimizer analyzes rollout trajectories, proposes controlled edits, and accepts changes only when they improve held-out validation tasks. CodexOpt brings that discipline straight into the Codex harness.
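
The loop described above can be sketched in a few lines. This is a minimal illustration, not CodexOpt's implementation: `optimize`, `mean_score`, `toy_score`, and `toy_propose` are hypothetical names, and the keyword-matching scorer stands in for a real rollout.

```python
from typing import Callable, Sequence, Tuple


def mean_score(skill: str, tasks: Sequence[str],
               score: Callable[[str, str], float]) -> float:
    """Average score of a skill document over a set of tasks."""
    return sum(score(skill, task) for task in tasks) / len(tasks)


def optimize(skill: str, train: Sequence[str], val: Sequence[str],
             propose: Callable[[str, Sequence[str]], str],
             score: Callable[[str, str], float],
             rounds: int = 3) -> Tuple[str, float]:
    """Keep a proposed edit only if it raises the held-out validation score."""
    best = skill
    best_val = mean_score(best, val, score)
    for _ in range(rounds):
        candidate = propose(best, train)          # edit informed by train tasks
        cand_val = mean_score(candidate, val, score)
        if cand_val > best_val:                   # validation gate
            best, best_val = candidate, cand_val
    return best, best_val


# Toy stand-ins (assumptions): a task "passes" if the skill mentions it.
def toy_score(skill: str, task: str) -> float:
    return 1.0 if task in skill else 0.0


def toy_propose(skill: str, train_tasks: Sequence[str]) -> str:
    missing = [t for t in train_tasks if t not in skill]
    return skill + f"\n- Always run {missing[0]}." if missing else skill


best, best_val = optimize("# Skill", ["pytest", "ruff"], ["pytest", "mypy"],
                          toy_propose, toy_score)
```

In this toy run the "pytest" edit is accepted because it lifts the validation score, while the "ruff" edit is rejected because it only helps on training tasks.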

One command for Codex users

$ uv run codexopt improve
Safe offline preview

$ uv run codexopt improve --live
Codex-backed optimization

$ uv run codexopt improve --live --apply
Apply validated changes

Offline preview is the default; Codex budget is used only with --live.

What SkillOpt brings to your skills

Train / validation task splits mined from git history, issues, and skill descriptions
Bounded edits with a configurable edit budget — no prompt bloat
Validation-gated acceptance — a change wins only on held-out tasks
Tiered rewards: verifier → LLM judge → static fallback
Full codex exec --json trajectory parsing
Reports with accepted diffs and validation-score movement
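
One way to read the "verifier → LLM judge → static fallback" tiering is as a first-available chain: use the strongest signal that can produce a score, and fall through otherwise. The sketch below assumes that reading; `tiered_reward` and the three stub scorers are hypothetical, and CodexOpt's actual selection logic may differ.

```python
from typing import Callable, List, Optional, Tuple

# A scorer returns a reward in [0, 1], or None when it cannot run.
Scorer = Callable[[str], Optional[float]]


def tiered_reward(output: str,
                  tiers: List[Tuple[str, Scorer]]) -> Tuple[str, float]:
    """Return (tier name, reward) from the first tier that yields a score."""
    for name, scorer in tiers:
        value = scorer(output)
        if value is not None:      # 0.0 is a valid score, so test for None
            return name, value
    raise ValueError("no tier produced a score")


# Stub scorers (assumptions for illustration only).
def no_verifier(output: str) -> Optional[float]:
    return None                    # pretend no verifier command is configured


def offline_judge(output: str) -> Optional[float]:
    return None                    # pretend the LLM judge is unavailable


def static_check(output: str) -> Optional[float]:
    return 1.0 if output.strip() else 0.0


tier, reward = tiered_reward(
    "patched file",
    [("verifier", no_verifier), ("llm_judge", offline_judge),
     ("static", static_check)],
)
```

Here both higher tiers decline to score, so the static check supplies the reward.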

SkillOpt, mapped to CodexOpt

Every SkillOpt concept has a concrete, Codex-native counterpart

Skill artifact

SKILL.md or AGENTS.md

Rollout

codex exec or command verifier

Feedback

Trajectory analysis + multi-signal scoring

Bounded edit

Edit budget + controlled modifications

Validation gate

Held-out task performance

Exported skill

Validated file diff with backups

Get Started in Seconds

Install with uv or pip, then run a single command

1. Install CodexOpt

From PyPI, or uv for the full workflow

pip: pip install codexopt==0.2.0
uv: uv sync --extra dev

2. Improve, then apply

Preview offline, then opt into Codex

$ uv run codexopt improve
$ uv run codexopt improve --live --apply

Validated edits written with automatic backups

Everything in v0.2.0

A complete toolkit to measure and improve your Codex instruction assets

One-Command improve

Discover targets, mine tasks, optimize, preview, and apply — all from codexopt improve.

SkillOpt Engine

Train/validation splits, bounded edits, and validation-gated acceptance for SKILL.md.
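
A train/validation split is only useful if it is stable between runs, so that a held-out task never leaks into training. A minimal sketch of one way to get that, hashing each task ID into a bucket; `split_tasks` and the `val_fraction` parameter are hypothetical and not taken from CodexOpt:

```python
import hashlib
from typing import Iterable, List, Tuple


def split_tasks(task_ids: Iterable[str],
                val_fraction: float = 0.3) -> Tuple[List[str], List[str]]:
    """Deterministically assign each task to train or validation by hash."""
    train: List[str] = []
    val: List[str] = []
    for task_id in task_ids:
        digest = hashlib.sha256(task_id.encode("utf-8")).digest()
        bucket = digest[0] / 256          # stable value in [0, 1)
        (val if bucket < val_fraction else train).append(task_id)
    return train, val


ids = [f"task-{i}" for i in range(20)]
train, val = split_tasks(ids)
```

Because the assignment depends only on the task ID, adding new tasks later does not move existing ones between splits.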

Reflective Engine

SkillOpt/GEPA-inspired, Codex-backed reflection that keeps a rewrite only when it proves to be an improvement.

Codex Rollout Parsing

Parses codex exec --json into trajectories: responses, commands, file changes, tokens, errors.
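
As a rough illustration of the first step in such parsing, the sketch below reads line-delimited JSON and tallies events by a `type` field. The field name and the event names in the sample are placeholders, not the actual `codex exec --json` schema, and `summarize_jsonl` is a hypothetical helper.

```python
import json
from collections import Counter
from typing import Iterable, Tuple


def summarize_jsonl(lines: Iterable[str]) -> Tuple[Counter, int]:
    """Count events per type and report how many lines were not valid JSON."""
    counts: Counter = Counter()
    skipped = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue                      # ignore blank lines
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            skipped += 1                  # tolerate stray non-JSON output
            continue
        if isinstance(event, dict):
            counts[str(event.get("type", "unknown"))] += 1
        else:
            counts["unknown"] += 1
    return counts, skipped


# Placeholder events; real event names depend on the Codex CLI version.
sample = [
    '{"type": "message", "text": "done"}',
    '{"type": "command", "exit_code": 0}',
    '{"type": "command", "exit_code": 1}',
    "not json",
    "",
]
counts, skipped = summarize_jsonl(sample)
```

A real trajectory parser would go further and extract responses, commands, file changes, token usage, and errors from each event.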

Task Mining

codexopt tasks init generates starter optimization tasks from git, skills, and issues.

Tiered Rewards

Verifier outcomes first, then LLM-judge feedback, then static analysis as a fallback, reduced to one reward signal.

Validation-Gated Apply

Only held-out-validated edits are written, always with automatic backups.
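
The apply step can be pictured as "copy, then overwrite, and only if validated". The following sketch assumes a `.bak` suffix for backups and a boolean `validated` flag; both are illustrative choices, and `apply_with_backup` is a hypothetical function rather than CodexOpt's API.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Optional


def apply_with_backup(path: Path, new_text: str,
                      validated: bool) -> Optional[Path]:
    """Write new_text to path only if validated; return the backup path."""
    if not validated:
        return None                                   # leave the file untouched
    backup = path.with_name(path.name + ".bak")       # assumed naming scheme
    shutil.copy2(path, backup)                        # preserve the original
    path.write_text(new_text, encoding="utf-8")
    return backup


with tempfile.TemporaryDirectory() as tmp:
    skill = Path(tmp) / "SKILL.md"
    skill.write_text("old\n", encoding="utf-8")

    rejected = apply_with_backup(skill, "new\n", validated=False)
    after_reject = skill.read_text(encoding="utf-8")

    backup = apply_with_backup(skill, "new\n", validated=True)
    backup_name = backup.name
    backup_text = backup.read_text(encoding="utf-8")
    after_apply = skill.read_text(encoding="utf-8")
```

Taking the backup before the write means a failed or unwanted edit can always be reverted by copying the backup over the target.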

Benchmark Scoring

Per-file 0–1 scores with criterion sub-scores and natural-language feedback.
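
To make the idea of a 0–1 file score built from criterion sub-scores concrete, here is a weighted-mean sketch. The criterion names, the weights, and the `file_score` helper are assumptions for illustration; the source does not state how CodexOpt combines sub-scores.

```python
from typing import Dict


def file_score(sub_scores: Dict[str, float],
               weights: Dict[str, float]) -> float:
    """Weighted mean of criterion sub-scores, clamped to [0, 1]."""
    if not sub_scores:
        raise ValueError("at least one criterion is required")
    # Criteria without an explicit weight default to 1.0.
    total_weight = sum(weights.get(name, 1.0) for name in sub_scores)
    weighted = sum(value * weights.get(name, 1.0)
                   for name, value in sub_scores.items())
    return min(1.0, max(0.0, weighted / total_weight))


# Hypothetical criteria for a SKILL.md file.
subs = {"clarity": 1.0, "specificity": 0.5}
equal = file_score(subs, {})
skewed = file_score(subs, {"clarity": 3.0})
```

With equal weights the two criteria average out; raising the weight on "clarity" pulls the file score toward that sub-score.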

Markdown Reporting

Reports showing files improved, accepted diffs, score movement, and fallback notes.

Three Optimization Engines

Pick the right engine — from local heuristics to Codex-backed SkillOpt

Heuristic Engine

Default · runs locally

Fast, deterministic optimization using rule-based transforms. No API keys or external calls. Perfect for CI/CD and quick iterations.

No API keys required
Deterministic results
Fast execution
CI/CD friendly

Reflective Engine

Maintained · Codex-backed

The maintained SkillOpt/GEPA-inspired path behind codexopt improve. It evaluates a candidate, captures feedback, rewrites the file, and keeps the rewrite only when held-out task scores improve.

Codex exec optimizer & judge
Textual feedback → mutation
Held-out validation gate
Tiered reward signals

SkillOpt Engine

SKILL.md engine

SkillOpt-style discipline for skills: task evidence becomes train/validation splits, candidates respect an edit budget, and acceptance needs a minimum validation delta.

Train / validation splits
Bounded edit budget
Validation-delta acceptance
Executable rollout rewards
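
The two acceptance conditions above, an edit budget and a minimum validation delta, can be combined into a single check. This sketch measures edit size as changed lines using Python's `difflib`; the unit of the budget, the default values, and the names `changed_lines` and `accept` are assumptions, not CodexOpt's actual settings.

```python
import difflib


def changed_lines(old: str, new: str) -> int:
    """Count lines replaced, inserted, or deleted between two documents."""
    matcher = difflib.SequenceMatcher(
        a=old.splitlines(), b=new.splitlines(), autojunk=False
    )
    changed = 0
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            changed += max(i2 - i1, j2 - j1)
    return changed


def accept(old: str, new: str, old_val: float, new_val: float,
           edit_budget: int = 5, min_delta: float = 0.02) -> bool:
    """Accept only small edits that clear the validation-score threshold."""
    within_budget = changed_lines(old, new) <= edit_budget
    improved = (new_val - old_val) >= min_delta
    return within_budget and improved


old_doc = "a\nb\nc"
new_doc = "a\nB\nc\nd"       # one line replaced, one line added
n_changed = changed_lines(old_doc, new_doc)
```

Requiring both conditions keeps a candidate from winning by rewriting the whole file, or by making a tidy edit that does not actually move the validation score.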

Built on GEPA's reflective lineage

The SkillOpt paper benchmarks its approach against GEPA, TextGrad, and EvoSkill. CodexOpt's reflective engine carries forward GEPA's textual-reflection ideas in a streamlined, Codex-native implementation. The legacy --engine gepa path (which targeted the older gepa.optimize_anything API) is now deprecated and falls back with a clear warning — use --engine reflective instead.

Step-by-step control

Prefer the full pipeline?

When you want more control than improve, run each stage yourself

codexopt workflow
$ uv run codexopt init
$ uv run codexopt scan
$ uv run codexopt benchmark
$ uv run codexopt tasks init
$ uv run codexopt optimize skills --engine reflective
$ uv run codexopt apply --kind skills --dry-run
$ uv run codexopt report --output codexopt-report.md

Optimize Your Codex Skills Today

CodexOpt 0.2.0 makes SkillOpt-style optimization practical for Codex users — rigorous validation with direct harness integration. Open source and MIT licensed.