
CodexOpt
Bring Microsoft Research's SkillOpt to Codex. Optimize AGENTS.md and SKILL.md with execution feedback.
Bounded edits, validation-gated acceptance, and Codex rollouts — turning intuition-driven prompt tweaks into measurable, reproducible gains.
pip install codexopt==0.2.0
Microsoft SkillOpt, now for Codex
SkillOpt frames natural-language skill documents as optimizable external state for frozen models. An optimizer analyzes rollout trajectories, proposes controlled edits, and accepts changes only when they improve held-out validation tasks. CodexOpt brings that discipline straight into the Codex harness.
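The accept-only-if-validation-improves loop described above fits in a few lines of Python. This is a minimal sketch under stated assumptions, not CodexOpt's implementation: `optimize`, `propose_edit`, and `score` are hypothetical names. The toy scorer treats a task as "passed" when its phrase appears in the document, standing in for real Codex rollouts.

```python
from typing import Callable, List

# Hypothetical signatures: a proposer sees only training tasks; a scorer returns 0-1.
Proposer = Callable[[str, List[str]], str]
Scorer = Callable[[str, List[str]], float]


def optimize(skill: str, propose_edit: Proposer, score: Scorer,
             train_tasks: List[str], val_tasks: List[str], rounds: int = 3) -> str:
    """Keep a candidate only when it beats the incumbent on held-out tasks."""
    best, best_val = skill, score(skill, val_tasks)
    for _ in range(rounds):
        candidate = propose_edit(best, train_tasks)  # edit informed by training evidence only
        cand_val = score(candidate, val_tasks)       # scored on the held-out split
        if cand_val > best_val:                      # validation gate: strict improvement
            best, best_val = candidate, cand_val
    return best


# Toy stand-ins: a task "passes" when its phrase appears in the document.
def toy_score(doc: str, tasks: List[str]) -> float:
    return sum(task in doc for task in tasks) / len(tasks)


def toy_propose(doc: str, tasks: List[str]) -> str:
    missing = [task for task in tasks if task not in doc]
    return doc + f"\n- {missing[0]}" if missing else doc


result = optimize("# Skill", toy_propose, toy_score,
                  train_tasks=["run tests", "format code"],
                  val_tasks=["run tests"])
```

In the toy run, the first edit is accepted because it raises the validation score. The second edit ("format code") is rejected because it leaves the validation score unchanged, even though it would help a training task.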
Safe offline preview: uv run codexopt improve
Codex-backed optimization: uv run codexopt improve --live
Apply validated changes: uv run codexopt improve --live --apply
What SkillOpt brings to your skills
SkillOpt, mapped to CodexOpt
Every SkillOpt concept has a concrete, Codex-native counterpart
SKILL.md or AGENTS.md
codex exec or command verifier
Trajectory analysis + multi-signal scoring
Edit budget + controlled modifications
Held-out task performance
Validated file diff with backups
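The last row of the mapping, a validated file diff with backups, can be pictured as a preview-then-write step. The sketch below illustrates one possible approach and is not CodexOpt's code. `preview_diff`, `apply_with_backup`, and the timestamped `.bak` naming scheme are all hypothetical, and this page does not describe CodexOpt's actual backup location or format.

```python
import difflib
import shutil
import tempfile
from datetime import datetime
from pathlib import Path
from typing import Optional


def preview_diff(target: Path, new_text: str) -> str:
    """Render a unified diff of the pending change without touching the file."""
    old_text = target.read_text(encoding="utf-8")
    return "".join(
        difflib.unified_diff(
            old_text.splitlines(keepends=True),
            new_text.splitlines(keepends=True),
            fromfile=f"a/{target.name}",
            tofile=f"b/{target.name}",
        )
    )


def apply_with_backup(target: Path, new_text: str, accepted: bool) -> Optional[Path]:
    """Write new_text only for an accepted candidate, copying the original aside first."""
    if not accepted:
        return None  # rejected candidates never modify the file
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    backup = target.with_name(f"{target.name}.{stamp}.bak")  # hypothetical naming scheme
    shutil.copy2(target, backup)  # keep a restorable copy of the original
    target.write_text(new_text, encoding="utf-8")
    return backup


# Demo in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    skill = Path(tmp) / "SKILL.md"
    skill.write_text("old\n", encoding="utf-8")
    diff = preview_diff(skill, "new\n")
    skipped = apply_with_backup(skill, "rejected\n", accepted=False)
    backup = apply_with_backup(skill, "new\n", accepted=True)
    final_text = skill.read_text(encoding="utf-8")
    backup_text = backup.read_text(encoding="utf-8")
```

Keeping the diff preview separate from the write mirrors the offline-preview-then-apply flow of the commands above.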
Get Started in Seconds
Install with uv or pip, then run a single command
Install CodexOpt
From PyPI, or uv for the full workflow
pip install codexopt==0.2.0
uv sync --extra dev
Improve, then apply
Preview offline, then opt into Codex
uv run codexopt improve
uv run codexopt improve --live --apply
Everything in v0.2.0
A complete toolkit to measure and improve your Codex instruction assets
One-Command improve
Discover targets, mine tasks, optimize, preview, and apply — all from codexopt improve.
SkillOpt Engine
Train/validation splits, bounded edits, and validation-gated acceptance for SKILL.md.
Reflective Engine
SkillOpt/GEPA-inspired, Codex-backed reflection that keeps a rewrite only when it is proven to improve held-out tasks.
Codex Rollout Parsing
Parses codex exec --json output into trajectories: responses, commands, file changes, tokens, and errors.
Task Mining
codexopt tasks init generates starter optimization tasks from git, skills, and issues.
Tiered Rewards
Verifier outcomes, LLM-judge feedback, and static analysis combined into one signal.
Validation-Gated Apply
Only held-out-validated edits are written, always with automatic backups.
Benchmark Scoring
Per-file 0–1 scores with criterion sub-scores and natural-language feedback.
Markdown Reporting
Reports showing files improved, accepted diffs, score movement, and fallback notes.
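Tiered Rewards combine verifier outcomes, LLM-judge feedback, and static analysis into one signal. This page does not say how CodexOpt weights those signals, so the sketch below assumes a simple fixed weighted blend in which the verifier outcome is the dominant tier. `Signals`, `combined_reward`, and the weights in `WEIGHTS` are hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# Hypothetical weights; the verifier outcome is deliberately the dominant tier.
WEIGHTS: Dict[str, float] = {"verifier": 0.6, "judge": 0.25, "static": 0.15}


@dataclass
class Signals:
    verifier_passed: Optional[bool]  # None when no command verifier ran
    judge_score: Optional[float]     # 0-1 LLM-judge score, None if no judge was used
    static_score: float              # 0-1 static-analysis score, always computable


def combined_reward(s: Signals) -> float:
    """Blend whichever signals are present into a single 0-1 reward."""
    terms = {"static": s.static_score}
    if s.verifier_passed is not None:
        terms["verifier"] = 1.0 if s.verifier_passed else 0.0
    if s.judge_score is not None:
        terms["judge"] = s.judge_score
    # Renormalize over the available signals so the reward stays in [0, 1].
    total = sum(WEIGHTS[name] for name in terms)
    return sum(WEIGHTS[name] * value for name, value in terms.items()) / total


reward = combined_reward(Signals(verifier_passed=True, judge_score=0.8, static_score=0.5))
```

Renormalizing over the signals that are present keeps scores comparable on the same 0-1 scale when, for example, a task has no command verifier and only static analysis is available.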
Three Optimization Engines
Pick the right engine — from local heuristics to Codex-backed SkillOpt
Heuristic Engine
Default · runs locally
Fast, deterministic optimization using rule-based transforms. No API keys or external calls. Perfect for CI/CD and quick iterations.
Reflective Engine
Maintained · Codex-backed
The maintained SkillOpt/GEPA-inspired path behind codexopt improve. Evaluates a candidate, captures feedback, rewrites, and keeps it only when held-out tasks improve.
SkillOpt Engine
SKILL.md engine
SkillOpt-style discipline for skills: task evidence becomes train/validation splits, candidates respect an edit budget, and acceptance needs a minimum validation delta.
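The three SkillOpt Engine constraints (train/validation splits, an edit budget, and a minimum validation delta) can be expressed as small, checkable functions. This is a sketch under assumptions, not the engine's code. `split_tasks`, `edited_lines`, `accept`, and the defaults for `val_fraction`, `edit_budget`, and `min_delta` are all hypothetical, and the edit budget is measured here as added or removed lines.

```python
import difflib
import random
from typing import List, Sequence, Tuple


def split_tasks(tasks: Sequence[str], val_fraction: float = 0.3,
                seed: int = 0) -> Tuple[List[str], List[str]]:
    """Deterministically shuffle tasks, then hold out a validation slice."""
    shuffled = list(tasks)
    random.Random(seed).shuffle(shuffled)
    n_val = max(1, round(len(shuffled) * val_fraction))
    return shuffled[n_val:], shuffled[:n_val]  # (train, validation)


def edited_lines(old: str, new: str) -> int:
    """Count lines added or removed between two versions of a document."""
    diff = difflib.ndiff(old.splitlines(), new.splitlines())
    return sum(1 for line in diff if line.startswith(("+ ", "- ")))


def accept(old: str, new: str, old_val: float, new_val: float,
           edit_budget: int = 5, min_delta: float = 0.02) -> bool:
    """Accept only within the edit budget and with a large enough validation gain."""
    if edited_lines(old, new) > edit_budget:
        return False  # too large a change, however well it scores
    return new_val - old_val >= min_delta


train, val = split_tasks([f"task-{i}" for i in range(10)])
```

Requiring a minimum delta, rather than any improvement at all, guards against accepting edits whose gain is within the noise of a small validation set.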
Built on GEPA's reflective lineage
The SkillOpt paper benchmarks its approach against GEPA, TextGrad, and EvoSkill. CodexOpt's reflective engine carries forward GEPA's textual-reflection ideas in a streamlined, Codex-native implementation. The legacy --engine gepa path (which targeted the older gepa.optimize_anything API) is now deprecated and falls back with a clear warning — use --engine reflective instead.
Prefer the full pipeline?
When you want more control than codexopt improve offers, run each stage yourself:
uv run codexopt init
uv run codexopt scan
uv run codexopt benchmark
uv run codexopt tasks init
uv run codexopt optimize skills --engine reflective
uv run codexopt apply --kind skills --dry-run
uv run codexopt report --output codexopt-report.md
Resources
Everything you need to get started
Optimize Your Codex Skills Today
CodexOpt 0.2.0 makes SkillOpt-style optimization practical for Codex users — rigorous validation with direct harness integration. Open source and MIT licensed.
