v0.2.0 · Now with SkillOpt

Open Source CLI · MIT

CodexOpt

Bring Microsoft Research's SkillOpt to Codex. Optimize AGENTS.md and SKILL.md with execution feedback.

Bounded edits, validation-gated acceptance, and Codex rollouts — turning intuition-driven prompt tweaks into measurable, reproducible gains.

$ pip install codexopt==0.2.0

View on GitHub · Read the SkillOpt story

New in v0.2.0

Microsoft SkillOpt, now for Codex

SkillOpt frames natural-language skill documents as optimizable external state for frozen models. An optimizer analyzes rollout trajectories, proposes controlled edits, and accepts changes only when they improve held-out validation tasks. CodexOpt brings that discipline straight into the Codex harness.
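The acceptance rule described above can be sketched in a few lines of Python. This is only an illustration of the idea, not CodexOpt's or SkillOpt's actual code: `gated_accept`, `mean_score`, and the toy `keyword_score` scorer are hypothetical names, and a real scorer would run rollouts rather than match keywords.

```python
from typing import Callable, Sequence

# Hypothetical scorer type: (skill_text, task) -> score in [0, 1].
ScoreFn = Callable[[str, str], float]


def mean_score(skill: str, tasks: Sequence[str], score: ScoreFn) -> float:
    """Average the per-task score of one skill document."""
    return sum(score(skill, task) for task in tasks) / len(tasks)


def gated_accept(current: str, candidate: str,
                 val_tasks: Sequence[str], score: ScoreFn) -> str:
    """Return the candidate only if it strictly beats the current
    document on the held-out validation tasks; otherwise keep current."""
    base = mean_score(current, val_tasks, score)
    new = mean_score(candidate, val_tasks, score)
    return candidate if new > base else current


def keyword_score(skill: str, task: str) -> float:
    """Toy stand-in for a rollout: the task 'passes' if the skill mentions it."""
    return 1.0 if task in skill else 0.0


val_tasks = ["pytest", "ruff"]
kept = gated_accept("run pytest", "run pytest and ruff", val_tasks, keyword_score)
```

A candidate that ties or regresses on the validation tasks is discarded, so the skill document cannot drift on changes that only look better.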

One command for Codex users

$ uv run codexopt improve

Safe offline preview

$ uv run codexopt improve --live

Codex-backed optimization

$ uv run codexopt improve --live --apply

Apply validated changes

Offline preview is the default — Codex budget is used only with --live

What SkillOpt brings to your skills

Train / validation task splits mined from git history, issues, and skill descriptions

Bounded edits with a configurable edit budget — no prompt bloat

Validation-gated acceptance — a change wins only on held-out tasks

Tiered rewards: verifier → LLM judge → static fallback

Full codex exec --json trajectory parsing

Reports with accepted diffs and validation-score movement
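To make the train / validation split concrete, here is a minimal Python sketch of one way to divide mined tasks reproducibly. It is an assumption about how such a split could work, not CodexOpt's implementation: `split_tasks`, `val_fraction`, and the task IDs are all hypothetical.

```python
import hashlib
from typing import List, Sequence, Tuple


def split_tasks(task_ids: Sequence[str],
                val_fraction: float = 0.3) -> Tuple[List[str], List[str]]:
    """Assign each task to train or validation by hashing its ID.

    Hashing keeps the split stable across runs and independent of task
    order, so a task never moves between the two sets.
    """
    train: List[str] = []
    val: List[str] = []
    for task_id in task_ids:
        digest = hashlib.sha256(task_id.encode("utf-8")).digest()
        bucket = digest[0] / 256  # value in [0, 1)
        (val if bucket < val_fraction else train).append(task_id)
    return train, val


# Hypothetical task IDs standing in for tasks mined from git, issues, and skills.
task_ids = ["git:fix-flaky-test", "issue:add-retry", "skill:lint-python",
            "git:rename-module", "issue:update-docs"]
train_tasks, val_tasks = split_tasks(task_ids)
```

Because the assignment depends only on the task ID, rerunning the split never leaks a validation task into training.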

SkillOpt, mapped to CodexOpt

Every SkillOpt concept has a concrete, Codex-native counterpart

Skill artifact

SKILL.md or AGENTS.md

Rollout

codex exec or command verifier

Feedback

Trajectory analysis + multi-signal scoring

Bounded edit

Edit budget + controlled modifications

Validation gate

Held-out task performance

Exported skill

Validated file diff with backups

Get Started in Seconds

Install with uv or pip, then run a single command

Install CodexOpt

From PyPI, or uv for the full workflow

pip: pip install codexopt==0.2.0

uv: uv sync --extra dev

Improve, then apply

Preview offline, then opt into Codex

uv run codexopt improve

uv run codexopt improve --live --apply

Validated edits written with automatic backups

Everything in v0.2.0

A complete toolkit to measure and improve your Codex instruction assets

One-Command improve

Discover targets, mine tasks, optimize, preview, and apply — all from codexopt improve.

SkillOpt Engine

Train/validation splits, bounded edits, and validation-gated acceptance for SKILL.md.

Reflective Engine

Codex-backed reflection, inspired by SkillOpt and GEPA, that keeps a rewrite only when it is a proven improvement.

Codex Rollout Parsing

Parses codex exec --json into trajectories: responses, commands, file changes, tokens, errors.
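As a rough illustration of trajectory parsing, the Python sketch below reads newline-delimited JSON and tallies events by kind. The event shapes in `SAMPLE` (the `type`, `text`, `cmd`, and `exit_code` fields) are invented for the example and are not the real `codex exec --json` schema; `parse_jsonl` and `summarize` are hypothetical helpers.

```python
import json
from collections import Counter
from typing import Dict, List


def parse_jsonl(stream_text: str) -> List[Dict]:
    """Parse newline-delimited JSON, skipping blank or malformed lines."""
    events: List[Dict] = []
    for line in stream_text.splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # tolerate stray non-JSON output in the stream
        if isinstance(event, dict):
            events.append(event)
    return events


def summarize(events: List[Dict]) -> Counter:
    """Count events by their (assumed) 'type' field."""
    return Counter(str(event.get("type", "unknown")) for event in events)


# Made-up events for illustration only; not the actual Codex event format.
SAMPLE = "\n".join([
    '{"type": "message", "text": "Running tests"}',
    '{"type": "command", "cmd": "pytest -q", "exit_code": 0}',
    'not json at all',
    '{"type": "command", "cmd": "ruff check .", "exit_code": 1}',
])

events = parse_jsonl(SAMPLE)
summary = summarize(events)
```

Skipping malformed lines rather than raising keeps a single bad line from discarding an otherwise usable rollout.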

Task Mining

codexopt tasks init generates starter optimization tasks from git, skills, and issues.

Tiered Rewards

Verifier outcomes, LLM-judge feedback, and static analysis combined into one signal.
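One plausible reading of "tiered", following the verifier → LLM judge → static fallback order listed earlier, is that the first tier able to produce a score wins. The Python sketch below shows that reading under stated assumptions: `tiered_reward` and the three toy scorers are hypothetical, and CodexOpt may combine its signals differently.

```python
from typing import Callable, Optional, Sequence, Tuple

# Hypothetical scorer: returns a score in [0, 1], or None if it cannot judge.
Scorer = Callable[[str], Optional[float]]


def tiered_reward(output: str,
                  tiers: Sequence[Tuple[str, Scorer]]) -> Tuple[str, float]:
    """Try each tier in order and return (tier_name, score) from the
    first tier that yields a score."""
    for name, scorer in tiers:
        value = scorer(output)
        if value is not None:
            return name, value
    raise ValueError("no tier produced a score")


def verifier(output: str) -> Optional[float]:
    """Toy verifier: only decides when the output states PASS or FAIL."""
    if "PASS" in output:
        return 1.0
    if "FAIL" in output:
        return 0.0
    return None


def judge(output: str) -> Optional[float]:
    """Toy LLM-judge stand-in that is unavailable (e.g. offline mode)."""
    return None


def static(output: str) -> Optional[float]:
    """Toy static fallback that always returns a neutral score."""
    return 0.5


tiers = [("verifier", verifier), ("judge", judge), ("static", static)]
```

Returning the tier name alongside the score makes it easy for a report to note when a result came from a fallback rather than an executable check.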

Validation-Gated Apply

Only held-out-validated edits are written, always with automatic backups.
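The backup-before-write behaviour can be pictured with the short Python sketch below. It is an assumption about the general pattern, not CodexOpt's code: `write_with_backup` and the `.bak` suffix are hypothetical.

```python
import shutil
import tempfile
from pathlib import Path


def write_with_backup(path: Path, new_text: str, suffix: str = ".bak") -> Path:
    """Copy the existing file next to itself, then overwrite the original.

    Returns the backup path so the change can be reverted later.
    """
    backup = path.with_name(path.name + suffix)
    shutil.copy2(path, backup)  # preserves contents and metadata
    path.write_text(new_text, encoding="utf-8")
    return backup


# Demonstration in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    skill = Path(tmp) / "SKILL.md"
    skill.write_text("old instructions", encoding="utf-8")
    backup = write_with_backup(skill, "validated instructions")
    restored_text = backup.read_text(encoding="utf-8")
    current_text = skill.read_text(encoding="utf-8")
    backup_name = backup.name
```

Taking the backup before the write means an interrupted run leaves either the untouched original or a recoverable copy.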

Benchmark Scoring

Per-file 0–1 scores with criterion sub-scores and natural-language feedback.

Markdown Reporting

Reports showing files improved, accepted diffs, score movement, and fallback notes.

Three Optimization Engines

Pick the right engine — from local heuristics to Codex-backed SkillOpt

Heuristic Engine

Default · runs locally

Fast, deterministic optimization using rule-based transforms. No API keys or external calls. Perfect for CI/CD and quick iterations.

No API keys required

Deterministic results

Fast execution

CI/CD friendly

Reflective Engine

Maintained · Codex-backed

The maintained SkillOpt/GEPA-inspired path behind codexopt improve. It evaluates a candidate, captures feedback, rewrites the file, and keeps the rewrite only when held-out task scores improve.

Codex exec optimizer & judge

Textual feedback → mutation

Held-out validation gate

Tiered reward signals

SkillOpt Engine

SKILL.md engine

SkillOpt-style discipline for skills: task evidence becomes train/validation splits, candidates respect an edit budget, and acceptance needs a minimum validation delta.

Train / validation splits

Bounded edit budget

Validation-delta acceptance

Executable rollout rewards
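The two constraints named above, an edit budget and a minimum validation delta, can be combined into a single check. The Python sketch below is illustrative only: `changed_lines`, `accept_candidate`, and the line-based budget are assumptions, and CodexOpt may measure edits differently.

```python
import difflib


def changed_lines(old: str, new: str) -> int:
    """Count lines added or removed between two documents."""
    diff = difflib.ndiff(old.splitlines(), new.splitlines())
    return sum(1 for line in diff if line.startswith(("+ ", "- ")))


def accept_candidate(old: str, new: str,
                     base_score: float, new_score: float,
                     budget: int, min_delta: float) -> bool:
    """Accept only if the edit stays within the budget AND the
    validation score improves by at least min_delta."""
    within_budget = changed_lines(old, new) <= budget
    improved_enough = (new_score - base_score) >= min_delta
    return within_budget and improved_enough


old_skill = "Run tests.\nUse ruff.\nKeep diffs small."
new_skill = "Run tests with pytest -q.\nUse ruff.\nKeep diffs small."
edit_size = changed_lines(old_skill, new_skill)
```

A replaced line counts as one removal plus one addition, so the single-line rewrite here has an edit size of 2.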

Built on GEPA's reflective lineage

The SkillOpt paper benchmarks its approach against GEPA, TextGrad, and EvoSkill. CodexOpt's reflective engine carries forward GEPA's textual-reflection ideas in a streamlined, Codex-native implementation. The legacy --engine gepa path (which targeted the older gepa.optimize_anything API) is now deprecated and falls back with a clear warning — use --engine reflective instead.

Step-by-step control

Prefer the full pipeline?

When you want more control than improve, run each stage yourself

codexopt workflow

$ uv run codexopt init

Initialize your project

$ uv run codexopt scan

Scan for issues

$ uv run codexopt benchmark

Score your instruction files

$ uv run codexopt tasks init

Mine starter optimization tasks

$ uv run codexopt optimize skills --engine reflective

Optimize SKILL.md files

$ uv run codexopt apply --kind skills --dry-run

Preview the apply impact

$ uv run codexopt report --output codexopt-report.md

Generate a report

Resources

Everything you need to get started

SkillOpt Launch Post

How CodexOpt 0.2.0 brings Microsoft SkillOpt to Codex.

Documentation

Full guides, Codex workflow, and usage examples.

GitHub Repository

Source code, issues, and contributions.

PyPI Package

Install codexopt==0.2.0 from PyPI.

SkillOpt Paper

Microsoft Research's SkillOpt on arXiv.

Demo Repository

Example project showing CodexOpt in action.

Optimize Your Codex Skills Today

CodexOpt 0.2.0 makes SkillOpt-style optimization practical for Codex users — rigorous validation with direct harness integration. Open source and MIT licensed.

Star on GitHub · Read the launch post