Updated May 2, 2026
OpenAI Codex guide: cloud, CLI, and code review
A practical guide to OpenAI Codex across cloud tasks, the local CLI, GitHub review, AGENTS.md, sandboxing, and team rollout.
OpenAI Codex should sit on any serious shortlist for agentic coding in 2026. The product is no longer just a local terminal assistant. The docs describe a broader coding system: cloud tasks that run in isolated containers, a Rust CLI for local repo work, an IDE extension, GitHub pull request review, subagents, MCP, skills, non-interactive automation, and shared configuration.
That breadth is why Codex belongs near the top of a practical ranking. A developer can use the same agent family for supervised local edits, background cloud work, and review before merge. A team can put repo instructions in AGENTS.md, tune sandbox and approval settings, connect MCP tools, and move repeated playbooks into skills. Few tools cover that much of the development loop.
Where Codex Fits
| Surface | Use it for | Watch carefully |
|---|---|---|
| Codex web | Issue-shaped tasks, background branches, parallel work, pull requests | Environment setup, repo permissions, task size |
| Codex CLI | Local edits, tests, refactors, code review before commit | Sandbox mode, approval policy, dirty Git state |
| IDE extension | Starting cloud tasks from the editor and applying diffs locally | Context handoff and local verification |
| GitHub review | Focused review comments on serious PR risks | Repository setup and review guidelines |
| Non-interactive mode | Repeatable scripts, CI helpers, batch maintenance | Prompt shape, logs, credentials |
Codex is strongest when the work has a clear repository target and a proof step. Good tasks look like:
- Add regression tests for a known bug and fix the smallest failing path.
- Modernize one module without changing public APIs.
- Review a pull request for security regressions.
- Draft documentation for a changed subsystem and verify links.
- Split a refactor into milestones, then delegate one milestone to cloud.
Codex is weaker when the work is mostly product taste, visual exploration, or ambiguous strategy. It can help, but a tool like v0, Lovable, or Cursor may be a better first surface depending on the job.
Setup Choices That Matter
Codex cloud tasks run in a task-specific cloud environment. The environment should know the repository, setup commands, runtime dependencies, and any tools needed to build or test. If the environment is wrong, the agent will waste time debugging missing packages instead of solving the actual task.
Secrets need special treatment. Codex cloud environments support environment variables and secrets, but the docs distinguish them: secrets are encrypted and available to setup scripts, then removed before the agent phase. That is the safer default for package registry tokens, private dependency access, and install-time credentials.
Internet access is useful but not risk-free. OpenAI’s docs call out prompt injection risk when an agent reads untrusted web pages, issue text, package READMEs, or dependency content. For cloud tasks, prefer allowlisted domains and restrict HTTP methods when possible. A task that only needs package downloads does not need unrestricted outbound access.
For the CLI, start with conservative approvals. Codex exposes sandbox and approval knobs through configuration. New users should keep defaults tight, then loosen them for trusted repositories after the workflow is proven. Local agents can run commands, edit files, and touch a lot of code quickly; the permission model is part of the product, not an annoying setup chore.
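As a concrete starting point, a conservative local configuration might look like the sketch below. Key names follow the open-source Codex CLI's `config.toml` conventions; exact knobs change between releases, so verify them against your installed version before relying on them.

```toml
# Sketch of a tight-by-default ~/.codex/config.toml for a new user.
# Key names are taken from the open-source CLI's configuration docs;
# check `codex --help` and the CLI docs for your installed version.

# Ask before running commands the CLI cannot verify as safe.
approval_policy = "untrusted"

# Let the agent read and edit files inside the workspace, nothing outside it.
sandbox_mode = "workspace-write"

[sandbox_workspace_write]
# Keep outbound network access off until the workflow is proven,
# then loosen per trusted repository rather than globally.
network_access = false
```

Loosening happens per repository once the workflow has earned trust, which matches the "defaults tight, then relax" rollout described above.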
Write AGENTS.md Before the Big Trial
Codex gets better when the repository tells it how work is done. A useful AGENTS.md should include:
- Repo layout and important directories.
- Build, test, lint, and typecheck commands.
- Code style and architecture rules.
- PR expectations and review checklist.
- Files or behaviors the agent should avoid.
- A definition of done for common tasks.
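Put together, a minimal root AGENTS.md covering that checklist might look like this illustrative sketch. The commands, paths, and rules here are placeholders, not recommendations; substitute your project's real ones.

```markdown
# AGENTS.md

## Layout
- `src/` holds application code; `tests/` mirrors its structure.

## Commands
- Build: `npm run build`
- Test: `npm test`
- Lint and typecheck: `npm run lint && npm run typecheck`

## Rules
- Keep public APIs stable unless the task says otherwise.
- Never edit generated files under `dist/`.

## Definition of done
- Tests, lint, and typecheck pass.
- The diff touches only files relevant to the task.
```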
Keep it short. If instructions grow into a manual, keep the root file focused and link out to task-specific docs. The OpenAI best-practices docs recommend updating AGENTS.md when Codex repeats the same mistake. Treat the file like team onboarding material for agents.
A First Week Trial
Use a real repository and keep the tasks small enough to review.
| Day | Trial | What to measure |
|---|---|---|
| 1 | Run Codex CLI on a tiny bug fix | Diff size, test command, number of corrections |
| 2 | Ask /review to inspect uncommitted work | Useful findings, false positives, missed risks |
| 3 | Create or tune AGENTS.md | Whether Codex follows project commands |
| 4 | Delegate one cloud task | Setup time, branch quality, PR readiness |
| 5 | Ask @codex review on a PR | Comment signal, priority, author reaction |
Use the same scoring sheet for every task:
- Did Codex name the files it changed?
- Did it explain why the change is small enough?
- Did it run the right checks?
- Did the final diff match the request?
- Did review become easier or harder?
Prompt Patterns That Work
For local fixes:
Reproduce the failing auth test, identify the smallest server-side cause, patch it, add a regression test, and run the auth test file. Do not change the database schema.
For cloud delegation:
Implement Milestone 1 only from the plan in this thread. Keep public APIs stable, update tests, and leave notes on anything you could not verify in the cloud environment.
For PR review:
@codex review for tenant-boundary regressions, accidental PII logging, and missing authorization checks. Ignore formatting issues covered by lint.
The key is to define proof. A prompt without a verification step invites a confident patch. A prompt with commands, constraints, and a review target gives Codex a finish line.
When Codex Should Rank First
Rank Codex first when the buyer wants one agent system for repo work across local, cloud, review, and automation. It is especially strong for teams already using ChatGPT plans, GitHub pull requests, and testable issue-shaped work.
Rank Claude Code first when the team wants the most polished terminal-first agent experience and values plan mode, checkpoints, memory, and local working style more than cloud delegation. Rank Cursor first when the main job is everyday editor flow. Rank Lovable or Bolt first when the job is a product prototype rather than a maintainable repo.
Codex earns the top slot when “best AI coding tool” means the broadest engineering surface, not the prettiest editor.