Claude Code, GSD, Plan Mode, and Codex Review Council — Research-Grade Playbook and Handoff
Date: 2026-03-19
Prepared for: Dmitri / Solanasis context
Scope: This document extracts, verifies, organizes, and improves the key parts of the discussion about:
- Claude Code Plan Mode vs. GSD (Get Shit Done)
- How to use Codex as a reviewer or “model council” alongside Claude Code / GSD
Executive Summary
This discussion was fundamentally about how to get stronger planning and execution quality on large software efforts—especially work that starts from a large PRD, needs multi-step implementation, and benefits from independent review.
Bottom line
- Verified: Claude Code Plan Mode is a native, read-only planning workflow for safe codebase analysis and proposal generation before edits. It is not, by itself, a full autonomous orchestration system. See [R1].
- Verified: GSD is a third-party workflow/framework that adds a spec-driven, phase-based development process on top of coding agents. The current repo explicitly supports multiple runtimes, including Claude Code and Codex. See [R7], [R8].
- Verified: GSD’s documented flow includes commands such as /gsd:map-codebase, /gsd:new-project, /gsd:discuss-phase, /gsd:plan-phase, /gsd:execute-phase, /gsd:verify-work, and /gsd:ship. See [R7].
- Verified: Claude Code has native extension/orchestration primitives that matter for this use case: subagents, agent teams, hooks, permissions, and sandboxing. See [R2], [R3], [R4], [R5], [R6].
- Verified: Codex has native primitives that matter for this use case: interactive CLI, non-interactive codex exec, review modes, AGENTS.md, GitHub Action, and MCP/Agents SDK integration. See [R9], [R10], [R11], [R12], [R13], [R14], [R15].
- Verified with caution: No official first-party documentation was found showing a built-in “Claude Code and Codex directly debate each other” feature. The docs support automation and orchestration, but not a one-click cross-vendor debate mode. This is absence-of-evidence, not proof that no integrations exist anywhere. See [R1]–[R6], [R9]–[R15].
- Recommended conclusion: The strongest practical pattern is not “two models freestyle arguing.” It is:
- one model creates the canonical plan,
- the other performs a structured adversarial review,
- findings are saved to files,
- the plan is reconciled,
- implementation proceeds,
- the reviewer runs again on the diff / branch.
- Recommended stack: For advanced work, use Claude Code (or GSD on Claude Code) for plan authoring, and Codex for adversarial plan review + implementation review. Use file-based artifacts, hooks, worktrees, and optionally CI to turn this into a repeatable workflow.
Purpose of This Document
This artifact is intended to serve as a:
- Guide for how Claude Code, GSD, and Codex relate
- Playbook for setting up a practical “model council” review workflow
- Briefing memo for decision-making
- Handoff document for another AI so it can continue the work without the original chat
This is not a light recap. It explicitly separates:
- what is verified
- what was user-stated
- what was assistant-stated but unverified
- what remains tentative / speculative
Discussion Context
User goals from the discussion
- User-stated: The user wants to understand the difference between using the GSD repo/workflow and using Claude Code Plan Mode.
- User-stated: The user is interested in advanced planning for large/complex work, especially breaking down a large PRD into something closer to a repeatable, partially autonomous implementation workflow.
- User-stated: The user wants to know whether there is a way to have Claude Code and Codex review plans back and forth, or otherwise simulate a model council because each model appears to catch issues the other misses.
- User-stated: The user is specifically looking for real-world workarounds / hacks if there is no built-in feature.
Important assumptions and preferences inferred from the discussion
- User-stated / inferred: The user values high-quality planning, multi-agent workflows, durable artifacts, and independent review, not just “one-shot vibe coding.”
- User-stated / inferred: The user is willing to adopt tooling and automation if it materially improves planning rigor and review quality.
- Assistant-stated but reasonable: The best solution should minimize “prompt theater” and maximize repeatability, traceability, and fresh-context execution.
Key Facts and Verified Findings
1) Claude Code Plan Mode
- Verified: Claude Code docs state that Plan Mode “instructs Claude to create a plan by analyzing the codebase with read-only operations,” and it is meant for exploring codebases, planning complex changes, or reviewing code safely. [R1]
- Verified: In Plan Mode, Claude uses AskUserQuestion to gather requirements and clarify goals before proposing a plan. [R1]
- Interpretation: Plan Mode is a safe planning mode, not a complete project orchestration framework.
What that means operationally
- Verified + interpretation: Plan Mode is best thought of as a native planning primitive, not a project manager, state machine, or autonomous build system. The docs support planning, not full lifecycle orchestration. [R1]
- Tentative / speculative: Plan Mode by itself is unlikely to be enough for a very large PRD unless paired with external structure (files, milestones, subagents, separate sessions, or another framework). This is an inference from the tooling model, not an explicit vendor statement.
2) Claude Code native primitives relevant to advanced planning
Subagents
- Verified: Claude Code supports custom subagents for task-specific workflows and improved context management. Each subagent runs in its own context window with custom system prompt, tool access, and permissions. [R2]
- Verified: Subagents are suitable when work is decomposable into specialized, isolated tasks. [R2]
Agent teams
- Verified: Claude Code supports agent teams in which multiple Claude Code instances work together, each in its own context window, with a lead coordinating work. [R3]
- Verified: Agent teams are especially suited to research/review, parallel exploration, debugging with competing hypotheses, and cross-layer coordination. [R3]
- Verified: Agent teams add coordination overhead and use significantly more tokens than a single session. [R3]
Hooks
- Verified: Claude Code supports hooks, which are user-defined shell commands that execute at specific lifecycle points. Hooks provide deterministic control and can be used to enforce rules, automate repetitive tasks, and integrate other tools. [R4]
- Verified: Claude docs explicitly distinguish deterministic hooks from judgment-based evaluation, and they point to prompt-based or agent-based hooks when judgment is needed. [R4]
Permissions and safety
- Verified: Claude Code uses a tiered permission system with allow/ask/deny rules, and deny takes precedence. [R5]
- Verified: Claude Code also supports sandboxing, which can reduce prompts while constraining file system and network access. [R6]
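As a rough mental model of "deny takes precedence", the sketch below evaluates a path against three tiers of patterns. Only the deny-first ordering comes from the permissions docs cited above [R5]; the glob patterns, the ask-before-allow ordering, the fallback to "ask", and the decide helper are illustrative assumptions, not Claude Code's actual rule syntax or implementation.

```python
from fnmatch import fnmatch

# Hypothetical rule set: these glob patterns are illustrative only and are
# not Claude Code's real permission-rule syntax.
RULES = {
    "deny": [".env", "secrets/*", "*.pem"],
    "ask": ["migrations/*"],
    "allow": ["src/*", "plans/*", "REVIEWS/*"],
}


def decide(path: str, rules: dict[str, list[str]] = RULES) -> str:
    """Return 'deny', 'ask', or 'allow' for a path, checking deny first."""
    # Deny is checked first, so a deny match wins even if an allow rule also matches.
    for tier in ("deny", "ask", "allow"):
        if any(fnmatch(path, pattern) for pattern in rules.get(tier, [])):
            return tier
    # Assumption: anything unmatched falls back to asking the user.
    return "ask"
```

With these example rules, src/key.pem matches both an allow pattern and a deny pattern, and the deny tier wins.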
Why this matters
- Verified + interpretation: Claude Code already has many of the primitives needed to build a serious planning-and-review pipeline: planning mode, subagents, agent teams, hooks, permissions, and sandboxing. [R1]–[R6]
3) What GSD is, according to its current documentation
- Verified: The current GSD repo describes itself as “a light-weight and powerful meta-prompting, context engineering and spec-driven development system” for Claude Code, OpenCode, Gemini CLI, Codex, Copilot, and Antigravity. [R7]
- Verified: The installer lets you choose runtime(s), including Claude Code and Codex. [R7]
- Verified: The repo says Codex installation uses skills rather than custom prompts. [R7]
- Verified: GSD explicitly frames itself as solving “context rot” and providing a spec-driven workflow. [R7]
GSD workflow and commands
- Verified: GSD documents the following key lifecycle:
- /gsd:new-project
- /gsd:discuss-phase
- /gsd:plan-phase
- /gsd:execute-phase
- /gsd:verify-work
- /gsd:ship
- /gsd:next
- /gsd:map-codebase for brownfield/existing repos
[R7]
- Verified: GSD says /gsd:map-codebase should be run first on an existing codebase so planning can use the existing stack, conventions, and architecture. [R7]
- Verified: GSD says /gsd:plan-phase researches, creates 2–3 atomic task plans, and verifies them against requirements. [R7]
- Verified: GSD says /gsd:execute-phase runs plans in waves, uses fresh context per plan, and creates atomic commits. [R7]
- Verified: GSD says /gsd:verify-work is the human verification / UAT step. [R7]
- Verified: GSD says /gsd:ship creates a PR from verified phase work. [R7]
GSD workflow agents and automation
- Verified: GSD exposes settings such as workflow.research, workflow.plan_check, workflow.verifier, and workflow.auto_advance. [R7]
- Verified: GSD notes these improve quality but add tokens and time. [R7]
GSD permissions stance
- Verified: GSD documents a recommended usage pattern of running Claude Code with claude --dangerously-skip-permissions. [R7]
- Verified: GSD also provides an alternative based on granular permissions. [R7]
Important nuance
- Verified + caution: GSD is a third-party workflow/framework, not a native Claude Code feature. [R7]
- Assistant-stated but unverified: GSD is better understood as a “structure layer” on top of agent runtimes, not a full standalone multi-model orchestrator. This conclusion is reasonable from the docs but not stated by the maintainers in those exact words.
4) Evidence that GSD now supports Codex-oriented execution paths
- Verified: The GSD changelog/release output includes an entry for a “gsd-autonomous skill for Codex runtime” that “enables autonomous GSD execution.” [R8]
- Verified: The repo and release materials support the claim that GSD now meaningfully targets Codex, not only Claude Code. [R7], [R8]
- Caution: This does not by itself prove a mature Claude↔Codex council system exists inside GSD. It only proves GSD has runtime support and skills for Codex.
5) What Codex officially supports that is relevant here
CLI and local agent behavior
- Verified: Codex CLI is a local coding agent that can inspect repositories, edit files, and run commands. [R9]
Non-interactive scripting
- Verified: OpenAI documents non-interactive mode using codex exec, explicitly for scripts and CI. [R10]
Review workflows
- Verified: Codex supports /review, including:
- review against a base branch,
- review uncommitted changes,
- review a commit,
- use custom review instructions.
[R11]
Durable instruction files
- Verified: Codex reads AGENTS.md files before doing work and supports layered instruction discovery from global and project scopes. [R12]
- Verified: Codex can also use team review guidance files referenced from AGENTS.md to make review behavior more consistent. [R11], [R12]
Durable planning files
- Verified: OpenAI’s Codex cookbook recommends a durable execution-plan artifact (PLANS.md / ExecPlans), written such that a coding agent can succeed with just the working tree and the plan file. [R13]
CI automation
- Verified: OpenAI documents a Codex GitHub Action that can run codex exec in CI/CD, apply patches, or post reviews. [R14]
MCP / orchestration
- Verified: OpenAI documents running Codex as an MCP server and orchestrating it via the Agents SDK to create deterministic, reviewable multi-agent workflows. [R15]
6) Verified conclusions from the above
- Verified: Both Claude Code and Codex now expose enough automation primitives to support a file-based, reviewable workflow rather than only manual chat. [R1]–[R6], [R9]–[R15]
- Verified: Claude Code is stronger on native codebase exploration/orchestration primitives within its own ecosystem. [R1]–[R6]
- Verified: Codex is stronger than older casual tools in terms of explicit scripting/CI/review-oriented documentation, durable artifacts such as AGENTS.md and PLANS.md, and non-interactive execution via codex exec. [R10]–[R15]
- Verified with caution: No official doc was found describing a built-in Claude↔Codex “conversation” or “model council” feature. This means the current best-known path is still integration by scripts, files, hooks, worktrees, or CI. [R1]–[R6], [R9]–[R15]
Major Decisions and Conclusions
Decision 1: Treat Plan Mode and GSD as different layers
- Verified: Plan Mode is native Claude Code functionality. [R1]
- Verified: GSD is a third-party workflow framework layered on top of runtimes including Claude Code and Codex. [R7]
- Conclusion: Do not think of GSD as “Claude Code Plan Mode but better.” They solve different problems:
- Plan Mode = native safe planning
- GSD = opinionated structured workflow / phase management / context discipline
Decision 2: Do not rely on giant one-shot planning without durable artifacts
- Verified: Codex documentation explicitly recommends durable plan artifacts (PLANS.md / ExecPlans). [R13]
- Verified: Claude Code docs emphasize separate contexts via subagents/teams and safe planning before edits. [R1]–[R3]
- Conclusion: The better pattern is many small well-documented shots, not one giant fuzzy shot.
Decision 3: Use one model as plan author and the other as reviewer
- Assistant recommendation, grounded but not directly vendor-prescribed: The cleanest workflow is:
- Claude Code or GSD/Claude authors the canonical plan
- Codex performs adversarial review
- Claude reconciles
- Claude implements
- Codex reviews the resulting diff
Decision 4: File-based workflow beats chat-only workflow
- Verified: Both Claude and Codex ecosystems support persistent instruction and workflow artifacts. [R4], [R12], [R13]
- Conclusion: Plans, reviews, reconciliations, and approvals should live in files, not only in terminal history.
Reasoning, Tradeoffs, and Why It Matters
Why the user’s “model council” instinct is directionally correct
- User-stated: The user observes that Codex appears to find issues in Claude plans and vice versa.
- Community anecdote / weak evidence: Reddit threads show users informally reporting that using both tools together can catch more issues than relying on one alone. [R16], [R17]
- Caution: These are community anecdotes, not controlled evidence. They are useful as signals, not proof.
Why a council can help
- Different models often have:
- different failure modes
- different blind spots
- different tendencies around overconfidence, assumptions, and edge cases
Why a freeform council can go wrong
- It can become:
- expensive
- repetitive
- hard to audit
- vulnerable to each model drifting from the actual requirements
- vulnerable to “review theater” instead of real challenge
Stronger framing
The best version is an adversarial review pipeline, not an endless debate.
Recommended Playbook / Process
Recommended operating model
Option A — Best balance for current real-world use
Claude Code (possibly with GSD) as planner/executor + Codex as reviewer
This is the recommended default.
Step 0: Prepare durable repo instructions
Create the following files:
- AGENTS.md: shared repo instructions
- PLANS.md: plan template / execution-plan standards
- code_review.md: explicit review rubric
- docs/ai-workflow.md: house rules for the Claude↔Codex workflow
- REVIEWS/: all structured review outputs live here
Step 1: Explore and scope in Claude Code
Use Plan Mode or GSD + /gsd:map-codebase on brownfield repos.
Recommended for existing repos
- Run Claude Code Plan Mode to explore
- If using GSD, run /gsd:map-codebase
- Then run /gsd:new-project or phase setup as appropriate
Step 2: Create the canonical plan
Have Claude/GSD produce a plan artifact such as:
- plans/phase-01-plan.md
- or an ExecPlan that follows PLANS.md-style conventions
The plan must contain at minimum:
- objective
- assumptions
- scope / non-scope
- data model changes
- migrations
- rollback plan
- test plan
- observability/logging impacts
- security/privacy concerns
- acceptance criteria
- implementation sequence
- known risks
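The minimum-content list above can be enforced mechanically before the plan goes to review. The following is a minimal sketch, assuming the plan file is markdown and that each required section appears as a heading line starting with "#"; the keyword mapping and the missing_sections helper are hypothetical names introduced for illustration.

```python
# Map each required section (as listed above) to a keyword expected in a heading.
# The keywords are an assumption about how plan authors title their sections.
REQUIRED_SECTIONS = {
    "objective": "objective",
    "assumptions": "assumptions",
    "scope / non-scope": "scope",
    "data model changes": "data model",
    "migrations": "migration",
    "rollback plan": "rollback",
    "test plan": "test plan",
    "observability/logging impacts": "observability",
    "security/privacy concerns": "security",
    "acceptance criteria": "acceptance criteria",
    "implementation sequence": "implementation sequence",
    "known risks": "risks",
}


def missing_sections(plan_text: str) -> list[str]:
    """Return the required sections that have no matching heading in the plan."""
    # Collect lowercase heading text from lines that start with '#'.
    headings = [
        line.lstrip("#").strip().lower()
        for line in plan_text.splitlines()
        if line.startswith("#")
    ]
    return [
        label
        for label, keyword in REQUIRED_SECTIONS.items()
        if not any(keyword in heading for heading in headings)
    ]
```

A non-empty result is a reason to send the plan back to the author before spending reviewer tokens on it.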
Step 3: Run Codex adversarial plan review
Run Codex using codex exec or /review conventions against the plan file, not only the chat.
Recommended prompt shape
Ask Codex to review for:
- missing assumptions
- architecture risks
- hidden dependencies
- migration/rollback gaps
- test coverage gaps
- security/privacy issues
- ambiguous requirements
- unnecessary complexity
- mismatch between PRD and plan
- likely implementation failure points
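One way to keep the review prompt stable across phases is to generate it from the checklist above instead of retyping it. This is a sketch only: the wording of the prompt, the build_plan_review_prompt helper, and the optional PRD path are assumptions made for illustration, not a format prescribed by Codex.

```python
from typing import Optional

# The ten review dimensions listed above, kept in one place so every
# plan review uses the same checklist.
REVIEW_CHECKS = [
    "missing assumptions",
    "architecture risks",
    "hidden dependencies",
    "migration/rollback gaps",
    "test coverage gaps",
    "security/privacy issues",
    "ambiguous requirements",
    "unnecessary complexity",
    "mismatch between PRD and plan",
    "likely implementation failure points",
]


def build_plan_review_prompt(plan_path: str, prd_path: Optional[str] = None) -> str:
    """Build a plain-text adversarial review prompt that points at files, not chat."""
    lines = [f"Act as an adversarial reviewer of the plan file {plan_path}."]
    if prd_path:
        lines.append(f"Compare it against the PRD in {prd_path}.")
    lines.append("Review specifically for:")
    lines.extend(f"- {check}" for check in REVIEW_CHECKS)
    lines.append("Respond with JSON only, using the agreed review schema.")
    return "\n".join(lines)
```

The returned string can be passed to whichever reviewer invocation is in use; keeping it in code makes changes to the checklist visible in version control.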
Required output structure
Force a rigid schema such as:
{
"verdict": "approve|revise|block",
"critical_issues": [],
"important_issues": [],
"nice_to_have": [],
"requirements_gaps": [],
"security_privacy_flags": [],
"migration_and_rollback_flags": [],
"test_strategy_gaps": [],
"suggested_revisions": []
}
Step 4: Reconcile in Claude
Claude must process each Codex issue with one of:
- Accepted
- Rejected with reason
- Deferred with reason
Save that to:
REVIEWS/phase-01-reconciliation.md
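To make reconciliation hard to skip, the review JSON can be checked against the schema from Step 3 and turned into a stub that lists every finding with an undecided status. The sketch below assumes each finding is a plain string and that the stub is written as markdown; validate_review and reconciliation_skeleton are hypothetical helper names.

```python
# Fields from the Step 3 schema that hold lists of findings.
LIST_FIELDS = [
    "critical_issues",
    "important_issues",
    "nice_to_have",
    "requirements_gaps",
    "security_privacy_flags",
    "migration_and_rollback_flags",
    "test_strategy_gaps",
    "suggested_revisions",
]
VERDICTS = {"approve", "revise", "block"}


def validate_review(review: dict) -> list[str]:
    """Return schema problems; an empty list means the review is usable."""
    problems = []
    if review.get("verdict") not in VERDICTS:
        problems.append(f"verdict must be one of {sorted(VERDICTS)}")
    for field in LIST_FIELDS:
        if not isinstance(review.get(field), list):
            problems.append(f"{field} must be a list")
    return problems


def reconciliation_skeleton(review: dict, phase: str) -> str:
    """Render a stub with one undecided entry per reviewer finding."""
    lines = [
        f"Reconciliation for {phase}",
        f"Reviewer verdict: {review['verdict']}",
        "",
    ]
    for field in LIST_FIELDS:
        for item in review[field]:
            # Assumption: findings are plain strings; str() guards other types.
            lines.append(f"- [{field}] {item}")
            lines.append(
                "  - Decision: TODO (Accepted | Rejected with reason | Deferred with reason)"
            )
            lines.append("  - Reason: TODO")
    return "\n".join(lines) + "\n"
```

A typical use would be to load the reviewer's JSON with json.load, refuse to continue if validate_review returns anything, and write the skeleton to the reconciliation file for Claude to fill in.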
Step 5: Execute in Claude / GSD
Run implementation only after the plan is reconciled.
If using GSD:
- /gsd:plan-phase 1
- /gsd:execute-phase 1
- /gsd:verify-work 1
Step 6: Run Codex implementation review
Run Codex review against:
- base branch
- uncommitted changes
- specific commit
Use code_review.md rubric through AGENTS.md if possible. [R11], [R12]
Step 7: Human approval gate
Human approves only after:
- plan review passed
- implementation review passed
- UAT / verify-work passed
Option B — Semi-automated council via Claude hooks
Use Claude hooks to call Codex automatically after a plan file is written.
Why this is attractive
- Verified: Claude hooks can run user-defined shell commands at lifecycle points. [R4]
- Verified: Codex can run non-interactively with codex exec. [R10]
Example pattern
- Claude writes plans/phase-01-plan.md
- Hook triggers:
codex exec --json ...
- Output saved to:
REVIEWS/phase-01-codex-review.json
- Claude reads the review and revises the plan
Caveat
- Assistant-stated but strongly recommended: Do not automatically allow Codex review output to silently mutate the canonical plan. Keep the review separate and explicit.
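The hook pattern can be sketched as a small script that decides whether a written file is a plan and, if so, builds the reviewer command without touching the plan itself. Several things here are assumptions to confirm against [R4] and [R10] before use: that the hook receives a JSON payload with the edited path at tool_input.file_path, that the prompt is passed to codex exec as a positional argument, and the helper names themselves. The command is only constructed, not executed.

```python
from pathlib import PurePosixPath
from typing import Optional


def plan_path_from_payload(payload: dict) -> Optional[str]:
    """Return the written file's path if it looks like a plan file, else None."""
    # Assumption: the hook payload carries the edited path at tool_input.file_path.
    file_path = payload.get("tool_input", {}).get("file_path", "")
    path = PurePosixPath(file_path)
    if path.parent.name == "plans" and path.name.endswith("-plan.md"):
        return file_path
    return None


def review_output_path(plan_path: str) -> str:
    """Map plans/phase-01-plan.md to REVIEWS/phase-01-codex-review.json."""
    phase = PurePosixPath(plan_path).stem.removesuffix("-plan")
    return f"REVIEWS/{phase}-codex-review.json"


def build_codex_command(plan_path: str) -> list[str]:
    """Build (but do not run) the reviewer command for a plan file."""
    prompt = (
        f"Review the plan in {plan_path} adversarially. "
        "Do not edit any files. Respond with JSON only."
    )
    # 'codex exec --json' mirrors the example pattern above; passing the
    # prompt positionally is an assumption to verify against the Codex docs.
    return ["codex", "exec", "--json", prompt]
```

A wrapper would read the payload from stdin, run the command with subprocess, and write the output to the path from review_output_path, so the review stays a separate artifact and the canonical plan is never modified by the reviewer.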
Option C — CI gate
Use CI to require review artifacts before execution or merge.
Why this is attractive
- Verified: OpenAI documents a Codex GitHub Action suitable for CI/CD review/pipeline work. [R14]
Suggested gates
- PR cannot merge unless:
- plan artifact exists
- Codex review artifact exists
- reconciliation artifact exists
- implementation review passes
- tests pass
This is the most “real software process” version.
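The artifact-existence gates above are simple enough to check with a short script that CI runs before merge. This is a minimal sketch, assuming the file naming used in the earlier steps (plans/phase-NN-plan.md and the REVIEWS/ outputs); the missing_artifacts helper is hypothetical, and the "implementation review passes" and "tests pass" gates are left to the CI system itself.

```python
from pathlib import Path

# Artifact paths follow the naming used in the steps above; adjust them to
# whatever layout the repo actually adopts.
REQUIRED_ARTIFACTS = {
    "plan artifact": "plans/{phase}-plan.md",
    "Codex review artifact": "REVIEWS/{phase}-codex-review.json",
    "reconciliation artifact": "REVIEWS/{phase}-reconciliation.md",
}


def missing_artifacts(repo_root: Path, phase: str) -> list[str]:
    """Return one message per required artifact that is absent for the phase."""
    missing = []
    for label, template in REQUIRED_ARTIFACTS.items():
        path = repo_root / template.format(phase=phase)
        if not path.is_file():
            missing.append(f"{label} missing: {path}")
    return missing
```

A CI step could call missing_artifacts(Path.cwd(), "phase-01"), print each message, and exit non-zero when the list is not empty.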
Option D — Full orchestration / research path
If you later want deeper automation:
- Verified: Codex can run as an MCP server and be orchestrated with the OpenAI Agents SDK. [R15]
- Tentative / speculative: A future state could use Claude for authoring and Codex as an MCP-driven reviewer inside a custom orchestrator. This is feasible in principle from the docs, but not a turnkey off-the-shelf Claude↔Codex council product.
Practical File and Folder Structure
.agent/
  PLANS.md
  code_review.md
  workflow-notes.md
plans/
  phase-01-plan.md
  phase-02-plan.md
reviews/
  phase-01-codex-review.json
  phase-01-codex-review.md
  phase-01-reconciliation.md
  phase-01-implementation-review.md
docs/
  ai-workflow.md
  architecture.md
AGENTS.md
CLAUDE.md
Notes
- Verified: Codex reads AGENTS.md. [R12]
- Assistant-stated but useful: Keep CLAUDE.md and AGENTS.md aligned so the two tools do not inherit contradictory guidance.
Suggested Review Rubric
Use the following categories consistently:
- Requirements fidelity
- Architecture correctness
- Data model / schema impact
- Migration strategy
- Rollback / recovery strategy
- Testing strategy
- Observability / logging
- Security / privacy / secrets handling
- Performance / scale considerations
- Developer ergonomics / maintenance burden
- Scope creep / unnecessary complexity
- Deployment and release risk
Why this matters
- Assistant-stated but sound: Most weak “AI review” setups fail because the reviewer is asked to “review this” instead of being forced through a rubric.
Tools, Resources, Links, and References
Primary official references
- [R1] Claude Code Common Workflows (Plan Mode): https://code.claude.com/docs/en/common-workflows
- [R2] Claude Code Subagents: https://code.claude.com/docs/en/sub-agents
- [R3] Claude Code Agent Teams: https://code.claude.com/docs/en/agent-teams
- [R4] Claude Code Hooks Guide: https://code.claude.com/docs/en/hooks-guide
- [R5] Claude Code Permissions: https://code.claude.com/docs/en/permissions
- [R6] Claude Code Sandboxing: https://code.claude.com/docs/en/sandboxing
- [R7] GSD repository: https://github.com/gsd-build/get-shit-done
- [R8] GSD changelog / release evidence: https://github.com/gsd-build/get-shit-done/blob/main/CHANGELOG.md
- [R9] Codex CLI overview: https://developers.openai.com/codex/cli/
- [R10] Codex non-interactive mode (codex exec): https://developers.openai.com/codex/noninteractive/
- [R11] Codex CLI features / review workflows: https://developers.openai.com/codex/cli/features and https://developers.openai.com/codex/learn/best-practices/
- [R12] Codex AGENTS.md guidance: https://developers.openai.com/codex/guides/agents-md/
- [R13] Codex ExecPlans / PLANS.md: https://developers.openai.com/cookbook/articles/codex_exec_plans/
- [R14] Codex GitHub Action: https://developers.openai.com/codex/github-action/
- [R15] Codex MCP / Agents SDK integration: https://developers.openai.com/codex/guides/agents-sdk/
Community / anecdotal references (use cautiously)
- [R16] Reddit anecdote: users combining Claude Code + Codex: https://www.reddit.com/r/ClaudeCode/comments/1rh0kuo/anyone_else_using_claude_code_codex_together_way/
- [R17] Reddit anecdote: plan file passed from Claude to Codex and back: https://www.reddit.com/r/AI_Agents/comments/1rbvvr5/claude_code_and_codex_working_on_implementation/
Evidence note
[R16] and [R17] are community anecdotes. They can inspire workflow ideas, but they should not be treated as authoritative evidence.
Risks, Caveats, and Red Flags
1) Security and permissions risk
- Verified: GSD recommends claude --dangerously-skip-permissions. [R7]
- Risk: This increases autonomy but also risk. On a sensitive repo or system, this can be reckless if used casually.
- Safer alternative: Use Claude permissions and/or sandboxing more deliberately. [R5], [R6]
2) Secret leakage risk
- Verified: Claude has deny rules and sandbox controls. [R5], [R6]
- Risk: If Claude, Codex, hooks, or CI jobs can read .env, credentials, tokens, or production secrets, your “review council” can become a security incident.
- Recommendation: Explicitly deny access to sensitive paths and keep review prompts scoped.
3) Tooling drift risk
- Verified: GSD is moving quickly and adding features rapidly. [R8]
- Risk: Advice about exact commands, installation semantics, and runtime behavior can go stale fast.
- Recommendation: Re-check the official repo and changelog before implementation.
4) Over-automation risk
- Verified: Agent teams and workflow agents add overhead and token cost. [R3], [R7]
- Risk: It is easy to build an impressive but wasteful system where multiple agents all re-read the same repo and produce overlapping feedback.
- Recommendation: Keep roles distinct and outputs structured.
5) False confidence risk
- Assistant-stated but important: Two agreeing models do not prove correctness. Two disagreeing models do not prove the reviewer is right.
- Recommendation: Human gating still matters on architecture, migration, and security decisions.
6) “Debate theater” risk
- Assistant-stated but important: If the council is not constrained to files, rubrics, and explicit acceptance/rejection of findings, it will produce a lot of words and little signal.
7) Repo pollution risk
- Assistant-stated but important: Persistent plan/review files can clutter the repo if you do not define where they live and whether they are committed or ignored.
Important Missing Considerations Added in This Document
These points were not emphasized enough in the original discussion but should be included in any serious implementation:
A. Separate instructions for planning vs code review
- Verified: Codex supports layered instruction files through AGENTS.md. [R12]
- Recommendation: Keep review-specific rules in a dedicated file (code_review.md) rather than stuffing everything into one giant instruction block.
B. Use worktrees or isolated branches for independent reviewers
- Assistant-stated but operationally strong: Independent reviewers are more useful when they are not inheriting the exact same terminal session assumptions. This point is aligned with the separate-context philosophy of both Claude subagents/teams and Codex sessions. [R2], [R3], [R15]
C. Force reconciliation
- Assistant-stated but strongly recommended: Never let plan-review findings remain implicit. Require an explicit reconciliation artifact that answers each major critique.
D. Distinguish plan review from code review
- Verified + interpretation: Codex has review tooling, but plan review is best implemented as a file-based codex exec task or similar, whereas code review maps cleanly to built-in /review patterns. [R10], [R11], [R13]
E. Define “done” and “approved”
- Verified + interpretation: Codex workflows and ExecPlans emphasize explicit context and clear definition of done. [R13]
- Recommendation: Write approval gates before implementation begins.
Open Questions / What Still Needs Verification
- Does GSD currently provide an officially documented built-in Codex review loop for plans?
  - Status: Not verified from the sources reviewed.
  - Current evidence: GSD supports Codex runtime and has a Codex-oriented autonomous skill, but no official built-in Claude↔Codex council workflow was confirmed. [R7], [R8]
- Does current Codex CLI expose a dedicated built-in “plan review” mode distinct from file-based codex exec or code /review?
  - Status: Not verified.
  - Current evidence: Code review is well documented; plan review appears to be best implemented via codex exec / file workflows. [R10], [R11], [R13]
- What is the cleanest Claude hook event for “plan completed, now trigger Codex review”?
  - Status: Not fully verified in this document.
  - Current evidence: Hooks are available and deterministic, but the exact best event choice needs implementation-level review. [R4]
- Should review artifacts be committed to git or kept local / ignored?
  - Status: Not resolved.
  - Tradeoff: Committing improves auditability; ignoring keeps the repo cleaner.
- Should the canonical plan live in GSD’s native planning structure, OpenAI-style PLANS.md, or both?
  - Status: Not resolved.
  - Recommendation: Pick one canonical format to avoid drift.
- How much autonomy is actually safe for your environment?
  - Status: Depends on repo sensitivity, secrets, infra access, and human tolerance for risk.
  - Needs verification: organization-specific security posture.
- Is GSD-2 materially better for this exact use case right now?
  - Status: Not fully verified in this document.
  - Current evidence: We verified that gsd-2 exists in the org and is described as enabling long autonomous work, but deeper feature-level validation was not completed here. See [R18].
Suggested Next Steps
Immediate next steps
- Decide whether you want:
- native-first stack: Claude Code + hooks + Codex review
- framework-first stack: GSD + Codex review
- Create the following files in a test repo:
- AGENTS.md
- PLANS.md
- code_review.md
- docs/ai-workflow.md
- Define the review rubric categories.
- Build a first manual loop:
- Claude writes plan
- Codex reviews plan
- Claude reconciles
- Claude implements
- Codex reviews diff
- Only then decide whether to automate with hooks or CI.
After the manual loop works
- Add a hook or wrapper script to trigger Codex review automatically.
- Add CI gating with the Codex GitHub Action if useful.
- Decide whether to keep GSD for planning structure or simplify to native Plan Mode + house files.
Handoff Notes for Another AI
You are inheriting a workflow-design problem, not just a product-comparison problem.
Core objective
Help the user build a high-signal planning and review system for large implementation work, likely involving:
- Claude Code
- potentially GSD
- Codex as a reviewer
- durable artifacts
- maybe multi-agent or partially autonomous execution
What the user seems to want
- high rigor
- less hand-holding
- less fuzzy chat-only work
- stronger planning than a single model alone
- a practical way to use cross-model review
- likely something that can scale to large PRDs and multiple implementation phases
Strongest currently supported recommendation
Start with:
- Claude Code Plan Mode or GSD to author the plan
- Codex (codex exec) to review the plan file
- Codex (/review) to review implementation changes
- files as the source of truth
- optional Claude hooks or CI for automation
Do not assume
- that GSD has a built-in mature Claude↔Codex council loop
- that there is an official first-party debate feature
- that two-model agreement equals correctness
- that giant one-shot execution is the right objective
If continuing this work
A good next deliverable would be one of:
- a concrete folder structure + sample files (AGENTS.md, PLANS.md, code_review.md)
- a hooks config example for Claude calling Codex
- a bash/PowerShell wrapper that runs the plan-review loop
- a GitHub Actions workflow that gates PRs on plan-review artifacts
- a side-by-side recommendation on native Plan Mode + house process vs GSD + Codex review
Reviewer Notes and Improvements Made
Review method used
- Verified: No external reviewer-agent tool was available in this environment for direct document critique.
- Performed: A serious self-review pass was done.
Improvements made beyond the original discussion
- Separated evidence classes into:
- Verified
- User-stated
- Assistant-stated but unverified
- Tentative / speculative
- Replaced fuzzy claims with source-backed distinctions where possible.
- Added missing implementation structure, especially:
- file layout
- review rubric
- reconciliation step
- CI gating pattern
- Added risk analysis around:
- permissions
- secrets
- over-automation
- repo clutter
- false confidence
- Added open questions instead of silently overstating uncertain points.
- Normalized the recommendation away from “let two models argue forever” and toward a stricter adversarial-review pipeline.
Self-review findings
- The biggest remaining uncertainty is exactly how much of this GSD/Codex interplay is officially documented versus emergent community practice.
- The document therefore deliberately avoids claiming that GSD already has a polished built-in model-council workflow.
Appendix A — Structured Summary (YAML-style)
title: "Claude Code, GSD, Plan Mode, and Codex Review Council"
date: "2026-03-19"
user_goal:
  - compare Claude Code Plan Mode vs GSD
  - understand how to use Codex to review Claude/GSD plans
  - design a high-rigor workflow for large PRDs and advanced implementation work
main_conclusions:
  - status: Verified
    point: "Claude Code Plan Mode is a native read-only planning workflow."
    refs: [R1]
  - status: Verified
    point: "GSD is a third-party workflow/framework supporting Claude Code and Codex runtimes."
    refs: [R7, R8]
  - status: Verified
    point: "Claude Code provides subagents, agent teams, hooks, permissions, and sandboxing."
    refs: [R2, R3, R4, R5, R6]
  - status: Verified
    point: "Codex provides codex exec, review workflows, AGENTS.md, GitHub Action, and MCP/Agents SDK integration."
    refs: [R10, R11, R12, R14, R15]
  - status: Verified-with-caution
    point: "No official built-in Claude↔Codex debate feature was found in the reviewed docs."
    refs: [R1, R2, R3, R4, R5, R6, R9, R10, R11, R12, R13, R14, R15]
recommended_workflow:
  canonical_planner: "Claude Code or GSD on Claude Code"
  adversarial_reviewer: "Codex"
  durable_artifacts:
    - AGENTS.md
    - PLANS.md
    - code_review.md
    - reviews/*
  steps:
    - "Claude explores and authors canonical plan"
    - "Codex reviews plan against rubric"
    - "Claude reconciles findings explicitly"
    - "Claude implements"
    - "Codex reviews diff/branch"
    - "Human approves"
risks:
  - "dangerously-skip-permissions increases risk"
  - "secret leakage if tools can read sensitive files"
  - "tooling drift because GSD changes fast"
  - "debate theater without strict rubric"
  - "repo clutter from plan/review artifacts"
open_questions:
  - "Does GSD officially document a built-in plan review loop using Codex?"
  - "Best Claude hook event for triggering Codex plan review?"
  - "Should plan/review files be committed or ignored?"
  - "Should the canonical plan format be GSD-native, PLANS.md, or both?"
Appendix B — Additional Reference
- [R18] GSD organization page showing gsd-2 repository exists and is described as enabling long autonomous work: https://github.com/gsd-build