Claude Code Looping, Planning, and Orchestration Playbook / Handoff Artifact

Artifact date: 2026-03-17
Scope: Research-grade extraction and improvement of the discussion about how to get Claude Code to do more useful looping, especially for planning, and whether wrappers such as OpenClaw or tools such as Task Master materially help.
Prepared for: Another AI or human operator continuing the work without needing the original conversation.


Executive Summary

This discussion was fundamentally not about “make Claude Code loop forever.” It was about how to get Claude Code to do more useful planning passes and avoid stopping too early during the planning stage.

Highest-confidence conclusions

  1. [Verified] The strongest official path is to use Claude Code’s native features first:
    Plan Mode, subagents, hooks, skills, CLAUDE.md, sandboxing, and aggressive context management are all first-party mechanisms specifically designed to improve multi-step planning and execution workflows.
    Why this matters: These are the least brittle, most policy-safe, and most directly supported ways to make Claude Code behave more like a deliberate staged agent rather than a single-turn chatbot.
    Evidence: [R1], [R2], [R3], [R4], [R5], [R6], [R7], [R8], [R9], [R10], [R11], [R22]

  2. [Verified] /loop is real, but it is not durable orchestration.
    Claude Code can re-run prompts on a schedule with /loop, but those scheduled tasks are session-scoped, disappear when the Claude Code process exits, and recurring tasks expire after 3 days.
    Why this matters: /loop is useful for polling or re-running packaged workflows inside a live session, but it is not the right primitive for long-lived autonomous planning orchestration.
    Evidence: [R4]

  3. [Verified] Hooks are one of the biggest levers for “keep going until planning is actually done.”
    Claude Code supports Stop hooks that can block stopping and feed back what remains to be done. It also supports agent hooks for stronger verification.
    Why this matters: This directly addresses the user’s complaint that Claude Code “stops” before enough planning cycles have happened.
    Evidence: [R3]

  4. [Verified] Anthropic explicitly forbids using OAuth tokens from Free/Pro/Max in other products, tools, or services.
    Anthropic’s current Claude Code legal/compliance documentation says OAuth authentication for Free/Pro/Max is intended exclusively for Claude Code and Claude.ai, and using those tokens in any other product, tool, or service is not permitted.
    Why this matters: This is the most important policy constraint in the entire discussion. It materially weakens “subscription-auth wrappers” as a recommended path.
    Evidence: [R7]

  5. [Verified] OpenClaw can technically work with Anthropic setup-token / subscription auth, but even OpenClaw’s own docs say Anthropic API keys are the safer recommended production path.
    OpenClaw documentation explicitly recommends API key auth over Anthropic subscription setup-token auth for long-lived gateways, and warns that Anthropic subscription use outside Claude Code has been restricted for some users before.
    Why this matters: OpenClaw is not fake vaporware, but it is also not the cleanest answer to “how do I make Claude Code plan better on my subscription.”
    Evidence: [R14], [R15], [R16]

  6. [Verified + flagged conflict] Task Master is genuinely relevant for task decomposition, but the documentation is mixed.

    • Task Master’s installation docs say “Now that you have Node.js and your first API Key…”
    • A GitHub example for Task Master shows a Claude Code provider with no API key required, plus settings like maxTurns.
    • Another Task Master configuration page also emphasizes OAuth for ChatGPT and optional API key usage in that context.
      Why this matters: Task Master may be useful as a structured planning/task layer, but its auth story and “recommended” path are not cleanly aligned across all docs. It must be treated carefully.
      Evidence: [R12], [R13], [R17]

  7. [Assistant-stated but strongly supported] The best operational pattern is staged orchestration, not brute-force infinite looping.
    The most robust pattern is:

    • explore first,
    • plan deeply,
    • write the plan to a file,
    • break it into PR-sized tasks,
    • add stop gates,
    • execute the next task in a fresh or narrowed pass.
      Why this matters: This is the synthesis most consistent with official docs on Plan Mode, hooks, CLAUDE.md, skills, context management, and programmatic/headless usage.
      Evidence basis: Inference from [R1], [R2], [R3], [R5], [R8], [R9], [R10], [R11]

Purpose of This Document

This document is intended to function as all of the following:

  • a guide for the user,
  • a playbook for implementation,
  • a briefing memo for decision-making,
  • and a handoff document for another AI continuing the work.

It does not merely summarize the conversation. It extracts the important claims, verifies what can be verified, labels the evidence status of each major point, highlights conflicts and policy risks, adds missing considerations, and turns the discussion into an actionable operating model.


Discussion Context

User goals and constraints

  1. [User-stated] The user wants to understand the best ways to push Claude Code to do more looping, especially more planning steps.
  2. [User-stated] The user’s main pain is that Claude Code stops after a few cycles, even though the planning stage still needs several more passes.
  3. [User-stated] The user is curious whether OpenClaw or similar tools sitting “on top of Claude Code” using a subscription model are actually helpful.
  4. [User-stated] The user is primarily interested in practical leverage, not abstract theory.
  5. [User-stated] The user cares about current, up-to-date information and does not want a stale answer.

Framing correction

  1. [Assistant-stated, now strengthened] The real problem is not “more looping” in the abstract. The real problem is:
    • preserving context,
    • forcing a real planning workflow,
    • avoiding early exits,
    • reducing permission friction,
    • and structuring work so Claude Code can complete it in bounded stages.

Key Facts and Verified Findings

1) Native Claude Code features are the main official answer

1.1 Plan Mode

  • Status: Verified
  • Claim: Claude Code’s Plan Mode is meant for exploring a codebase, planning complex changes, and iterating on direction before editing. It can be entered interactively or started with claude --permission-mode plan, including in -p / headless runs.
  • Evidence: [R1]
  • Why it matters: This directly matches the user’s need for “more planning steps.”

1.2 Subagents

  • Status: Verified
  • Claim: Claude Code subagents run in their own context windows, can have custom tool access and prompts, and are useful for specialized tasks and context isolation. Claude Code includes built-in subagents such as Explore, Plan, and general-purpose.
  • Evidence: [R2]
  • Why it matters: Offloading exploration/research/review to subagents prevents the main conversation from bloating and helps planning stay coherent longer.

1.3 Hooks

  • Status: Verified
  • Claim: Claude Code hooks can run at key lifecycle events and can be used to:
    • re-inject context after compaction,
    • block or allow actions,
    • enforce checks before stopping,
    • and automate workflow rules.
  • Evidence: [R3]
  • Why it matters: Hooks are one of the cleanest ways to force more planning rigor and stop Claude Code from exiting prematurely.

1.4 Skills

  • Status: Verified
  • Claim: Claude Code skills are reusable prompt/playbook packages in SKILL.md. Claude can invoke them automatically when relevant or via slash commands, and bundled skills can orchestrate non-trivial work.
  • Evidence: [R5]
  • Why it matters: This is the official mechanism for packaging repeatable planning workflows such as /spec-feature, /break-into-prs, /review-plan.
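As a concrete illustration of packaging a planning workflow like /break-into-prs, a minimal SKILL.md could look like the sketch below. The skill name, wording, and the docs/current-plan.md path are illustrative assumptions, not content from the original discussion; the frontmatter fields should be checked against the current skills docs [R5].

```markdown
---
name: break-into-prs
description: Split the current plan file into PR-sized tasks with acceptance criteria.
---

Read docs/current-plan.md. For each major work item, produce a task with:
- scope small enough for one PR,
- explicit acceptance criteria,
- dependencies on earlier tasks.
Update docs/current-plan.md in place. Do not implement anything.
```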

1.5 CLAUDE.md and memory

  • Status: Verified
  • Claim: CLAUDE.md provides persistent instructions loaded at the start of every session. Claude Code also has auto memory. CLAUDE.md is the official place for stable project rules and workflow instructions.
  • Evidence: [R9]
  • Why it matters: Planning instructions should not live only in conversation history; they should be anchored in CLAUDE.md.

1.6 Programmatic usage / Agent SDK

  • Status: Verified
  • Claim: Claude Code has an official programmatic path via the Agent SDK and claude -p, previously called headless mode.
  • Evidence: [R6]
  • Why it matters: If the user wants deterministic multi-pass orchestration, this is the official path rather than trying to “smuggle” a consumer login through third-party tooling.

2) /loop is useful but limited

2.1 Session-scoped only

  • Status: Verified
  • Claim: Scheduled tasks created through /loop are session-scoped and disappear when the current Claude Code process exits.
  • Evidence: [R4]
  • Why it matters: /loop is not durable automation.

2.2 Three-day expiry

  • Status: Verified
  • Claim: Recurring scheduled tasks in Claude Code automatically expire 3 days after creation.
  • Evidence: [R4]
  • Why it matters: Even if a loop works, it is bounded and temporary.

2.3 Good use case

  • Status: Verified
  • Claim: /loop is appropriate for polling, re-running packaged commands, or doing time-based checks inside an active session.
  • Evidence: [R4]
  • Why it matters: /loop is best treated as a convenience layer, not the core solution for deep planning.

2.4 Wrong use case

  • Status: Assistant-stated but strongly supported
  • Claim: /loop should not be the foundation for long-lived architecture-planning orchestration.
  • Evidence basis: limitations in [R4]
  • Why it matters: Using /loop as the main answer to “do more planning cycles” will create brittle behavior.

3) Hooks are a major leverage point for deeper planning

3.1 Stop hook can keep Claude working

  • Status: Verified
  • Claim: A Stop hook can ask whether all tasks are complete; if not, Claude keeps working and receives the reason as the next instruction.
  • Evidence: [R3]
  • Why it matters: This directly answers the user’s complaint that Claude Code stops too soon.

3.2 Agent-based stop verification

  • Status: Verified
  • Claim: Agent-based hooks can inspect files or run commands to verify conditions, for example verifying tests pass before allowing Claude to stop.
  • Evidence: [R3]
  • Why it matters: This is stronger than a simple prompt gate.

3.3 Re-inject context after compaction

  • Status: Verified
  • Claim: A SessionStart hook with matcher compact can re-inject critical context after compaction.
  • Evidence: [R3]
  • Why it matters: This is a missing but important consideration that materially improves long planning sessions.
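A minimal settings fragment for this re-injection pattern might look like the following. The file path and command are assumptions for illustration, and the exact schema should be confirmed against the current hooks reference [R3]; the documented idea is that a SessionStart hook with matcher compact runs after compaction and its stdout is added back as context.

```json
{
  "hooks": {
    "SessionStart": [
      {
        "matcher": "compact",
        "hooks": [
          { "type": "command", "command": "cat .claude/critical-context.md" }
        ]
      }
    ]
  }
}
```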

3.4 Infinite stop-hook loops are a real risk

  • Status: Verified
  • Claim: Claude Code docs explicitly document the need to guard against Stop hooks that keep firing forever. The hook should inspect stop_hook_active and exit early when appropriate.
  • Evidence: [R3]
  • Why it matters: The cure for early stopping can itself create a failure mode if configured badly.
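The stop-gate and loop-guard patterns in 3.1-3.4 can be sketched as one small hook script. This is a hedged sketch, not the documented implementation: the checklist convention ("- [ ]" items in a plan file), the docs/current-plan.md path, and the exact stdin/stdout contract are assumptions to verify against the hooks reference [R3].

```python
import json
import sys
from pathlib import Path

def stop_decision(event: dict, plan_text: str):
    """Return a {"decision": "block", ...} dict, or None to allow the stop."""
    if event.get("stop_hook_active"):
        # This stop was already blocked once; allow it now to avoid an infinite loop.
        return None
    if "- [ ]" in plan_text:
        # Unchecked checklist items remain in the plan file.
        return {
            "decision": "block",
            "reason": "Plan file still has unchecked tasks; finish or explicitly defer them.",
        }
    return None

if __name__ == "__main__":
    raw = sys.stdin.read() if not sys.stdin.isatty() else ""
    if raw:  # Claude Code pipes the hook event as JSON on stdin
        plan = Path("docs/current-plan.md")  # assumed plan/task file location
        decision = stop_decision(json.loads(raw), plan.read_text() if plan.exists() else "")
        if decision is not None:
            print(json.dumps(decision))  # a "block" decision makes Claude keep working
```

The stop_hook_active check is the guard from 3.4: without it, the hook would block every subsequent stop attempt forever.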

4) Context management is a first-class problem

4.1 Context degradation is real

  • Status: Verified
  • Claim: Claude Code docs say context fills quickly, performance degrades as it fills, and Claude may forget earlier instructions as context becomes saturated.
  • Evidence: [R10], [R11]
  • Why it matters: A giant wandering session is usually worse than staged work.

4.2 What to persist

  • Status: Verified
  • Claim: Persistent rules belong in CLAUDE.md, not only in conversation history.
  • Evidence: [R9], [R10]
  • Why it matters: This is the right place for planning rules, acceptance criteria, and stop conditions.

4.3 Helpful commands and patterns

  • Status: Verified
  • Claim: Claude Code supports /context, /clear, /resume, /rename, and custom compaction guidance; these are part of official context hygiene.
  • Evidence: [R8], [R10]
  • Why it matters: Better context hygiene is often more important than “more looping.”

4.4 Skills and subagents reduce context pressure

  • Status: Verified
  • Claim: Skills load on demand and subagents use separate fresh context, reducing bloat in the main conversation.
  • Evidence: [R5], [R10]
  • Why it matters: The right architecture is “fan out work,” not “pile more into the same thread.”

4.5 Parallel sessions are official and often better than one giant loop

  • Status: Verified
  • Claim: Claude Code supports parallel work through separate sessions and documents git worktrees as the clean way to run parallel Claude sessions. It also warns that resuming the same session in multiple terminals will interleave messages and make the conversation messy; for a clean branch-off, use a forked session.
  • Evidence: [R10]
  • Why it matters: For some workflows, the right answer is not “more looping.” It is “split the work cleanly.”

5) Sandboxing is a missing but important autonomy lever

5.1 Reduced approval friction

  • Status: Verified
  • Claim: Claude Code sandboxing reduces the need for constant bash permission prompts by defining allowed boundaries up front.
  • Evidence: [R11]
  • Why it matters: Some “Claude stopped” complaints are really “Claude kept waiting for permission and momentum died.”

5.2 Security and autonomy balance

  • Status: Verified
  • Claim: Sandboxing is designed to increase autonomy while maintaining OS-level filesystem and network restrictions.
  • Evidence: [R11]
  • Why it matters: This is relevant when the user wants Claude Code to “keep doing its thing” with less babysitting.

5.3 Caveat

  • Status: Verified
  • Claim: Sandboxing only applies to Bash and child processes; it does not replace broader permission rules.
  • Evidence: [R11]
  • Why it matters: Sandboxing helps, but it is not a universal autonomy switch.

6) Agent teams are real, but not the default answer

6.1 Experimental status

  • Status: Verified
  • Claim: Claude Code agent teams are experimental, disabled by default, and have known limitations.
  • Evidence: [R15]
  • Why it matters: They should not be the first recommendation for a user who mainly needs better planning loops.

6.2 Communication advantage

  • Status: Verified
  • Claim: Agent teams are distinct from subagents because teammates can communicate directly and coordinate through a shared task list.
  • Evidence: [R15]
  • Why it matters: Use them only when agents genuinely need to collaborate, debate, or self-coordinate.

6.3 Token cost

  • Status: Verified
  • Claim: Agent teams use significantly more tokens than a single session.
  • Evidence: [R15], [R8]
  • Why it matters: They are powerful but expensive and potentially overkill for planning-only use cases.

6.4 Better default

  • Status: Assistant-stated but strongly supported
  • Claim: For most planning workflows, subagents are the better first step; move to agent teams only when independent collaborators need to talk to each other.
  • Evidence basis: [R2], [R15]
  • Why it matters: This is the cleaner and cheaper default.

7) Usage limits may be part of the “it stops” problem

7.1 Shared limits

  • Status: Verified
  • Claim: Claude Pro and Max plan usage limits are shared across Claude and Claude Code.
  • Evidence: [R18]
  • Why it matters: Hitting usage limits in chat or coding work can cause terminal work to stop unexpectedly.

7.2 Extra usage exists

  • Status: Verified
  • Claim: Paid plans can enable extra usage after hitting included limits, billed at standard API rates.
  • Evidence: [R19]
  • Why it matters: If the user’s real bottleneck is capacity rather than orchestration, this changes the solution path.

7.3 Incomplete diagnosis

  • Status: Tentative / speculative
  • Claim: The user’s observed stop behavior may be caused by usage limits, permission prompts, context saturation, task size, or early-stop behavior.
  • Why it matters: The exact failure mode still needs diagnosis.

8) Official policy risk around subscription-based wrappers is real

8.1 Anthropic restriction

  • Status: Verified
  • Claim: Anthropic says Free/Pro/Max OAuth auth is intended only for Claude Code and Claude.ai, and using that auth in other tools/services is not permitted.
  • Evidence: [R7]
  • Why it matters: This makes “subscription-based wrapper” advice materially riskier than it might have been in the past.

8.2 OpenClaw’s own caution

  • Status: Verified
  • Claim: OpenClaw’s OAuth docs say Anthropic subscription use outside Claude Code has been restricted for some users before and recommend API key auth for production.
  • Evidence: [R14], [R16]
  • Why it matters: Even third-party tooling acknowledges the risk.

8.3 Practical conclusion

  • Status: Assistant-stated but strongly supported
  • Claim: Third-party tools that try to route consumer Claude subscription auth outside native Claude Code should be treated as policy-risk and operational-risk, even if they technically work at a given moment.
  • Evidence basis: [R7], [R14], [R16]
  • Why it matters: This should heavily influence recommendations.

9) Task Master is promising, but should be used carefully

9.1 Why it is relevant

  • Status: Verified
  • Claim: Task Master provides structured task workflows such as parsing a PRD, analyzing complexity, showing next task, and updating task status.
  • Evidence: [R13]
  • Why it matters: This directly maps to the user’s desire for more structured planning loops.

9.2 Claude Code provider example

  • Status: Verified
  • Claim: A Task Master GitHub example shows a claude-code provider that claims no API key is required and supports settings such as maxTurns.
  • Evidence: [R13]
  • Why it matters: This explains why users are attracted to it for orchestration.

9.3 Documentation conflict

  • Status: Verified + conflict flagged
  • Claim: Task Master’s installation docs say “Now that you have Node.js and your first API Key…”, while other docs/examples show alternative auth/provider paths.
  • Evidence: [R12], [R13], [R17]
  • Why it matters: Another AI should not assume the auth story is simple or settled.

9.4 Recommendation status

  • Status: Assistant-stated but strongly supported
  • Claim: Task Master is best treated as a task decomposition layer, not as a blanket “subscription loophole.”
  • Evidence basis: [R12], [R13], [R17], [R7]
  • Why it matters: The value is in the workflow, not the auth hack.

10) OpenClaw is broader than the user’s immediate problem

10.1 What it is good at

  • Status: Verified
  • Claim: OpenClaw is positioned as a long-lived gateway / agent environment with multiple auth paths, optional channels, and broader orchestration possibilities.
  • Evidence: [R14], [R20], [R21]
  • Why it matters: OpenClaw may be useful for connectors, gateways, or autonomous multi-system workflows.

10.2 Anthropic auth recommendation

  • Status: Verified
  • Claim: For Anthropic in production, OpenClaw recommends API key auth over subscription setup-token auth.
  • Evidence: [R14], [R16]
  • Why it matters: This weakens the case for using OpenClaw mainly as a subscription-auth layer for Claude Code.

10.3 Practical recommendation

  • Status: Assistant-stated but strongly supported
  • Claim: OpenClaw is not the best first answer to “how do I get Claude Code to do deeper planning loops?”
  • Evidence basis: [R1], [R3], [R4], [R6], [R7], [R14], [R16]
  • Why it matters: Native Claude Code features solve the core planning problem more directly.

Major Decisions and Conclusions

  1. [Verified + conclusion] Prefer native Claude Code workflow controls first.
    Use Plan Mode, subagents, hooks, CLAUDE.md, skills, sandboxing, and context management before reaching for wrappers.
    Evidence: [R1], [R2], [R3], [R5], [R9], [R10], [R11]

  2. [Assistant-stated but strongly supported] The default operating model should be staged, not monolithic.
    Explore → Plan → Write plan file → Break into PR-sized tasks → Stop-gate → Execute next task.
    Evidence basis: [R1], [R2], [R3], [R6], [R9], [R10]

  3. [Verified + conclusion] Use /loop only for polling or repeated checks within a live session.
    Do not treat it as durable orchestration.
    Evidence: [R4]

  4. [Verified + conclusion] Use subagents before agent teams.
    Agent teams are experimental and expensive.
    Evidence: [R2], [R15]

  5. [Verified + conclusion] Treat third-party subscription-auth wrappers as policy-risk for Anthropic.
    Evidence: [R7], [R14], [R16]

  6. [Assistant-stated but strongly supported] OpenClaw is a broader automation/gateway tool, not the cleanest planning-loop fix.
    Evidence basis: [R14], [R16], [R20], [R21]

  7. [Assistant-stated but strongly supported] Task Master is the most relevant external planning layer discussed, but should be used with compliant auth and careful expectations.
    Evidence basis: [R12], [R13], [R17], [R7]


Reasoning, Tradeoffs, and Why It Matters

| Option | Evidence status | Best use | Strengths | Weaknesses / risks | Recommendation |
| --- | --- | --- | --- | --- | --- |
| Native Claude Code: Plan Mode + hooks + subagents + CLAUDE.md | Verified | Better planning inside Claude Code itself | First-party, direct fit, policy-safe, designed for this | Requires setup discipline | Primary recommendation |
| /loop | Verified | Polling / repeated checks in active session | Simple and built-in | Session-scoped, expires, not durable | Use narrowly |
| Agent SDK / claude -p | Verified | Deterministic orchestration and scripting | Official programmatic path | More setup work | Strong secondary path |
| Subagents | Verified | Isolated research / review / task-specific work | Separate context, flexible, cheaper than teams | Less inter-agent collaboration | Use before teams |
| Agent teams | Verified | Parallel work where agents must communicate | Powerful for debate/review/cross-layer work | Experimental, expensive, more moving parts | Use selectively |
| Sandboxing | Verified | Reduce permission friction for bash autonomy | Fewer prompts, stronger boundaries | Needs correct config; only applies to bash | Important missing lever |
| Task Master | Verified + conflicting docs | Task decomposition, next-task workflow | Strong fit for planning structure | Doc/auth ambiguity; policy caution if misused | Potentially useful, careful use |
| OpenClaw | Verified | Long-lived gateway / broader automation | Flexible, multi-model, broader orchestration | Anthropic subscription path is risky; not the cleanest fit | Not first choice for this problem |
| One giant long session | Assistant-stated but supported | Almost never | Feels simple | Context degradation, drift, early forgetting | Avoid |
| Multiple independent sessions / worktrees | Verified | Parallel branches or separate task streams | Cleaner than forcing everything into one thread | Requires process discipline | Often better than “more looping” |

Phase 0 — Baseline setup

  1. [Verified] Update Claude Code and check version.
    Why: important features and fixes are version-gated.

    • Scheduled tasks require v2.1.72+. [R4]
    • Agent teams require v2.1.32+. [R15]
    • Auto memory requires v2.1.59+. [R9]
  2. [Verified] Pay attention to recent changelog fixes.
    Particularly relevant:

    • 2.1.78 fixed plan mode being lost after compaction.
    • 2.1.78 fixed an infinite loop when API errors triggered stop hooks.
      Why: If the user experienced weird stop/plan-mode behavior on older versions, the latest release may have already improved it.
      Evidence: [R21]
  3. [Tentative / needs verification] Diagnose the actual stop condition.
    Check whether the current pain is:

    • usage limit,
    • permission friction,
    • context saturation,
    • giant task scope,
    • or stop-hook / scheduling behavior.

Phase 1 — Make planning persistent, not conversationally fragile

  1. [Verified] Create or tighten CLAUDE.md.
    Put stable project and workflow rules there:

    • architecture constraints,
    • build/test commands,
    • “explore first, then plan, then code,”
    • “write/update the task file,”
    • “do not stop until acceptance criteria are addressed or deferred.”
      Evidence: [R9], [R10]
  2. [Verified] Keep CLAUDE.md concise and specific.
    Claude docs advise concise, well-structured instructions and suggest targeting under ~200 lines per file.
    Evidence: [R9]

  3. [Verified] Add compact instructions.
    Use a “Compact Instructions” section or /compact guidance so compaction preserves what matters.
    Evidence: [R10]
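Pulled together, the Phase 1 guidance could look like the CLAUDE.md excerpt below. The commands, paths, and wording are placeholders to adapt for the actual project, not prescribed values.

```markdown
# Workflow rules (excerpt)

- Explore the relevant code before proposing changes.
- For non-trivial work: enter plan mode, write the plan to docs/current-plan.md,
  and break it into PR-sized tasks with acceptance criteria.
- Build: npm run build. Test: npm test. (Replace with this repo's real commands.)
- Do not stop until acceptance criteria are checked or explicitly deferred, and
  docs/current-plan.md records what changed, what remains, and the next step.

## Compact Instructions

- When compacting, always preserve the current task, its acceptance criteria,
  and the path docs/current-plan.md.
```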


Phase 2 — Run planning as a deliberate workflow

Default prompt pattern

Status: Assistant-stated but strongly supported

Use a plan-first prompt such as:

Enter plan mode.

Study the codebase first.
Write a detailed plan to docs/current-plan.md that includes:
- assumptions
- architecture options
- edge cases
- migration risks
- test strategy
- rollout/rollback notes
- PR-sized tasks with acceptance criteria
Do not implement yet.

Why this is recommended: It aligns with official Plan Mode, context, and persistence guidance.
Evidence basis: [R1], [R9], [R10], [R22]

After plan creation

Use a second-pass prompt such as:

Read docs/current-plan.md.
Select only the next task.
Implement only that task.
Run the relevant tests.
Update docs/current-plan.md with:
- what changed
- what remains
- blockers
- next step
Do not begin the next task unless the current one is complete.

Why this is recommended: It forces bounded execution rather than drifting across the entire project.


Phase 3 — Use subagents to preserve the main session

  1. [Verified] Use subagents for exploration, review, and edge-case checks.
    Example roles:

    • architecture reviewer,
    • test reviewer,
    • security reviewer,
    • codebase explorer.
      Evidence: [R2]
  2. [Assistant-stated but strongly supported] Good planning pattern:

    • main session = synthesis and decision-making
    • subagents = research and critique
      This keeps planning sharper and cheaper than stuffing everything into one thread.

Phase 4 — Add stop gates so Claude does not stop early

  1. [Verified] Add a prompt-based or agent-based Stop hook.
    Prompt-based example logic:

    • “Check if all requested tasks are complete. If not, return what remains.”
      Evidence: [R3]
  2. [Verified] Use agent-based stop hooks for real validation.
    Example:

    • verify tests actually pass,
    • check docs/task file were updated,
    • inspect repository state.
      Evidence: [R3]
  3. [Verified] Guard against infinite loop behavior.
    Respect stop_hook_active or equivalent logic.
    Evidence: [R3]
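One way to register such a Stop hook in .claude/settings.json is sketched below. The script path is a placeholder for whatever gate script the project uses, and the schema should be confirmed against the current hooks docs [R3] before relying on it.

```json
{
  "hooks": {
    "Stop": [
      {
        "hooks": [
          { "type": "command", "command": "python3 .claude/hooks/stop_gate.py" }
        ]
      }
    ]
  }
}
```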

Practical stop criteria to enforce

Status: Assistant-stated but strongly supported

Before Claude stops, require all of the following:

  • task file updated,
  • acceptance criteria checked,
  • tests run or explicitly deferred with reason,
  • open questions logged,
  • next action written down.

Phase 5 — Reduce autonomy friction

  1. [Verified] Enable sandboxing where appropriate.
    This can reduce bash approval friction materially.
    Evidence: [R11]

  2. [Verified] Use sandbox auto-allow within safe boundaries.
    This helps Claude keep moving when its work is mostly inside approved directories/domains.
    Evidence: [R11]

  3. [Assistant-stated but strongly supported] This is especially important if “it stops” often really means “it keeps waiting for permission.”


Phase 6 — Choose the right orchestration layer

Use native Claude Code only

Best when: one repo, one operator, planning and execution inside the terminal.
Recommendation: default path.

Use Agent SDK / claude -p

Best when: you want deterministic scripted orchestration, CI/CD, or a custom controller.
Recommendation: official secondary path.
Evidence: [R6]
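As a sketch of this path, a controller can build one bounded claude -p invocation per pass. The claude -p, --permission-mode plan, and --output-format json flags are documented CLI surface; the prompt wording and plan-file path are this sketch's assumptions, not a prescribed invocation.

```python
import shlex

def plan_pass_cmd(prompt: str, plan_file: str = "docs/current-plan.md") -> list[str]:
    """Build the argv for one bounded headless planning pass."""
    full_prompt = (
        f"{prompt}\n"
        f"Write the plan to {plan_file} and do not implement yet."
    )
    return [
        "claude", "-p", full_prompt,
        "--permission-mode", "plan",     # plan-first, no edits
        "--output-format", "json",       # machine-readable result for the controller
    ]

cmd = plan_pass_cmd("Study the codebase and plan the auth refactor.")
print(shlex.join(cmd))
# A controller would hand cmd to subprocess.run(...) once per staged pass.
```

Running each pass as a fresh process is what makes this deterministic: the controller, not the conversation, decides when another planning cycle happens.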

Use Task Master

Best when: you want structured decomposition, dependencies, “next task,” and planning state.
Recommendation: useful, but use with auth caution.
Evidence: [R12], [R13], [R17]

Use OpenClaw

Best when: you want a broader gateway / multi-channel automation environment.
Recommendation: not the cleanest first answer to deeper Claude Code planning loops; prefer API-based auth.
Evidence: [R14], [R16], [R20]

Use agent teams

Best when: multiple collaborators need to communicate directly, challenge each other, or coordinate complex multi-part work.
Recommendation: not default; use selectively.
Evidence: [R15]

Use git worktrees / forked sessions

Best when: you want parallel work without turning one session into chaos.
Recommendation: often a better answer than “make one session loop harder.”
Evidence: [R10]


Official Anthropic / Claude Code sources

Official support / usage docs

Task Master sources

OpenClaw sources


Risks, Caveats, and Red Flags

  1. [Verified] Policy risk: Using Anthropic consumer-plan OAuth in external tools/services is not permitted by Anthropic.
    Evidence: [R7]

  2. [Verified] Operational risk: /loop is not durable orchestration. It dies with the session and recurring tasks expire after 3 days.
    Evidence: [R4]

  3. [Verified] Configuration risk: A badly written Stop hook can create an infinite loop or brittle behavior.
    Evidence: [R3], [R21]

  4. [Verified] Cost risk: Agent teams use significantly more tokens than a single session.
    Evidence: [R15]

  5. [Verified] Security risk: Overly permissive sandboxing or write access can create escalation/exfiltration risk.
    Evidence: [R11]

  6. [Verified + conflict flagged] Documentation inconsistency risk: Task Master has mixed auth/provider messaging across docs and examples.
    Evidence: [R12], [R13], [R17]

  7. [Assistant-stated but strongly supported] Process risk: Users often misdiagnose “Claude stopped too early” when the real issue is one of:

    • usage exhaustion,
    • context saturation,
    • insufficient persistent instructions,
    • giant task size,
    • or constant permission friction.
  8. [Verified] Session hygiene risk: Resuming the same session in multiple terminals can jumble history; use forked sessions or worktrees for cleaner parallel work.
    Evidence: [R10]


Open Questions / What Still Needs Verification

  1. [Tentative] What exact failure mode is the user seeing when Claude Code “stops”?
    Possibilities:

    • usage limits,
    • permission waits,
    • context compaction,
    • end-of-turn stop behavior,
    • or oversized task structure.
  2. [Tentative] What Claude Code version is the user currently running?
    This matters because recent versions fixed:

    • plan mode lost after compaction,
    • stop-hook-related infinite loop behavior.
      Relevant source: [R21]
  3. [Tentative] Which Claude plan and auth path is the user using right now?
    Pro / Max / Team / API / Bedrock / Vertex will affect the cleanest solution path.

  4. [Tentative] Is the user actually trying to orchestrate Claude Code from another tool, or do they mainly want Claude Code itself to behave better?
    This determines whether the next deliverable should be:

    • a CLAUDE.md,
    • hooks,
    • skills,
    • or an Agent SDK controller.
  5. [Tentative] What is the current appetite for API spend versus staying inside subscription-only constraints?
    This materially affects whether OpenClaw or Task Master are worth considering at all.

  6. [Tentative] Are there newer public clarifications from Task Master maintainers about the compliant way to combine Task Master with Claude Code after Anthropic’s current OAuth restrictions?
    This should be checked again before making any production recommendation.


Suggested Next Steps

  1. Create or revise a project CLAUDE.md with explicit planning and stop rules.
  2. Add a Stop hook that refuses to stop until:
    • the plan/task file is updated,
    • acceptance criteria are checked,
    • next action is written.
  3. Add a SessionStart compact hook to re-inject critical rules after compaction.
  4. Turn on sandboxing for safer autonomy and fewer bash prompts.
  5. Use Plan Mode by default for non-trivial work.
  6. Move research/review into subagents instead of bloating the main session.
  7. Use /loop only for in-session polling, not as a durable orchestration layer.
  8. If scripting is needed, use the official Agent SDK / claude -p path.
  9. Treat OpenClaw as optional and API-first, not as the primary fix.
  10. Treat Task Master as a possible planning layer, but verify auth/compliance assumptions before production use.
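Step 2 above can be sketched concretely. The following is a minimal, untested sketch of a Stop hook script, assuming the documented hook protocol (hook input arrives as JSON on stdin; emitting `{"decision": "block", "reason": ...}` asks Claude to continue). The plan file name `PLAN.md` and the checkbox convention are illustrative choices, not requirements; verify the hook input/output fields against the current hooks reference before relying on this.

```python
#!/usr/bin/env python3
"""Sketch of a Stop hook that refuses to stop while the plan has open items.

Wire it up as a "command" hook under hooks -> Stop in .claude/settings.json
(exact wiring depends on your Claude Code version).
"""
import json
from pathlib import Path

PLAN_FILE = Path("PLAN.md")  # illustrative planning artifact for this workflow


def unchecked_items(plan_text):
    """Return the unchecked '- [ ]' checklist lines from the plan text."""
    return [
        line.strip()
        for line in plan_text.splitlines()
        if line.strip().startswith("- [ ]")
    ]


def decide(payload, plan_text):
    """Return a block decision dict, or None to allow stopping."""
    # stop_hook_active is set when a prior Stop hook already blocked once;
    # returning None here avoids the infinite-loop failure mode noted above.
    if payload.get("stop_hook_active"):
        return None
    remaining = unchecked_items(plan_text)
    if remaining:
        return {
            "decision": "block",
            "reason": "Plan still has unchecked acceptance criteria:\n"
            + "\n".join(remaining),
        }
    return None


# Entry point when Claude Code invokes this script (hook input on stdin):
#   import sys
#   plan = PLAN_FILE.read_text() if PLAN_FILE.exists() else ""
#   out = decide(json.load(sys.stdin), plan)
#   if out:
#       print(json.dumps(out))
```

The loop guard on `stop_hook_active` is the important design choice: without it, a misconfigured hook can block stopping forever, which is exactly the brittle behavior flagged under major risks.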

Handoff Notes for Another AI

If you are continuing this work, assume the following:

What the user actually cares about

  • The user does not primarily want theoretical “agent loops.”
  • The user wants more reliable planning depth, better continuity, and less early stopping.
  • The user is open to external tooling, but the recommendation should remain practical and policy-aware.
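These priorities translate naturally into explicit CLAUDE.md rules. A hedged sketch follows; file names, rule wording, and the checkbox convention are illustrative, not a tested template:

```markdown
# Planning rules (illustrative sketch)

- Before writing code, enter Plan Mode and write the plan to PLAN.md.
- Every task in PLAN.md gets explicit acceptance criteria as `- [ ]` checkboxes.
- Do not end a turn while the current task has unchecked criteria; either
  finish them or record a concrete "Next action:" line explaining the blocker.
- After compaction, re-read PLAN.md before doing anything else.
```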

What should be done next

The most useful next artifact is probably not another research memo. It is one of these:

  1. a real CLAUDE.md template for this workflow,
  2. a Stop hook config,
  3. a compact re-injection hook,
  4. a small skill set such as:
    • /spec-feature
    • /break-into-prs
    • /execute-next-task
  5. or a minimal Agent SDK orchestration script.
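For option 5, a minimal controller can shell out to headless mode rather than using the full Agent SDK. The sketch below assumes the CLI accepts `-p` with `--output-format json` and emits a JSON object containing a `result` field; confirm both against your installed version before relying on this, and treat the prompts as placeholders.

```python
#!/usr/bin/env python3
"""Minimal controller sketch around `claude -p` (headless mode)."""
import json
import subprocess


def extract_result(raw):
    """Pull the final text out of `claude -p ... --output-format json` output."""
    data = json.loads(raw)
    return data.get("result", "")


def run_step(prompt):
    """Run one non-interactive Claude Code turn and return its result text."""
    proc = subprocess.run(
        ["claude", "-p", prompt, "--output-format", "json"],
        capture_output=True,
        text=True,
        check=True,
    )
    return extract_result(proc.stdout)


# Hypothetical two-stage usage: plan first, then execute the next task.
#   run_step("Update PLAN.md with the next three tasks and acceptance criteria.")
#   run_step("Execute the next unchecked task in PLAN.md, then check it off.")
```

Keeping each stage as a separate `claude -p` invocation is deliberate: it matches the staged-orchestration framing above, with the plan file as the persistent artifact between turns instead of a single bloated session.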

Guardrails for future advice

  • Prefer official Claude Code features over wrappers when the user’s goal is deeper planning.
  • Do not recommend third-party consumer OAuth routing as “safe” or “approved.”
  • Clearly separate:
    • what is verified by official docs,
    • what is documented by third parties,
    • and what is inference or opinion.

If more research is needed

Prioritize:

  1. Anthropic official docs,
  2. Anthropic support docs,
  3. official docs from the third-party tool,
  4. then GitHub examples / maintainer docs,
  5. then community reports only as clearly labeled user-reported evidence.

Reviewer Notes and Improvements Made

Reviewer mode used: Serious self-review pass (no separate reviewer agent capability was available here).

Improvements made beyond the original discussion

  1. Added official-policy verification around Anthropic OAuth restrictions, which is the most important constraint for subscription-based wrappers.
  2. Added sandboxing as a missing autonomy lever; this was not sufficiently emphasized in the original answer.
  3. Added context-management and CLAUDE.md guidance as a core part of the solution rather than a side note.
  4. Added recent changelog fixes that may explain previously observed weirdness around plan mode and hooks.
  5. Flagged the Task Master documentation conflict instead of pretending the auth story is clean.
  6. Reframed the answer from “loop more” to “staged orchestration with persistent planning artifacts.”
  7. Added a clearer decision model for when to use:
    • native Claude Code,
    • Agent SDK,
    • Task Master,
    • OpenClaw,
    • subagents,
    • agent teams,
    • and worktrees.
  8. Explicitly separated Verified, User-stated, Assistant-stated but unverified, and Tentative / speculative claims throughout the document.

Remaining limitations of this artifact

  • It still does not diagnose the user’s exact stop condition because no runtime logs, screenshots, or version output were provided in this thread.
  • It does not provide a fully tested production configuration for Task Master + Claude Code under Anthropic’s current policy constraints.
  • It does not include live operational testing of OpenClaw or Claude Code on the user’s system.

Optional Appendix — Structured Summary

title: "Claude Code Looping, Planning, and Orchestration Playbook / Handoff Artifact"
date: "2026-03-17"
primary_question:
  - "How do we get Claude Code to do more useful looping, especially for planning?"
  - "Are OpenClaw or other tools sitting on top of Claude Code using the subscription model actually helpful?"
top_recommendation:
  status: "Assistant-stated but strongly supported"
  summary: "Use native Claude Code features first: Plan Mode, subagents, hooks, CLAUDE.md, sandboxing, context management, then optionally Agent SDK."
verified_key_points:
  - "Plan Mode exists and is designed for multi-step planning."
  - "Subagents use separate context windows."
  - "Hooks can block stop and re-inject context after compaction."
  - "/loop is session-scoped and recurring tasks expire after 3 days."
  - "Agent SDK / claude -p is the official programmatic path."
  - "Anthropic forbids using Free/Pro/Max OAuth tokens in other products/tools/services."
  - "OpenClaw recommends Anthropic API keys over subscription setup-token for long-lived gateways."
  - "Task Master is relevant for structured task decomposition, but its docs/auth story is mixed."
major_risks:
  - "Policy risk around third-party subscription-auth wrappers."
  - "Context saturation from giant sessions."
  - "Infinite-loop or brittle behavior from badly configured stop hooks."
  - "Token/cost explosion from agent teams."
  - "Security risk from loose sandbox configuration."
best_next_artifacts:
  - "CLAUDE.md template"
  - "Stop hook config"
  - "compact reinjection hook"
  - "planning/execution skills"
  - "minimal Agent SDK controller"
open_questions:
  - "What exact failure mode causes Claude Code to stop for this user?"
  - "What version and auth path is the user on?"
  - "How much API spend is acceptable versus staying subscription-only?"