The Agent Loop and Supervision Contracts in AI Coding Tools

June 23, 2026 · AI Agent Internals: How Coding Agents Really Work (part 3)

▶ Watch on YouTube & subscribe to The Stack Underflow

Every AI coding agent on the market — Cursor, Windsurf, Claude Code, Devin — looks different on the surface. Different UIs, different branding, different pitch decks. But underneath, they all run the same fundamental mechanism: an agent loop. What actually separates them is not the model or the tooling. It is where you, the human, sit inside that loop.

This episode of “How Claude Actually Works” builds on the previous two episodes (the model-plus-orchestrator architecture and MCP) to answer a deceptively simple question: what does an agent actually do over time?

The one-sentence version: Every coding agent runs the same think-act-read-decide loop; the only real difference between tools is how often that loop pauses to ask for your approval.

The Agent Loop: Think, Act, Read, Decide

A single tool call almost never finishes the job. Ask an agent to fix a bug and it will read files, run tests, see failures, edit code, and retry — many discrete actions, one coherent task. The mechanism that makes this work is the agent loop.

Here is the cycle in plain terms:

1. THINK  — the model decides what the next step is
2. ACT    — it emits a tool call (read file, run command, write code...)
3. READ   — the orchestrator executes the tool; the result comes back as text
4. DECIDE — the model reads that result and asks: "Is the task done?"
            → YES: exit the loop
            → NO:  go back to step 1

The loop repeats until the model itself decides the work is finished. There is no separate scheduler, no hard-coded sequence of five steps, and no external planner storing a to-do list. The model is the program that runs each iteration.

This connects directly to episodes one and two of the series: the model produces text (including tool calls), the orchestrator executes those calls, and MCP is the protocol the orchestrator uses to talk to real tools. The loop is just those components cycling in sequence.

What Most People Get Wrong

The most common misconception is that the agent follows a pre-written script or a stored plan. It does not. At each iteration, the model freshly evaluates the full conversation context — everything that has happened so far — and decides what to do next. There is no persistent plan object being ticked off. The history of tool results in the context window is the plan, updated in real time.

This is why agent behavior can look surprisingly adaptive: it literally is. If a test fails in an unexpected way, the model reads that failure and adjusts the next action accordingly, without any special “error handling” code in the loop itself.

Where the Human Sits: Three Supervision Contracts

The loop is identical across every major coding agent. What differs is a single design decision: how often does the loop pause and wait for human input?

You can think of this as a dial with three named positions:

Supervision level	How it works	Examples
Per-action review	Agent proposes a change, stops, waits for you to approve the diff before proceeding	Cursor, Windsurf
Plan-then-run	Agent writes a plain-English plan, you approve it, then it runs many steps without interrupting	Claude Code (plan mode)
Fully autonomous	Agent runs the entire loop alone — sometimes for hours — and you review the result at the end	Devin, background agents

None of these is inherently better. They reflect different trust levels and different task profiles:

High-stakes, unfamiliar codebase — per-action review keeps you in control at the cost of constant interruptions.
Well-scoped task, codebase you know — plan-then-run lets you sanity-check the strategy without babysitting every step.
Long-running, well-defined, low-risk work — fully autonomous frees you to context-switch while the agent grinds.

The underlying model, orchestrator, and MCP setup can be identical in all three cases. The only thing that changed is the contract.

An ASCII View of the Loop and Human Touchpoints

         ┌─────────────────────────────────────────────────┐
         │                   AGENT LOOP                   │
         │                                                 │
  ┌──────▼──────┐     ┌────────────┐     ┌─────────────┐  │
  │   THINK     │────▶│    ACT     │────▶│    READ     │  │
  │ (model)     │     │(tool call) │     │  (result)   │  │
  └─────────────┘     └────────────┘     └──────┬──────┘  │
         ▲                                       │         │
         │            ┌────────────┐             │         │
         └────────────│   DECIDE   │◀────────────┘         │
                      │ done? y/n  │                       │
                      └─────┬──────┘                       │
                            │ NO → loop again              │
                            │ YES → EXIT                   │
         └─────────────────────────────────────────────────┘

Human touchpoints (choose one):
  [Per-action] ──── after every ACT
  [Plan-then-run] ─ once, before the first ACT
  [Autonomous] ──── only after EXIT

Common Misconceptions

“Different agents use fundamentally different AI architectures.” Not necessarily. The loop — think, act, read, decide — is the same. What changes is the supervision contract, not the underlying model or orchestrator design.
“The agent has a stored plan it works through step by step.” There is no separate plan object. The model re-evaluates the full conversation context at every iteration. The growing list of tool results in the context window is the closest thing to a plan.
“Fully autonomous means the agent is smarter.” Autonomy is a trust setting, not a capability level. A fully autonomous agent with a bad initial prompt will confidently do the wrong thing for a long time without you noticing.
“Per-action review is for beginners.” It is a deliberate supervision choice, appropriate any time the stakes are high or the codebase is unfamiliar. Senior engineers use it all the time on critical paths.

Frequently Asked Questions

What actually stops the loop from running forever? The model itself decides to stop — it emits a final response instead of another tool call. Most agent frameworks also include safety limits (max iterations, timeout) as a backstop, but the primary exit condition is the model concluding the task is done.

If the loop is the same everywhere, why do agents feel so different to use? UX, supervision contract, and default tool access. Cursor’s per-action diff approval feels interactive and safe. Devin’s background run feels hands-off. Both are the same loop with different pause points and different tool sets wired up via MCP.

Can I change the supervision level in mid-task? In some tools, yes. Claude Code lets you switch between interactive and plan modes. In others the contract is baked into the product. Check your tool’s docs — but know that what you are really asking is “can I move the human pause point during a run?”

Does the model remember what it did in previous runs of the loop? Only through the context window. Each iteration appends the tool result to the conversation history the model reads. Once a session ends and the context is cleared, the model starts fresh. Persistent memory across sessions is a separate layer, not part of the base loop.

Where This Fits in the Series

This is episode three of “How Claude Actually Works” — a course that starts at the model layer and builds up to full agentic systems. Episode one covered the model-plus-orchestrator split. Episode two explained MCP as the protocol connecting the orchestrator to real tools. This episode completes the picture by showing how those components cycle in a loop, and how the loop’s supervision contract is what you are actually choosing when you pick a coding agent.

Episode four goes deeper: from a high-level instruction like “create a GitHub issue” all the way down to the raw API call that makes it happen.

Browse all tutorials to see the full course.

Found this useful? The deep version lives on YouTube — new breakdowns of how AI dev tools actually work, weekly.

Subscribe on YouTube →