A Loop in a Gown

Every AI agent is the same loop. The race is who dresses it best — and the best costume wins the AGI narrative.

Every agent demo, every frontier model announcement, every VC memo about the coming age of autonomous software is, underneath the marketing, about the same six-line loop. The models change. The tools change. The harness changes. The loop does not. Once you can see it clearly, a lot of the noise in the AI space becomes easier to read, not because the loop diminishes what agents can do, but because it tells you what questions are actually worth asking.

In an earlier post on tool-chain composition, I argued that the limiting factor in agent capability isn't the model. It's the quality and breadth of the tools the model can invoke, and whether the person building those tools has enough domain depth to know which ones matter. That argument is about what's inside the loop. This one is about the loop itself: what it is, what gets built on top of it, and how to tell the difference between the capability and the costume.

The Loop

Here it is, stripped down:

loop:
  observation  = observe(environment)
  thought      = reason(observation, context)
  action       = plan(thought, available_tools)
  result       = execute(action)
  context.update(result)
  if done(result): break
    

That's the ReAct pattern: reason, act, observe, repeat. Every production agent you've seen demoed is running a variation of this: the one that books travel, triages support tickets, diagnoses a Kubernetes cluster, generates a pull request, runs a research report. The pattern was named by researchers, but the idea is not exotic. It is just a loop.
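To make the skeleton concrete, here is a minimal runnable sketch of the same loop in Python. Every name in it is hypothetical and not any framework's API. The `observe`, `reason`, `plan`, and `done` callables stand in for the model and for the deployer's exit condition, and the "environment" is a toy counter. It shows only the control flow. Nothing in it reasons.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical sketch of the six-line loop. All names are illustrative;
# this is not any particular agent framework's API.


@dataclass
class Action:
    tool: str       # which tool to call
    argument: str   # what to pass it


@dataclass
class Context:
    history: list = field(default_factory=list)

    def update(self, result: str) -> None:
        self.history.append(result)


def run_loop(
    observe: Callable[[], str],
    reason: Callable[[str, Context], str],
    plan: Callable[[str, dict], Action],
    tools: dict,
    done: Callable[[str], bool],
) -> Context:
    context = Context()
    while True:  # like the pseudocode: no iteration cap
        observation = observe()
        thought = reason(observation, context)
        action = plan(thought, tools)
        result = tools[action.tool](action.argument)  # execute(action)
        context.update(result)
        if done(result):
            break
    return context


# Toy environment: a counter the "agent" drives to 3.
state = {"count": 0}


def increment(_: str) -> str:
    state["count"] += 1
    return f"count={state['count']}"


ctx = run_loop(
    observe=lambda: f"count={state['count']}",
    reason=lambda obs, c: f"{obs} is below target",
    plan=lambda thought, tools: Action("increment", ""),
    tools={"increment": increment},
    done=lambda result: result == "count=3",
)
print(ctx.history)  # ['count=1', 'count=2', 'count=3']
```

Like the pseudocode, it has no iteration cap. If `done` never returns true, it runs forever.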

The architectural simplicity is real and it is misleading in equal measure. The loop tells you the skeleton. It does not tell you which tools to expose, what constitutes a valid observation, how to handle a tool call that returns ambiguous or contradictory data, when to stop, what counts as done, or how to recover from an action that partially succeeds. It doesn't tell you what the model will do when its plan depends on a capability the environment doesn't support, or how the agent degrades when the context window fills with noise from failed attempts. Six lines of pseudocode carry none of that weight.
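Here is a sketch of what answering a few of those questions looks like in code. It assumes a single `step` callable that performs one observe/reason/plan/execute pass. The names are hypothetical, and so are the specific choices: the `max_steps` budget, the rule that a raised exception becomes an observation instead of a crash, and the bounded context `window`. Each one is a decision the six lines leave to someone else.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable

# Hypothetical sketch: max_steps and window are choices the six-line loop
# leaves unstated; the default values are illustrative, not recommendations.


@dataclass
class Outcome:
    status: str    # "done" or "budget_exhausted"
    steps: int
    context: list


def run_guarded_loop(
    step: Callable[[list], str],   # one observe/reason/plan/execute pass
    done: Callable[[str], bool],
    max_steps: int = 10,
    window: int = 5,
) -> Outcome:
    context = deque(maxlen=window)  # oldest entries fall off when full
    for n in range(1, max_steps + 1):
        try:
            result = step(list(context))
        except Exception as exc:    # a failed tool call becomes evidence, not a crash
            result = f"error: {exc}"
        context.append(result)
        if done(result):
            return Outcome("done", n, list(context))
    return Outcome("budget_exhausted", max_steps, list(context))


# A flaky tool: times out twice, then succeeds.
calls = {"n": 0}


def flaky_step(ctx: list) -> str:
    calls["n"] += 1
    if calls["n"] <= 2:
        raise TimeoutError("tool timed out")
    return "ok"


outcome = run_guarded_loop(flaky_step, done=lambda r: r == "ok")
print(outcome.status, outcome.steps)  # done 3
print(outcome.context)  # ['error: tool timed out', 'error: tool timed out', 'ok']

never = run_guarded_loop(lambda ctx: "still working", done=lambda r: False,
                         max_steps=4, window=2)
print(never.status, never.context)  # budget_exhausted ['still working', 'still working']
```

Even this small version encodes judgment calls. A timeout is treated as retryable evidence, and once the window is full the oldest context silently drops out. Partial success is not handled at all.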

The operational complexity lives entirely outside the diagram. When you read an agent capability claim, such as "it can autonomously manage infrastructure" or "it resolves 70% of support tickets without human intervention," the loop is what's on display. What's not shown is the scaffolding: the quality of the tool definitions, the harness that constrains what actions are available, the eval suite that determined what "resolved" means, the approval gates and fallback paths that catch the 30% that fails. The loop runs inside all of that. The loop alone doesn't produce the outcome.

The Harness

The harness wraps the loop. It's the scaffolding layer that turns a raw loop into something that can run in production, and it's where most of the real engineering work lives. It includes the tool server that exposes callable functions to the model; the memory layer that persists context across iterations; the planner or reflection layer that structures multi-step reasoning; the execution engine that runs tool calls against real systems; the approval gates and human-in-the-loop checkpoints; the observability hooks and audit logs that record what happened; the cost controls that throttle spend; and the UX layer that presents all of it to an operator in a way they can act on.
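A stripped-down sketch of a harness wrapping the execute step, with four components: a tool registry, an approval callback, a spend budget, and an audit log. The `Harness` class, the flat per-call cost, and the `restart_node` tool are all hypothetical. `cluster_health` is borrowed from the triage example later in this post and is used here only as a label.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical harness sketch: names, the flat per-call cost model, and the
# approval callback are illustrative assumptions, not any product's design.


@dataclass
class Harness:
    tools: dict                                    # tool server: name -> callable
    approve: Callable[[str, str], bool]            # human-in-the-loop gate
    budget: float                                  # cost control, arbitrary units
    cost_per_call: float = 1.0
    audit_log: list = field(default_factory=list)  # observability / audit trail

    def execute(self, tool: str, arg: str) -> str:
        if tool not in self.tools:
            return self._record(tool, arg, "rejected: unknown tool")
        if self.budget < self.cost_per_call:
            return self._record(tool, arg, "rejected: budget exhausted")
        if not self.approve(tool, arg):
            return self._record(tool, arg, "rejected: approval denied")
        self.budget -= self.cost_per_call
        return self._record(tool, arg, self.tools[tool](arg))

    def _record(self, tool: str, arg: str, result: str) -> str:
        self.audit_log.append((tool, arg, result))  # every attempt is logged
        return result


h = Harness(
    tools={"cluster_health": lambda _: "yellow",
           "restart_node": lambda n: f"restarted {n}"},
    # Illustrative gate: only node-3 may be restarted.
    approve=lambda tool, arg: tool != "restart_node" or arg == "node-3",
    budget=2.0,
)
results = [
    h.execute("cluster_health", ""),      # yellow
    h.execute("restart_node", "node-7"),  # rejected: approval denied
    h.execute("drop_index", "logs"),      # rejected: unknown tool
    h.execute("restart_node", "node-3"),  # restarted node-3
    h.execute("cluster_health", ""),      # rejected: budget exhausted
]
for r in results:
    print(r)
```

The loop would call `h.execute` where the pseudocode says `execute(action)`. Every rejection still lands in the audit log. That is the point: the log records what was attempted, not only what ran.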

None of those components appear in the pseudocode. Every one of them determines how well the loop performs and what risks it carries. Most of the engineering time in a serious agent deployment goes into the harness: tuning tool definitions, hardening execution paths, building the observability that makes failure legible, designing gates that catch what they're supposed to catch. When someone says "we built an agent," they mean they built a harness around a loop. The loop is the shared substrate. The harness is the differentiator and the source of both gowns.

The Predecessor: RPA

Before agents, there was robotic process automation. RPA ran in enterprise operations for years, and in the right context, it worked. Log into this portal, read this field, paste it into that form, route this document if this condition holds. Narrow scope. Deterministic logic. Observable behavior. Within those constraints, RPA was genuinely reliable. Operations teams ran it on billing workflows, HR onboarding steps, data entry tasks that had been manual for years. Not glamorous. But it ran, and it didn't require babysitting.

The failure mode wasn't the technology. It was the selling. "Intelligent automation" was the pitch: broad, context-dependent, adaptive. The product could not do that. It could execute a rigid script reliably within a narrow channel. When the target application changed its UI by two pixels, the script broke. When the task domain widened, the brittleness showed. The promise was adaptability. The delivery was rigidity in a specific, well-defined trench. The enterprises that got value treated it as what it was; the ones that got burned believed the pitch.

Agents are RPA's better-dressed successor. The model genuinely reasons. It handles novel inputs, adapts to ambiguous state, assembles plans from tool combinations the developer didn't explicitly script. The adaptability is real. The improvement over RPA is meaningful. But the frame holds: where the task is narrow, the environment is observable, and the action space is controlled, an agent delivers reliable value. Where an agent is sold as autonomous intelligence across a broad, poorly specified domain, you are buying the better-dressed version of the same overreach.

Tesla's Full Self-Driving is the one-line consumer version of this pattern. The car handles a lot. The liability stays with the driver. What the marketing promises and what the EULA requires are different documents.

The First Costume: The Pageant Queen

There are two dominant ways to dress the loop, worn in different contexts for different audiences. The first is what I think of as the pageant queen. The autonomy costume. It lives in the consumer product, the AGI narrative, and the funding pitch.

The autonomy framing is technically defensible on its own terms. The loop does run without moment-to-moment human input. "My AI agent booked my flights, ordered groceries, drafted the legal brief." Each of those is the loop executing without requiring someone to approve each action. The claim isn't false. The question is what conclusions you're licensed to draw from it. For consumers, the experience is autonomy even if the mechanism isn't, and that experience delivers real value. The gown fits.

The AGI narrative wears the same gown for a different purpose. Each improvement to agent capability — better reasoning, longer context, more reliable tool use — gets framed as progress toward general intelligence. The capability improvements are real. Whether they are progress toward the specific claim of general intelligence, or progress toward better task completion within increasingly broad but still bounded domains, is an open question. The framing closes it prematurely because the funding environment requires that.

"We improved the reasoning quality of our ReAct implementation" is technically accurate and commercially inert. Nobody raises a significant round on that sentence. "We are building toward autonomous general intelligence, and here is the evidence" funds the next phase. The gown is the capital instrument. This is not a conspiracy. It is how markets work when the most valuable outcome is also the most distant one. The framing shapes the investment thesis, the investment thesis shapes the roadmap, the roadmap shapes the product. Downstream operators and customers need to know they are looking at the loop dressed for fundraising, not a proof-of-concept for the claim.

The Second Costume: The Cockpit

The second costume is the control gown. It is worn in production deployments, enterprise software, and federal environments. Where the pageant queen projects autonomy, the cockpit projects oversight. It's the more interesting of the two costumes, because the control it projects is partly real.

A production agent deployment has a dashboard. It has approval gates. It has audit logs, governance documentation, a responsible AI framework in the rollout deck, a human-in-the-loop checkbox in the compliance review. These things exist. They are not nothing. The audit log genuinely records what happened. The approval gate genuinely requires a human action before the agent proceeds. The governance documentation genuinely constrains what the agent is allowed to do.

Whether any of that produces actual control is a separate question. The answer is: sometimes. It depends on whether the person in the loop understands the mechanism well enough to exercise the control the interface offers.

The audit log runs whether or not anyone reviews it with enough context to know what they're looking at. The approval gate fires whether or not the reviewer can evaluate the proposed action. The dashboard shows you what the agent did; it does not show you whether the reasoning behind it was sound, whether the action was optimal across the full state of the system, or whether a subtler failure mode was missed entirely. You see the output. The mechanism is underneath the interface.

Control is partly real, partly theatrical. The theatrical portion is not useless. An approval gate reviewed superficially still catches obvious failures: the agent proposing to delete a production index, executing a rollback without a stated reason, calling an endpoint the operator knows is off-limits. Those catches are valuable. The control surface creates an escalation path when behavior deviates visibly. The accountability structure is real even when the depth of review is shallow.
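Here is a sketch of the kind of check that catches those obvious cases, assuming a simple proposed-action record. The `ProposedAction` schema, the `prod-` index prefix convention, and the endpoint paths are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical policy check: the action schema, the "prod-" naming convention,
# and the endpoint list are illustrative assumptions, not real product rules.


@dataclass
class ProposedAction:
    kind: str         # e.g. "delete_index", "rollback", "http_call"
    target: str
    reason: str = ""


OFF_LIMITS_ENDPOINTS = {"/admin/users", "/internal/billing"}  # hypothetical


def obvious_failures(action: ProposedAction) -> list:
    """Return reasons to block; an empty list means nothing obvious was caught."""
    problems = []
    if action.kind == "delete_index" and action.target.startswith("prod-"):
        problems.append("deletes a production index")
    if action.kind == "rollback" and not action.reason.strip():
        problems.append("rollback with no stated reason")
    if action.kind == "http_call" and action.target in OFF_LIMITS_ENDPOINTS:
        problems.append("calls an off-limits endpoint")
    return problems


print(obvious_failures(ProposedAction("delete_index", "prod-logs")))
# ['deletes a production index']
print(obvious_failures(ProposedAction("rollback", "deploy-42")))
# ['rollback with no stated reason']
print(obvious_failures(ProposedAction("restart_node", "node-3")))
# []
```

Notice what it cannot catch. A well-formed restart of the wrong nodes, based on a wrong diagnosis, returns an empty list.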

The gap between the control the cockpit projects and the control it delivers is where production incidents originate. The agent behaved within the defined parameters. The approval gate was cleared. The audit log is clean. Something still went wrong, and it went wrong in the space between what the governance framework could see and what the actual system state required. Knowing where that gap is, and being honest that it exists, is the prerequisite for managing it rather than discovering it during an incident.

Behavior Is Real; Agency Is Attributed

Here is the structural point that both costumes obscure.

Behavior is real. Agency is attributed.

The loop produces behavior. It observes, it reasons, it acts. That behavior is empirically real. You can log it, trace it, measure it, reproduce it. Nobody disputes the behavior. When Elastibot, the triage agent in the session described below, calls cluster_health, reads the output, calls nodes_stats, identifies the heap pressure, and proposes remediation steps, that sequence happened. The log entry exists. The diagnosis was correct. The behavior is not in question.

Agency is a different claim. Agency implies authorship: the idea that the system made a decision in some meaningful sense, that it chose this path rather than another, that it can be held responsible for the outcome in the way a person can. When you say the agent "decided" to call the escalation path, or "chose" to skip the validation step, you are attaching those properties to a loop that does not have them. The loop ran a function. The function was defined by a developer. The tools available to the loop were chosen by someone who understood the domain. The prompt that shaped the model's reasoning was written by someone with a theory about how the model should behave. What counts as done was decided by whoever deployed the system. The exit condition is not self-determined.

The boomerang is this: if agency is attributed to the agent, it flows back to the people who built and deployed it. Not as a technicality. As a structural reality. The agent acts; the organization acts. The loop ran; someone is responsible for what the loop was allowed to do. When the autonomous system in the AGI marketing copy makes a consequential error, the attribution that was granted on the way up gets redirected on the way down. The autonomy framing that made the funding pitch compelling becomes the liability exposure in the post-incident review.

This point is not hypothetical. It is the shape of every governance conversation that follows a production failure involving an agentic system. "The AI did it" doesn't close the inquiry. It opens it. Who authorized the AI to do it? Who designed the tool that executed the action? Who set the exit condition that caused the loop to terminate there? Who reviewed the approval gate that cleared the action? The questions are not rhetorical. They have answers. The answers point to people. Attribution flows to humans not because the law demands it (though it often does) but because the humans are genuinely where the design decisions lived.
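One way to make those questions answerable before an incident rather than after is to require the answers on every action record. This is a hypothetical schema, not a standard. The field names simply mirror the questions above.

```python
from dataclasses import dataclass, fields

# Hypothetical audit record: field names mirror the accountability questions
# and are an illustrative schema, not a standard.


@dataclass(frozen=True)
class ActionRecord:
    action: str
    authorized_by: str         # who allowed the agent to take this class of action
    tool_owner: str            # who designed the tool that executed it
    exit_condition_owner: str  # who defined "done" for this loop
    gate_reviewer: str         # who cleared the approval gate

    def unattributed(self) -> list:
        """Names of accountability fields left blank."""
        return [f.name for f in fields(self)
                if f.name != "action" and not getattr(self, f.name).strip()]


rec = ActionRecord(
    action="restart data nodes",
    authorized_by="platform lead",
    tool_owner="sre team",
    exit_condition_owner="",
    gate_reviewer="on-call operator",
)
print(rec.unattributed())  # ['exit_condition_owner']
```

Every accountability field names a person or team, never the agent. A blank field is a question the post-incident review will ask with no answer on record.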

The harness weaponizes the same attribution reflex the gown uses for marketing, except now the reflex runs in reverse. You built a system that acts. You said it acts autonomously. It acted. The action had consequences. The chain of responsibility terminates at the people who designed the harness, defined the tools, and chose what the loop was authorized to do. Behavior is real. Agency is attributed. The attribution is yours.

The Loop Running: Elastibot

Let me be concrete about what this looks like in practice, because the abstraction only goes so far. The following is a representative triage session. The pattern is accurate, the root-cause chain is real, and the numbers (heap at 98%, swap on 4 nodes, ~2 hours recovered) are consistent with what actually happened on the cluster.

Production federal cluster. Thirty nodes, 30,000–40,000 events per second, 300+ engineers depending on it. The cluster went yellow. Shard allocation was failing. I ran Elastibot.

Triage loop trace: Elastibot (representative; resolved)

observe (1): cluster_health → status yellow, unassigned shards accumulating
observe (2): nodes_stats → heap at 98% across multiple data nodes
observe (3): nodes_os_stats → swap actively in use on 4 nodes
observe (4): allocation_explain → unassigned reason: disk watermark crossed due to heap swapping to disk under load
reason: bootstrap.memory_lock not set; OS swap enabled; JVM heap thrashing to disk; root cause identified
proposed actions: disable swap, restart affected nodes with memory_lock enabled, verify cluster health post-restart
approval gate: actions held pending operator review; I reviewed and approved
outcome: cluster green in 22 minutes; ~2 hours recovered vs. the prior manual triage baseline

The loop read telemetry four times before it had enough state to reason about the root cause. That's not four API calls. That's four iterations of observe, reason, plan, with each iteration narrowing the hypothesis space. The architectural diagram doesn't show that. The loop is six lines. The operational reality is four rounds of evidence-gathering before the diagnosis became confident enough to propose action.
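The narrowing can be sketched as set intersection: each observation keeps only the hypotheses it is consistent with, and the loop stops gathering evidence when one remains. The four probe names come from the trace above. The hypothesis names and the consistency sets are invented for illustration; they are not Elastibot's actual logic or a general Elasticsearch diagnostic rule.

```python
# Hypothetical sketch: hypotheses and consistency sets are illustrative only.

HYPOTHESES = {"node_failure", "allocation_filter_misconfig",
              "gc_pressure_from_load", "heap_swapping"}

# (probe, hypotheses its observation is consistent with)
PROBES = [
    ("cluster_health", set(HYPOTHESES)),  # yellow: something is wrong, not why
    ("nodes_stats", {"node_failure", "gc_pressure_from_load", "heap_swapping"}),
    ("nodes_os_stats", {"gc_pressure_from_load", "heap_swapping"}),
    ("allocation_explain", {"heap_swapping"}),
]


def narrow(hypotheses: set, probes: list) -> tuple:
    remaining = set(hypotheses)
    history = []
    for name, consistent_with in probes:
        remaining &= consistent_with          # drop anything the evidence contradicts
        history.append((name, sorted(remaining)))
        if len(remaining) <= 1:               # confident enough to propose action
            break
    return remaining, history


remaining, history = narrow(HYPOTHESES, PROBES)
for name, left in history:
    print(f"after {name}: {len(left)} hypotheses remain")
print(remaining)  # {'heap_swapping'}
```

The point is not the specific sets but the shape: no single call produced the diagnosis, and the stopping rule (one hypothesis left) is itself a design decision.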

The approval gate made me feel, and partly be, in control. The distinction matters. "Partly" is doing real work there. The feeling of control isn't nothing. It slows the loop, forces a moment of review, creates an opportunity to catch what the model missed. The "partly" acknowledges that the depth of that control depends entirely on the reviewer's situational awareness. I had it on that cluster. Someone reviewing the same gate without six years of context on that stack would have had less. The gate is the same. The control is not.

This is also where the attribution point becomes concrete rather than abstract. Elastibot proposed the remediation. I approved it. The organization executed it. Because it worked, the loop gets the credit in the post-incident write-up. If it had failed (wrong diagnosis, an action with unintended side effects, an approval gate cleared without sufficient review), the loop would get the description and I would get the accountability. The behavior is the loop's. The agency is mine.

Back to where we started. Every demo, every announcement, every AGI narrative is built on the loop I described at the top. The models are genuinely better than they were. The tools available to those models have gotten more capable and more numerous. The harnesses that wrap the loop — the approval gates, the audit infrastructure, the governance layers — have gotten more sophisticated. The underlying loop has not changed.

The gown delivers real value. The natural-language interface that makes the loop accessible to people who couldn't operate a raw API is genuine. So is the enterprise governance surface that creates accountability even when the review is imperfect. The capability improvements that make the loop more useful across a wider range of tasks are real, they are ongoing, and they matter. None of that requires pretending the loop is something other than what it is.

Operators who can see the loop evaluate capability claims accurately. They know when the autonomy framing is useful shorthand and when it is a liability they are taking on without realizing it. They know when the approval gate provides real oversight and when it is a compliance checkbox that will not catch the failure mode they actually face. They know which parts of the harness were built with care and which parts were shipped to satisfy a governance checklist. They know what they are buying. They know which parts of the gown are worth paying for, and which parts are theater.

An agent is a loop in a gown. Learn to see the loop, and you'll know when the gown is worth paying for.
