Loop Engineering in 2026, Part 1: What an Agent Loop Is and How to Build One
The practical version of loop engineering: what an agent loop actually is, the six parts every loop is assembled from, how Claude Code's /goal works, a PR babysitter you can build today, why iterations, not tokens, are the real cost, and when not to loop at all.
A claim has been making the rounds in AI coding circles all year: stop prompting your coding agents, and start designing the loops that prompt them for you. Like most things that spread fast, it gets repeated constantly and explained almost never. So here is the practical version, the part you can actually build, before Part 2 gets into how a fleet of these loops starts to compound.
The clearest way to hear the shift is from the people furthest down the path. Boris Cherny, who created Claude Code, has put it about as bluntly as it can be put.
“I don’t prompt Claude anymore. I have loops that are running. They’re the ones prompting Claude and figuring out what to do. My job is to write loops.”
Peter Steinberger made the same point from the other direction, that you should be designing loops that prompt your agents rather than prompting them yourself. The point is not that prompt engineering died. It is that the work moved up a level, from writing the code to writing the system that writes the code. Developers furthest along report stretches where they shipped hundreds of pull requests without opening an editor, every line written by an agent.
What a loop actually is
Strip away the hype and a loop is a small program you write that does four things: it prompts the coding agent for you, reads what the agent produced, decides whether the work is done, and if it is not, prompts again with the error or the next step. That is the whole idea. You stop sitting inside the loop typing prompts. You write the loop, and the model becomes a subroutine it calls.
The shape never changes: set a goal, act, check, feed the result back, and repeat until the check passes or the loop stops itself. Everything else is detail bolted onto that skeleton.
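As a minimal sketch of that skeleton, the loop below takes two callables: an agent that turns a prompt into output, and a check that returns nothing when the work is done or an error message to feed back. The names run_loop, fake_agent, and fake_check are illustrative stand-ins, not part of any tool; a real loop would call a coding agent and a real test run in their place.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class LoopResult:
    done: bool
    iterations: int
    last_output: str


def run_loop(
    goal: str,
    agent: Callable[[str], str],
    check: Callable[[str], Optional[str]],
    max_iterations: int = 5,
) -> LoopResult:
    """Prompt, read, check, and re-prompt until the check passes or the budget runs out.

    `check` returns None when the work is done, or an error message to feed back.
    """
    prompt = goal
    output = ""
    for iteration in range(1, max_iterations + 1):
        output = agent(prompt)   # act
        error = check(output)    # check
        if error is None:
            return LoopResult(True, iteration, output)
        # Feed the failure back into the next prompt.
        prompt = f"{goal}\n\nPrevious attempt failed with:\n{error}"
    return LoopResult(False, max_iterations, output)


# Stand-in agent: fails on the bare goal, succeeds once it sees the error fed back.
def fake_agent(prompt: str) -> str:
    return "tests pass" if "failed" in prompt else "tests fail: 2 errors"


def fake_check(output: str) -> Optional[str]:
    return None if output == "tests pass" else output


result = run_loop("Make the auth tests pass", fake_agent, fake_check)
print(result.done, result.iterations)
```

The iteration cap is what makes this a loop that stops itself: without it, a check that never passes would run forever.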
The word “loop” means at least five things
Most of the arguing online is people using one word for five different ideas. It helps to see them as a progression, oldest to newest, because each one fixed a flaw in the last.
ReAct (2022)
The original research pattern: reason, act, observe, repeat. Every loop since is a descendant of this.
AutoGPT (2023)
A self-prompting goal loop that captured imaginations and then became notorious for one flaw: it never reliably knew when to stop.
The ralph loop
A deliberate context reset between iterations, so the agent does not slowly drown in its own accumulated history. Less is more.
/loop and /goal
Cadence and completion conditions built into the agent itself, carrying state across turns instead of asking you to babysit it.
Orchestration
One author fans out many agents that read your GitHub, Slack, and chat and decide what to build next. This is where Part 2 picks up.
When two people disagree about loops, they are usually standing on different rungs of this ladder.
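The difference between an accumulating loop and a context reset is easiest to see in how the next prompt gets built. This is a rough sketch under assumptions: the function names and the idea of a short notes string are illustrative, not how any particular tool implements its reset.

```python
from typing import List


def accumulating_prompt(goal: str, history: List[str]) -> str:
    """Replay every past turn, so the prompt grows with each iteration."""
    return "\n".join([goal, *history])


def reset_prompt(goal: str, notes: str) -> str:
    """Start fresh each iteration from the goal plus a short, curated notes string."""
    return f"{goal}\n\nNotes from earlier iterations:\n{notes}"


goal = "Migrate every call site to the new client"
history = [f"turn {i}: " + "x" * 200 for i in range(50)]  # stand-in for 50 long turns
notes = "Done: lib/a, lib/b. Next: lib/c."                # hypothetical notes

print(len(accumulating_prompt(goal, history)), len(reset_prompt(goal, notes)))
```

The reset version stays roughly the same size however many iterations have run, which is the point of the pattern.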
The six parts you assemble
The progression tells you what people mean. This is what a loop is actually built from. The same six parts show up every time, and most of them now ship inside the coding tools instead of being custom scripting you maintain by hand.
1 · Trigger
A schedule, webhook, file change, or a label landing on a PR. This is what separates a real loop from a run you repeat by hand.
2 · Isolation
A private checkout per agent, usually a git worktree, so two agents running at once cannot clobber each other’s files.
3 · Written-down context
Conventions, build steps, and project rules kept where the agent reads them every run, so it does not re-derive your project each pass.
4 · Reach into your tools
Connectors to the issue tracker, CI, database, and chat, so the loop opens the PR and posts the result instead of waiting on you.
5 · An independent check
A second agent grades the output, kept apart from the one that produced it, because a model reviewing its own work passes almost everything.
6 · State on disk
A markdown file, a board, or a queue. The model forgets between runs; the file does not.
Assemble those six and you have a working loop. The reason the pattern jumped from fringe trick to common practice this year is that you used to hand-build every one of them, and now most ship as built-in features.
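Isolation is the part that is simplest to show concretely. The sketch below only builds a `git worktree add` command that gives one agent its own checkout on its own branch; the directory layout, the agent/ branch prefix, and the agent ID are assumptions made for illustration.

```python
from pathlib import Path
from typing import List


def worktree_command(repo_root: Path, agent_id: str, base: str = "main") -> List[str]:
    """Build a `git worktree add` command giving one agent its own checkout and branch."""
    # Assumed layout: worktrees live next to the main checkout, e.g. shop-pr-fixer-1.
    path = repo_root.parent / f"{repo_root.name}-{agent_id}"
    branch = f"agent/{agent_id}"
    # git worktree add -b <new-branch> <path> <commit-ish>
    return ["git", "worktree", "add", "-b", branch, str(path), base]


cmd = worktree_command(Path("/work/shop"), "pr-fixer-1")
print(" ".join(cmd))
# To create the checkout for real, run the command from inside the repository:
# subprocess.run(cmd, cwd="/work/shop", check=True)
```

Because each agent gets a separate directory and branch, two loops running at once never write to the same files.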
A loop that ships inside the agent: Claude Code’s /goal
The cleanest way to feel this is a loop that comes built in. In Claude Code, the smallest complete loop is /goal: you hand it a verifiable end state, and it keeps taking turns until that state is true. You launch the session and set the goal inside it, for example /goal tests in test/auth pass. It is the same act, check, repeat shape, with the verifier built in.
The thing worth internalizing is that a strong goal reads less like a prompt and more like a contract. The good ones specify four things, and leaving any of them vague invites the model to take the easiest reading and declare victory while the real system is broken.
The four things a good goal specifies
End state: the checkable condition you want true. e.g. “every call site migrated; npm test exits 0”
Evidence: what proves you reached it. e.g. “test output shows 0 failures; build succeeds”
Constraints: what the agent must not break getting there. e.g. “only touch lib/billing; no changes to the public API”
Budget: how much work it may spend. e.g. “stop after 20 turns or 10 changed files, then ping a human”
Two controls keep it reliable. Make the check measurable: a test result, an exit code, a file count, an empty queue. “npm test exits 0” is a goal; “make it better” is not. And bound the run with something like “or stop after 20 turns,” so a stuck loop halts instead of burning your budget.
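One way to keep yourself honest about those four parts is to write the goal as a small structure that refuses to render when any part is blank. This is a sketch, not a format that /goal requires; the GoalContract class and its field names are hypothetical, and the example values are the ones from the list above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GoalContract:
    end_state: str
    evidence: str
    constraints: str
    budget: str

    def render(self) -> str:
        """Return the goal as one block of text; raise if any part is blank."""
        parts = {
            "End state": self.end_state,
            "Evidence": self.evidence,
            "Constraints": self.constraints,
            "Budget": self.budget,
        }
        missing = [name for name, text in parts.items() if not text.strip()]
        if missing:
            raise ValueError(f"goal is missing: {', '.join(missing)}")
        return "\n".join(f"{name}: {text}" for name, text in parts.items())


goal = GoalContract(
    end_state="every call site migrated; npm test exits 0",
    evidence="test output shows 0 failures; build succeeds",
    constraints="only touch lib/billing; no changes to the public API",
    budget="stop after 20 turns or 10 changed files, then ping a human",
)
print(goal.render())
```

The rendered text is what you would hand to the agent as its goal; the check only catches a part left empty, not one left vague.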
There is a subtlety hiding in the verifier step that becomes important later: the checker does not have to be the same model as the coder. Once a loop has distinct roles (planner, executor, evaluator, screenshot reviewer), each can run on a different model. Some plan better, some execute more cheaply, some judge an image more accurately, and choosing which model fills which role becomes an architecture decision rather than a single bet on one “best” agent.
A concrete loop you can build today: the PR babysitter
The most useful first loop is unglamorous on purpose. It watches your open pull requests and keeps the build green.
Trigger
Every 15 minutes.
Scope
Open PRs carrying the label agent-watch, and nothing else.
Action
If CI is red for a deterministic reason, attempt exactly one fix. If main has moved underneath it, rebase once.
Budget
One fix attempt per PR, five minutes, ten files changed. Hard stops.
Stop condition
CI green, or budget exhausted. Then stop and ping a human.
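Those five fields can be read as one small decision function, sketched below. The PullRequest fields and the Action names are hypothetical, the order of the checks is an assumption, and the five-minute and ten-file caps are left to whatever actually runs the fix.

```python
from dataclasses import dataclass
from enum import Enum
from typing import FrozenSet


class Action(Enum):
    SKIP = "skip"
    REBASE = "rebase"
    FIX = "fix"
    DONE = "done"
    PING_HUMAN = "ping_human"


@dataclass(frozen=True)
class PullRequest:
    labels: FrozenSet[str]
    ci_green: bool
    failure_is_deterministic: bool
    behind_main: bool
    fix_attempts: int = 0
    rebases: int = 0


def next_action(pr: PullRequest) -> Action:
    """Decide one step for one PR, honoring the scope and the one-attempt budget."""
    if "agent-watch" not in pr.labels:
        return Action.SKIP          # out of scope: never touch unlabeled PRs
    if pr.behind_main and pr.rebases == 0:
        return Action.REBASE        # main moved underneath it: rebase once
    if pr.ci_green:
        return Action.DONE          # stop condition reached
    if pr.failure_is_deterministic and pr.fix_attempts == 0:
        return Action.FIX           # exactly one fix attempt
    return Action.PING_HUMAN        # budget spent or failure looks flaky


pr = PullRequest(
    labels=frozenset({"agent-watch"}),
    ci_green=False,
    failure_is_deterministic=True,
    behind_main=False,
)
print(next_action(pr).value)
```

The trigger stays outside this function: a scheduler calls it every 15 minutes for each open PR and acts on the answer.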
You come back to merged PRs instead of a backlog of broken builds. The same shape covers most operations work: a CI-health loop every 30 minutes that clusters failing runs by signature, so ten red PRs with one root cause become one thing to look at; a deploy-verification loop that hits your endpoints after a push and flags regressions before users do; a feedback-clustering loop that pulls comments from your channels and maps each theme to the file that owns it.
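Clustering failing runs by signature can be as simple as normalizing the failing line and grouping on the result. The sketch below assumes each failure arrives as a (PR ID, failing log line) pair; the normalization rules, the PR IDs, and the log lines are made up for illustration.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def signature(log_line: str) -> str:
    """Normalize a failure line so runs with the same root cause compare equal."""
    line = re.sub(r"0x[0-9a-fA-F]+", "<hex>", log_line)  # addresses vary per run
    line = re.sub(r"\d+", "<n>", line)                   # so do timings and counts
    return line.strip().lower()


def cluster_failures(failures: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group (pr_id, failing_line) pairs by signature."""
    clusters: Dict[str, List[str]] = defaultdict(list)
    for pr_id, line in failures:
        clusters[signature(line)].append(pr_id)
    return dict(clusters)


failures = [
    ("PR-101", "TimeoutError: db connect after 3000 ms"),
    ("PR-102", "TimeoutError: db connect after 3012 ms"),
    ("PR-103", "AssertionError: expected 200, got 500"),
]
for sig, prs in cluster_failures(failures).items():
    print(len(prs), sig)
```

Here the two timeouts collapse into one cluster, so three red PRs become two things to look at.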
Where the cost goes now
For two years the cost question in AI coding was simple: which model, and how many tokens. Inside a loop, that instinct points at the wrong layer. The spend is no longer a single call; it is how many times the loop goes around. A loop that retries six times before it converges costs roughly six times as much as one that lands on the first pass, on the very same model.
“Iterations are the budget line, not tokens. A cheaper model that loops twice as often is not cheaper.”
That reframes what is worth tuning. Track cost per finished task, not cost per call. Treat a weak verifier as the most expensive bug you can ship, because a loose check either stops early on broken work or grinds away on work that was already fine, and both waste whole iterations. And cap consecutive failures, because a loop with no failure limit does not eventually succeed; it eventually drains the account. You used to tune the prompt. Now you tune the loop, because that is where the cost accumulates.
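Both of those numbers, cost per finished task and consecutive failures, fit in a small ledger the loop updates after every iteration. This is a sketch: the LoopLedger class, the default cap of three, and the dollar figures are illustrative assumptions, not measured prices.

```python
from dataclasses import dataclass


@dataclass
class LoopLedger:
    """Tracks spend per finished task and trips after too many failures in a row."""

    max_consecutive_failures: int = 3
    total_cost: float = 0.0
    finished_tasks: int = 0
    consecutive_failures: int = 0

    def record(self, cost: float, passed: bool) -> None:
        """Record one iteration: every pass costs money, only some finish a task."""
        self.total_cost += cost
        if passed:
            self.finished_tasks += 1
            self.consecutive_failures = 0
        else:
            self.consecutive_failures += 1

    @property
    def should_halt(self) -> bool:
        return self.consecutive_failures >= self.max_consecutive_failures

    @property
    def cost_per_finished_task(self) -> float:
        if self.finished_tasks == 0:
            return float("inf")  # money spent, nothing finished
        return self.total_cost / self.finished_tasks


ledger = LoopLedger()
# Illustrative costs only: three iterations at $0.25 each, and the third one passes.
for cost, passed in [(0.25, False), (0.25, False), (0.25, True)]:
    ledger.record(cost, passed)
print(ledger.cost_per_finished_task, ledger.should_halt)
```

With these numbers the task cost three times the price of a single call, which is the figure worth watching when you compare models.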
When not to loop, and what goes wrong
Loops pay off when a task repeats and a machine can tell when it is done. Outside that, a loop just automates churn, so skip it for one-shot edits you could finish in a single pass, for unscoped exploratory work like “figure out why users are churning” that has no pass condition, and for anything whose only verifier is your own eyes. If you cannot write the check, you are still inside the loop, and you should build the check first or do the task by hand.
How to build your own
Putting it together, the recipe is short. Pick one repeatable task; the PR babysitter and CI triage are good first picks. Scope it tight: “fix the billing webhook validation, only touch the billing module” beats “fix the bug.” Give it a budget and a stop condition across attempts, runtime, files, spend, and consecutive failures. Add an independent verifier as a separate sub-agent, because the agent that wrote the code is the worst judge of whether it is done. Run it on a cadence with /loop, cron, lifecycle hooks, or GitHub Actions so it survives a closed laptop. And keep memory on disk, in markdown or a board, because the model forgets between runs and the file does not.
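The last step, memory on disk, needs very little code. A minimal sketch, assuming a single markdown file of timestamped bullets; the file name LOOP_STATE.md and both helper functions are hypothetical.

```python
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def append_note(state_file: Path, note: str) -> None:
    """Append one timestamped bullet to the loop's markdown memory."""
    stamp = datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M")
    with state_file.open("a", encoding="utf-8") as handle:
        handle.write(f"- [{stamp}] {note}\n")


def load_notes(state_file: Path, last: int = 20) -> str:
    """Return the most recent notes, ready to paste into the next run's prompt."""
    if not state_file.exists() or last <= 0:
        return ""
    lines = state_file.read_text(encoding="utf-8").splitlines()
    return "\n".join(lines[-last:])


# Demo in a temporary directory; a real loop would keep the file in the repository.
with tempfile.TemporaryDirectory() as tmp:
    state = Path(tmp) / "LOOP_STATE.md"
    append_note(state, "PR babysitter: rebased one PR, CI green")
    append_note(state, "CI triage: 3 failures share one signature")
    print(load_notes(state))
```

Reading only the last few notes keeps the next prompt small while still carrying forward what earlier runs learned.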
Build it like someone who intends to stay the engineer responsible for the output, not just the person who starts the run. Get one loop trustworthy, and you are ready for the interesting part, which is what happens when many of them start sharing what they learn. That is Part 2.