Anshad Ameenza.
Engineering

Vibe Coding Is Not Real Engineering — Until It Is

Karpathy coined the term. Willison drew the line. Here's how a toy concept became a genuine workflow split that every engineering team now has to navigate.


How a Tweet Became a Discipline

On February 2, 2025, Andrej Karpathy posted something that took maybe 20 seconds to read and has been argued about ever since. The idea was “vibe coding”: a mode of programming where you “fully give in to the vibes, embrace exponentials, and forget that the code even exists.” You describe what you want, the AI generates it, and you run it. If something’s broken, you describe the error back to the AI and iterate. You don’t read the code. You don’t care what it actually says.

Karpathy framed it playfully. He described it as something he did for weekend projects, for quickly prototyping ideas he’d otherwise never build. The phrase was deliberately irreverent.

By March 2025, Merriam-Webster had listed it as a slang and trending expression. Collins English Dictionary named it Word of the Year for 2025. The term had escaped.

And that’s where Simon Willison started to get concerned.

The Distinction Willison Drew

Willison published “Not all AI-assisted programming is vibe coding (but vibe coding rocks)” in March 2025. The timing was deliberate: he was responding to a conflation that had started to distort how people talked about the entire category of AI-assisted development.

His concern was specific: people were using “vibe coding” to describe all forms of programming done with AI assistance. That’s wrong, he argued, and it matters because vibe coding and reviewed AI-assisted work differ fundamentally in accountability, correctness, and whether they produce code you can maintain.

His definition was clean: vibe coding means building software with an LLM without reviewing the code it writes. That’s the specific thing. Accepting suggestions without understanding them. Moving fast by skipping the reading.

The contrast is what he called professional AI-assisted coding — where the developer uses LLMs to accelerate work but remains fully accountable for what gets committed. His stated golden rule: he won’t commit any code to a repository if he couldn’t explain exactly what it does to somebody else. If an LLM wrote every line, but you’ve reviewed it, tested it, and can explain it, that’s not vibe coding in his view. It’s just using a typing assistant.

That’s the line. And it’s a meaningful one.

Willison later proposed calling the professional end “vibe engineering” — a term he floated in October 2025 — though by early 2026 he was noting that “agentic engineering” was winning the naming war. The semantic battle is ongoing, but the underlying distinction has held up.

Why This Matters Beyond Semantics

I’ve been building software in various capacities for a long time — running engineering teams at HP, Nokia, Intuit, doing architecture work at EY, now building Zero. And I want to say something that might be unpopular: Karpathy’s original framing of vibe coding is genuinely useful and I don’t think it should be dismissed.

For a certain class of problem, vibe coding is the right tool. I’ve built small data processing scripts, quick API explorers, internal dashboards I needed in two hours, and one-off data migration tools this way. I don’t own that code the way I own production code. I run it, verify the output, use it once or twice, and move on. The code quality doesn’t matter because the code has no future. The only thing that matters is whether the resulting artifact does what I wanted.

This is not how most software engineering organizations work. Most engineering orgs are building things that will be in production for years, that other developers will read and modify, that need to be secure and performant and auditable. For that context, vibe coding isn’t a style choice — it’s a liability.

What I find interesting is that the split Willison identified has become a genuine workflow split inside engineering teams, not just a philosophical distinction. The same developer might vibe-code a throwaway exploration tool in the morning and then carefully review every AI suggestion for a production auth change in the afternoon. The question is whether they’re explicitly choosing the mode or just defaulting.

What Defaulting Gets You

The failure mode I’ve seen most often is not developers who consciously chose to vibe code production systems. It’s developers who started treating AI suggestions like trusted documentation without deciding to do so.

Here’s how it happens: you’re writing a feature, you ask Copilot or Claude for a suggestion, the suggestion looks correct, you’re in flow, you accept it and move on. You do that twenty times in a sprint. You technically review each piece with a quick scan, but you never run the mental check that asks “would I have written this, and if not, why is the AI writing it this way?”

That gap — between scanning and understanding — is where the Willison distinction really lives. Scanning is what you do at vibe-coding speed. Understanding is what professional AI-assisted coding requires.

I’ve started thinking of it as the “could I defend this in a postmortem?” test. If something goes wrong with this code at 2am, can I explain what it’s doing and why the failure happened? If the answer is no — if I’d need to re-read the AI-generated code to understand it — then I didn’t review it properly and I’m carrying a debt I don’t have a full account of.

The Agentic Complication

Both Willison and Karpathy have written about how agentic systems complicate this further. When you’re not just accepting suggestions but running multi-step agents that write code, run tests, modify files, and loop back, the surface area of “what did the AI actually do” expands dramatically.

Willison has written about what he calls the “lethal trifecta” for LLM agents: the combination of access to private data, exposure to untrusted content, and the ability to communicate externally. In a vibe-coding agent scenario, you can end up with a system that’s executing code you never read, calling APIs you didn’t explicitly authorize, and modifying state you didn’t ask it to modify, all in the service of “build me a feature.”
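One way to make the trifecta concrete is to tag each tool an agent can call with the legs it contributes, then check whether all three are present at once. This is a minimal sketch, not anything Willison has published: the `Tool` class, its flag names, and the example tool names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Set


@dataclass(frozen=True)
class Tool:
    """Hypothetical description of one tool an agent may call."""
    name: str
    reads_private_data: bool = False
    ingests_untrusted_content: bool = False
    communicates_externally: bool = False


def trifecta_legs(tools: Iterable[Tool]) -> Set[str]:
    """Return which of the three risk legs the combined toolset covers."""
    legs: Set[str] = set()
    for tool in tools:
        if tool.reads_private_data:
            legs.add("private_data")
        if tool.ingests_untrusted_content:
            legs.add("untrusted_content")
        if tool.communicates_externally:
            legs.add("external_communication")
    return legs


def has_lethal_trifecta(tools: Iterable[Tool]) -> bool:
    """True only when the toolset covers all three legs together."""
    return len(trifecta_legs(tools)) == 3


# Illustrative toolset; the names are made up for this example.
read_secrets = Tool("read_repo_secrets", reads_private_data=True)
read_issues = Tool("read_issue_comments", ingests_untrusted_content=True)
send_request = Tool("send_http_request", communicates_externally=True)

print(has_lethal_trifecta([read_secrets, read_issues, send_request]))  # True
print(has_lethal_trifecta([read_secrets, read_issues]))                # False
```

The point of the check is that no single tool has to look dangerous. The risk comes from the combination, so removing any one leg is enough to clear it.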

The agentic case is where the accountability question gets genuinely hard. You can establish a golden rule about reviewing code before committing it when the AI is giving you suggestions. What’s the equivalent rule when an agent is running a 47-step loop that modifies 23 files?

I don’t think anyone has fully answered this yet, including Willison. The best current practice is checkpointing — requiring human review at meaningful decision points in an agent run, not just at the final output. But the tooling for this is still immature.
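To show what checkpointing could look like, here is a rough sketch of an agent loop that pauses for human approval before a step touches a sensitive path, or once too many unreviewed files have piled up. Everything here is an assumption for illustration: the `Step` type, the `SENSITIVE_PREFIXES` list, the `max_files` threshold, and the `apply` and `approve` callbacks are hypothetical and not taken from any particular agent framework.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Set, Tuple


@dataclass(frozen=True)
class Step:
    """Hypothetical unit of agent work: a description plus the files it edits."""
    description: str
    files: Tuple[str, ...] = ()


# Assumed policy: paths under these prefixes always require a human look.
SENSITIVE_PREFIXES = ("auth/", "migrations/")


def needs_checkpoint(step: Step, unreviewed: Set[str], max_files: int) -> bool:
    """Pause if the step touches a sensitive path or the unreviewed set grows too large."""
    touches_sensitive = any(f.startswith(SENSITIVE_PREFIXES) for f in step.files)
    too_many_files = len(unreviewed | set(step.files)) > max_files
    return touches_sensitive or too_many_files


def run_with_checkpoints(
    steps: Sequence[Step],
    apply: Callable[[Step], None],
    approve: Callable[[Step, List[str]], bool],
    max_files: int = 5,
) -> List[Step]:
    """Apply steps in order, asking for approval at checkpoints.

    `approve` receives the upcoming step and the sorted list of files changed
    since the last review. The run stops at the first rejection.
    Returns the steps that were actually applied.
    """
    applied: List[Step] = []
    unreviewed: Set[str] = set()
    for step in steps:
        if needs_checkpoint(step, unreviewed, max_files):
            if not approve(step, sorted(unreviewed)):
                break
            unreviewed = set()  # the reviewer has now seen the earlier changes
        apply(step)
        applied.append(step)
        unreviewed.update(step.files)
    return applied
```

Whatever is still unreviewed when the loop ends needs a final review as well. The sketch only moves some of that review earlier, to the points where a wrong turn is cheapest to catch.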

The Useful Conclusion

The Karpathy framing and the Willison framing are not actually in conflict. Karpathy described something real and useful. Willison said “yes, but that thing is not the same as what we should be doing with production code, and conflating them creates problems.” Both points are correct.

What I’ve landed on in practice: the mode of coding should be a deliberate choice made based on the lifetime and risk profile of what you’re building. Not a default you fall into because the tool makes generation easy.
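As a rough illustration, that choice can be written down as an explicit checklist rather than left implicit. The sketch below is a deliberate simplification and not a substitute for judgment: the `Mode` enum, the `choose_mode` function, and its three questions are hypothetical, and a real team would pick its own criteria.

```python
from enum import Enum


class Mode(Enum):
    VIBE = "vibe coding"
    REVIEWED = "professional AI-assisted coding"


def choose_mode(*, long_lived: bool, others_will_maintain: bool,
                security_sensitive: bool) -> Mode:
    """Pick a working mode from the lifetime and risk profile of the code.

    Assumed rule: any single 'yes' is enough to require full review.
    """
    if long_lived or others_will_maintain or security_sensitive:
        return Mode.REVIEWED
    return Mode.VIBE


# A one-off script that will be deleted after use.
print(choose_mode(long_lived=False, others_will_maintain=False,
                  security_sensitive=False).value)  # vibe coding

# Anything touching credentials, even if it is small.
print(choose_mode(long_lived=False, others_will_maintain=False,
                  security_sensitive=True).value)   # professional AI-assisted coding
```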

Vibe coding for a weekend project that will run once and get deleted: go for it. Vibe coding for the function that handles password resets: absolutely not.

The word for the gap between those two things is judgment. AI doesn’t change the fact that engineering requires it. If anything, the acceleration that AI provides makes the judgment more consequential, not less. You can make bad decisions much faster now, and bad decisions at speed compound faster than bad decisions at human pace.

That’s what Willison was getting at. That’s the distinction worth holding onto.

