From Coding to Conducting: What Karpathy's Software 3.0 Actually Means for Engineers
Andrej Karpathy's Software 1.0 → 2.0 → 3.0 arc reframes what engineering actually is. Here's how the job changes when your team includes agents.
I want to start with a quick taxonomy, because Andrej Karpathy has given us one of the most useful mental models for understanding what’s actually happening to software — and I think it’s underused.
Karpathy introduced “Software 2.0” in a 2017 Medium essay. The idea: classical software (Software 1.0) is explicit instructions written by humans in languages like Python or C++. Every behavior is something a programmer had to think up and encode. Software 2.0 moves that to neural network weights — instead of writing rules, you specify a goal, define a loss function, and let optimization find the program. You don’t write spell-checker logic; you train a model on examples and the learned weights are the spell-checker. By the time he left Tesla in 2022, the autopilot perception stack had largely been rebuilt this way, replacing hand-coded computer vision rules with large neural networks trained on video.
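To make the 1.0 versus 2.0 contrast concrete, here is a toy sketch of my own (it is not from Karpathy's essay, and the function names are made up for illustration). The first function is a rule a human wrote down. The second recovers the same rule from examples by minimizing a loss, which is the Software 2.0 move at the smallest possible scale.

```python
# Software 1.0: a human thinks up the rule and writes it down.
def rule_written_by_hand(x: float) -> float:
    return 3.0 * x + 2.0


# Software 2.0: a human supplies examples and a loss; optimization finds the rule.
def fit_linear(examples, lr=0.1, steps=500):
    """Fit y = w*x + b by gradient descent on mean squared error."""
    w, b = 0.0, 0.0
    n = len(examples)
    for _ in range(steps):
        # Gradients of mean((w*x + b - y)^2) with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in examples) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in examples) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Training data is generated from the hand-written rule so the two can be
# compared; the learner itself only ever sees input/output pairs.
examples = [(x, rule_written_by_hand(x)) for x in (-1.0, -0.5, 0.0, 0.5, 1.0)]
w, b = fit_linear(examples)
print(round(w, 3), round(b, 3))
```

The learned w and b land very close to 3 and 2. Nobody typed those numbers into fit_linear; they came out of the data and the loss. Scale that idea up by many orders of magnitude and you get learned weights standing in for hand-coded rules.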
Then, in a June 2025 keynote at Y Combinator’s AI Startup School, he extended the framework: Software 3.0. The programming layer is now natural language. Prompts are the programs. LLMs are the runtime. The context window is the new source file.
This isn’t just a clever framing. It’s a useful way to think about what engineering now requires.
What “Vibe Coding” Really Signalled
On February 2, 2025, Karpathy posted a tweet that got viewed over 4.5 million times: “There’s a new kind of coding I call ‘vibe coding’, where you fully give in to the vibes, embrace exponentials, and forget that the code even exists.”
He was describing his own experiments using Cursor Composer with Anthropic’s Sonnet model, often with voice input via SuperWhisper, to build software almost entirely through conversation. No line-by-line syntax wrangling. Just describing what you want, watching it appear, clicking “accept.”
The term went viral, got mainstream press, and predictably got misread in two opposite directions. Half the commentary was “the end of programmers!” and half was “this is just autocomplete, calm down.” Both missed the point.
What Karpathy was actually signalling wasn’t that coding is dead. It was that there’s a new mode, one that trades precision for speed on certain kinds of problems, and that this mode is accessible in a way nothing before it was. You can now build working software by narrating what you want. That changes who can build things (a much larger pool of people) and what experienced engineers should be spending their time on.
I tried this myself in a project in Bangalore last year — building an internal tool for Zero that I’d been putting off because it felt like a three-day job. I described it in natural language to an agent, did a lot of prompting and correcting, and had something functional in a day. Was it production-grade? No. Did it need heavy review and hardening? Yes. But it got done, and I was acting more like an editor than an author for most of it.
The Shift in the Engineer’s Job
Here’s how I’d characterize what’s actually changing, based on watching teams work over the last 18 months:
Before: The bottleneck in software development was typing — translating a well-understood solution into working code. Skilled engineers were fast at this. Junior engineers were slow.
Now: Code generation is cheap. The bottleneck has shifted to two things: specification (being clear enough about what you want that an agent can do something useful) and judgment (knowing whether what was generated is correct, safe, maintainable, and worth keeping).
This is not a small shift. Specification and judgment are significantly harder skills to develop than typing speed. They require deep understanding of the domain, the codebase, the failure modes. They require knowing what “correct” means in context, not just what “runs without error” means.
Karpathy’s frame for this is that prompts are now programs — and writing good prompts for agents is a form of programming. You’re specifying behavior at a higher level of abstraction. When I describe a feature to an agent, I’m writing a spec. When I review what it produces, I’m reading code. When I decide what to keep and what to throw away, I’m doing architecture.
The engineers who are thriving on my teams right now are the ones who are good at this higher-level work. They can articulate what they want precisely enough to get useful output, they can read generated code quickly to evaluate it, and they don’t get emotionally attached to the output: they’re willing to throw it away and try again. These skills are not new (they’re basically senior engineering skills) but they’ve moved from “nice to have” to “the main thing.”
The Orchestration Problem
Here’s where I’ll push back on some of the enthusiasm around Software 3.0.
Karpathy describes LLMs as operating systems — platforms with APIs and abstractions that you build on top of. That’s an elegant analogy. But if LLMs are the OS, someone has to be the kernel programmer. Someone has to understand when the abstraction leaks, what the failure modes are, why the “plausible-sounding” output is wrong in a particular context.
In the vibe coding mode, you’re essentially trusting the model’s judgment about implementation. For a throwaway internal tool, that might be fine. For anything with security implications, performance requirements, or correctness guarantees that matter to a business — the model’s judgment is not enough. And the model doesn’t know what it doesn’t know.
I’ve seen this go badly in a couple of interesting ways. One team I advised had built a data pipeline almost entirely through agent-assisted coding. It worked. It passed tests. It went to production. Three months later, they discovered a subtle bug in how they were handling timezone conversion — the kind of bug that only shows up in certain edge cases involving daylight saving time transitions. The agent had generated code that was idiomatic and looked correct, and the developers had trusted it without deeply understanding the logic. That bug cost them a weekend of incident response.
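I can't share that team's code, so what follows is a hypothetical illustration of the same class of bug, not their actual defect. It assumes Python 3.9+ with the standard zoneinfo module and available time zone data; the function names are mine. The first function is the kind of thing that reads as obviously correct in review and is wrong exactly one or two days a year.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo

NY = ZoneInfo("America/New_York")


def elapsed_looks_right(start: datetime, end: datetime) -> timedelta:
    # When both datetimes share the same tzinfo object, Python subtracts
    # wall-clock values and makes no UTC-offset adjustment.
    return end - start


def elapsed_actual(start: datetime, end: datetime) -> timedelta:
    # Converting to UTC first gives the time that really passed.
    return end.astimezone(timezone.utc) - start.astimezone(timezone.utc)


# US clocks sprang forward on 2024-03-10, so this local "day" had 23 hours.
start = datetime(2024, 3, 9, 12, 0, tzinfo=NY)
end = datetime(2024, 3, 10, 12, 0, tzinfo=NY)

print(elapsed_looks_right(start, end))  # 1 day, 0:00:00
print(elapsed_actual(start, end))       # 23:00:00
```

On every other day of the year the two functions agree, which is why a test suite built from ordinary dates passes. Catching this takes a reviewer who knows to ask what happens across a transition.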
The orchestration problem is real: as we use agents to generate more of our systems, we need more engineering discipline around specification, review, and testing — not less. The irony is that Software 3.0 requires a more rigorous engineering practice in some dimensions, even as it makes the act of coding easier.
The Role That’s Emerging
There’s a new kind of work pattern I see taking shape in the best engineering teams I know. I’d describe it as conducting rather than playing.
A conductor doesn’t play every instrument. They understand what each instrument should sound like, they have a detailed model of what the piece is supposed to sound like overall, and they spend their energy on coordination, timing, and quality — not on physical execution of the notes. When something sounds wrong, they can identify it, diagnose it, and direct a correction.
The best AI-era engineers are working this way. They’re doing less typing and more listening — listening to what the agent produces, comparing it against their mental model of what it should produce, directing corrections. They’re thinking about the architecture of a system while the agent fills in the implementation details. They’re specifying tests before the code is written, so they have a way to evaluate what comes back.
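Here is a minimal sketch of what "tests before code" can look like in practice. The slug function, the cases, and both candidate implementations are invented for illustration; the point is only the order of operations. The cases are written first and act as the spec, then whatever comes back from the agent is scored against them.

```python
import re
from typing import Callable, List, Tuple

# Written BEFORE any implementation exists: the spec as executable cases.
SPEC_CASES: List[Tuple[str, str]] = [
    ("Hello World", "hello-world"),
    ("  padded  ", "padded"),
    ("already-slugged", "already-slugged"),
    ("Multiple   spaces", "multiple-spaces"),
    ("", ""),
]


def failures(candidate: Callable[[str], str]) -> List[Tuple[str, str, str]]:
    """Return (input, expected, got) for every spec case the candidate misses."""
    missed = []
    for raw, expected in SPEC_CASES:
        got = candidate(raw)
        if got != expected:
            missed.append((raw, expected, got))
    return missed


# Hypothetical first draft from an agent: plausible, and wrong on edge cases.
def slugify_draft(text: str) -> str:
    return text.lower().replace(" ", "-")


# Hypothetical revision after the failures were fed back as a correction.
def slugify_revised(text: str) -> str:
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


for raw, expected, got in failures(slugify_draft):
    print(f"draft missed {raw!r}: expected {expected!r}, got {got!r}")
print("revised failures:", failures(slugify_revised))
```

The draft misses the padded and multiple-space cases; the revision passes all five. The list of failures doubles as the next prompt, so the conductor's correction is specific rather than "try again."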
Technically, this requires knowing more, not less. You need to understand what correct output looks like in order to evaluate what the agent generates. You need a model of the system in your head — data flows, failure modes, security properties — that’s detailed enough to catch what the agent misses. The engineers who think AI makes deep technical knowledge less important have it exactly backwards. Deep knowledge is now what you use to supervise the agent, rather than to write the code yourself.
I’d add one more thing: the communication gap between “what I want” and “what I say” matters more than ever. If you can’t precisely articulate what a system should do — edge cases, error handling, performance constraints — you’ll get output that satisfies your literal words and misses your actual intent. Specification is a writing skill as much as a technical one. The best engineers I know have always been good writers. That turns out to matter a lot now.
What This Means for Hiring and Development
A few things I’ve adjusted:
In hiring, I care less about whether someone can write code from scratch quickly and more about whether they can read and evaluate code written by someone (or something) else. Code reading and critical evaluation are different skills from code writing, and they don’t always travel together. I’ve started using review exercises in interviews rather than only coding exercises.
For junior engineers, I’m more deliberate about protecting time for deep code understanding that isn’t mediated by AI. If the only code you ever read is AI output, and the only code you ever write is prompts to generate more AI output, you don’t build the mental model of how systems actually work that lets you evaluate the AI’s work competently. There’s a scaffolding problem here that I think the industry hasn’t solved.
For senior engineers, the growth area is increasingly in what I’d call “agentic system design” — knowing how to decompose a problem so that agents can work on parts of it effectively, how to structure context windows and tool access, how to design the human review points. These are architecture skills applied to a new kind of system.
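One way to make those design decisions reviewable is to write the decomposition down as data. The sketch below is an assumption-heavy illustration, not a reference to any agent framework: the AgentTask fields, the tool names, and the task names are all hypothetical. It records, per task, what goes into the context window, which tools are granted, and whether a human gate applies, then flags tasks that hold a sensitive tool with no gate.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class AgentTask:
    name: str
    context: List[str]          # files or docs placed in the context window
    allowed_tools: Set[str]     # tool access granted for this task only
    needs_human_review: bool = False


def review_points(plan: List[AgentTask]) -> List[str]:
    """Names of tasks where a human must sign off before work continues."""
    return [t.name for t in plan if t.needs_human_review]


def ungated_sensitive_tasks(plan: List[AgentTask], sensitive: Set[str]) -> List[str]:
    """Tasks that can use a sensitive tool without any human review gate."""
    return [
        t.name
        for t in plan
        if t.allowed_tools & sensitive and not t.needs_human_review
    ]


# Hypothetical plan for a schema change, split so each task needs little context.
plan = [
    AgentTask("draft-migration", ["schema.sql"], {"read_file"}, needs_human_review=True),
    AgentTask("apply-migration", ["migration.sql"], {"run_sql"}),
    AgentTask("write-tests", ["schema.sql"], {"read_file", "write_file"}),
]

print(review_points(plan))
print(ungated_sensitive_tasks(plan, sensitive={"run_sql", "deploy"}))
```

Here the check flags apply-migration, which can run SQL with nobody in the loop. The value is less in the code than in the habit: tool access and review gates become something you can read, diff, and argue about in a design review.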
The Software 3.0 era is real. The engineers who navigate it well are the ones who treat the abstraction level change as an invitation to go deeper into judgment and specification — not as a permission slip to stop understanding what their systems actually do.