The Bill Comes Later: What AI Is Really Doing to Your Codebase

The Demo Always Looks Good

I’ve been in enough boardrooms to know what happens when someone demos AI-assisted coding to executives. The cursor moves, code appears, tests pass, everyone claps. It’s genuinely impressive. And then, about six weeks later, someone calls me because three services are tangled together in ways nobody can explain and the security team has found an injection vulnerability that looks like it was copied verbatim from a 2019 Stack Overflow answer.

That’s not a coincidence. That’s a pattern. And now there’s data to back it up.

What GitClear Actually Found

Bill Harding and the team at GitClear have been tracking code quality metrics across anonymized repositories for years. Their 2025 report — the second annual — analyzed 211 million changed lines of code spanning 2020 to 2024, pulling from both private repositories and 25 of the largest open-source projects. The scale matters. This isn’t a survey or a controlled lab study. It’s observational data from real production codebases.

The headline number: in 2024, the frequency of duplicated code blocks increased eightfold compared to earlier baselines. Eight times. That’s not noise.

But the duplication finding is just the most dramatic. There are two other metrics that I think are actually more important for understanding the compounding effect.

First, code churn (defined as the percentage of newly written code that gets revised or deleted within two weeks of being committed) rose from 3.1% in 2020 to 5.7% in 2024. That might sound modest, but it is nearly double the 2020 rate. What it means in practice is that a meaningful slice of new code doesn’t make it past its first two weeks. Someone writes it, the AI confidently scaffolds around it, and then a reviewer or a failing test forces a rewrite. The churn metric is a reasonable proxy for how much of what gets generated is actually understood at commit time.
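To make the definition concrete, here is a minimal Python sketch of a two-week churn rate. It assumes you can already recover, for each newly authored line, when it was written and when (if ever) it was next edited or deleted. The `AuthoredLine` record and `churn_rate` function are hypothetical names, and this is not GitClear’s own implementation.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable, Optional


@dataclass
class AuthoredLine:
    """Hypothetical record for one newly written line of code."""
    authored_at: datetime
    changed_at: Optional[datetime] = None  # next edit or deletion; None if untouched


def churn_rate(lines: Iterable[AuthoredLine],
               window: timedelta = timedelta(days=14)) -> float:
    """Percentage of new lines revised or deleted within `window` of authoring."""
    lines = list(lines)
    if not lines:
        return 0.0
    churned = sum(
        1
        for line in lines
        if line.changed_at is not None
        and line.changed_at - line.authored_at <= window
    )
    return 100.0 * churned / len(lines)


t0 = datetime(2024, 3, 1)
sample = [
    AuthoredLine(t0, t0 + timedelta(days=3)),   # rewritten inside the window: churn
    AuthoredLine(t0, t0 + timedelta(days=30)),  # edited much later: not churn
    AuthoredLine(t0),                           # never touched
    AuthoredLine(t0),
]
print(churn_rate(sample))  # 25.0
```

Computed per month or per quarter, the trend is what matters; a single reading says little on its own.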

Second, the percentage of code classified as “moved” or “refactored” — the signal that developers are actively improving and consolidating existing code — dropped from around 25% of changed lines in 2021 to under 10% in 2024. That is a collapse in refactoring activity. When you’re copy-pasting AI output instead of building on what’s already there, you stop cleaning up the old stuff. Entropy wins.
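GitClear’s exact classification rules aren’t described here, so the following is only a naive Python stand-in showing what separating “moved” from “new” lines involves: within a single change, an added line counts as moved if an identical line (ignoring whitespace) was removed in that same change. The function names and the matching rule are assumptions; real tools also track moves across files and tolerate small edits.

```python
from collections import Counter
from typing import List, Tuple


def normalize(line: str) -> str:
    """Collapse whitespace so re-indented lines still match."""
    return " ".join(line.split())


def classify_added_lines(removed: List[str], added: List[str]) -> Tuple[int, int]:
    """Return (moved, new) counts for the added lines of one change.

    Naive rule (an assumption): an added line is "moved" if an identical
    normalized line was removed in the same change; each removed line can
    match at most one added line. Blank lines are ignored.
    """
    pool = Counter(normalize(l) for l in removed if l.strip())
    moved = new = 0
    for line in added:
        key = normalize(line)
        if not key:
            continue
        if pool[key] > 0:
            pool[key] -= 1
            moved += 1
        else:
            new += 1
    return moved, new


removed = ["def scale(x):", "    return x * 2", ""]
added = ["def scale(x):", "        return x * 2", "def other():", "    pass"]
moved, new = classify_added_lines(removed, added)
print(f"moved={moved} new={new} moved_share={moved / (moved + new):.0%}")
# moved=2 new=2 moved_share=50%
```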

Harding’s framing is precise: AI tools suggest valid code, but they can’t reuse and modify existing code the way a developer who actually knows the codebase can. The result is sprawl. Every sprint adds more surface area.
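For teams that want a rough, in-house view of the duplication trend, a sliding-window comparison is a common starting point. The Python sketch below flags every run of five consecutive non-blank lines (after whitespace normalization) that appears in more than one place. The five-line window, the function names, and the in-memory `files` mapping are assumptions for illustration; this is not GitClear’s method, and it will also flag trivially repeated lines such as runs of closing braces.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Block = Tuple[str, ...]
Location = Tuple[str, int]  # (file name, 1-based starting line)


def normalize(line: str) -> str:
    return " ".join(line.split())


def find_duplicate_blocks(files: Dict[str, str],
                          window: int = 5) -> Dict[Block, List[Location]]:
    """Map each window of `window` normalized, non-blank lines that occurs
    more than once to every location where it starts."""
    seen: Dict[Block, List[Location]] = defaultdict(list)
    for name, text in files.items():
        numbered = [(i, normalize(raw))
                    for i, raw in enumerate(text.splitlines(), start=1)]
        lines = [(n, l) for n, l in numbered if l]  # drop blanks, keep numbering
        for start in range(len(lines) - window + 1):
            block = tuple(l for _, l in lines[start:start + window])
            seen[block].append((name, lines[start][0]))
    return {block: locs for block, locs in seen.items() if len(locs) > 1}


body = "def total(xs):\n    s = 0\n    for x in xs:\n        s += x\n    return s\n"
files = {"billing.py": body, "reports.py": "import math\n\n" + body}
for block, locs in find_duplicate_blocks(files).items():
    print(locs)  # [('billing.py', 1), ('reports.py', 3)]
```

Because windows overlap, a ten-line copy shows up as six matching blocks, so count distinct locations rather than raw block hits if you want a stable trend line.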

The Stanford Security Angle

The other anchor I keep coming back to is a study out of Stanford — Neil Perry, Megha Srivastava, Deepak Kumar, and Dan Boneh — published at ACM CCS 2023, titled “Do Users Write More Insecure Code with AI Assistants?” Their experiment was straightforward: give developers security-related tasks, give some of them access to an AI coding assistant, compare the results.

The outcome was unambiguous. Participants with AI access produced significantly more security vulnerabilities than those without. Particularly bad results showed up in string encryption and SQL injection scenarios. And here’s what should concern anyone running an engineering team: participants with AI access were also more likely to believe their code was secure. The assistant didn’t just introduce bugs. It increased overconfidence in buggy code.
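The SQL injection case is worth seeing concretely, because the vulnerable version reads perfectly well. The sketch below uses Python’s built-in `sqlite3` module with a made-up `users` table; it is a generic illustration of the vulnerability class, not code from the study.

```python
import sqlite3


def find_user_unsafe(conn: sqlite3.Connection, username: str) -> list:
    # VULNERABLE: user input is spliced into the SQL text, so a value like
    # "x' OR '1'='1" changes the logic of the query itself.
    query = f"SELECT id, name FROM users WHERE name = '{username}'"
    return conn.execute(query).fetchall()


def find_user_safe(conn: sqlite3.Connection, username: str) -> list:
    # Parameterized: the driver binds `username` as a value, never as SQL.
    return conn.execute(
        "SELECT id, name FROM users WHERE name = ?", (username,)
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users (name) VALUES (?)", [("alice",), ("bob",)])

payload = "x' OR '1'='1"
print(find_user_unsafe(conn, payload))  # every row comes back
print(find_user_safe(conn, payload))    # []
```

Both functions look equally tidy in review, which is the point: the difference lies entirely in whether untrusted input can become part of the SQL text.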

That combination — more vulnerabilities, higher confidence — is what makes this dangerous at scale. Developers aren’t flagging the AI-generated code for extra review. They’re shipping it faster.

The study used an AI assistant built on OpenAI’s codex-davinci-002, which was the frontier model at the time of the experiment. Current models are better, but the study’s insight isn’t model-specific. The mechanism is behavioral: when a tool generates the code for you, the code looks syntactically correct, and the tests pass, you do less critical reading. That’s human nature, not a model version problem.

Where I’ve Seen This Personally

I’ve scaled engineering teams across pretty different contexts — enterprise IT at HP and Nokia, product engineering at Intuit, consulting engagements at EY, and now building Zero from scratch. The pattern I see varies a lot by seniority and codebase maturity.

Junior developers using AI assistants are getting code out the door faster, which is real. What I’ve noticed, though, is that the questions they no longer ask me are often the important ones. Before, a junior engineer would ask why we use a particular pattern or why a function is structured a specific way. Now the AI gives them an answer, and they move on. That missing conversation is where institutional knowledge transfer used to happen.

Senior engineers have a different failure mode. They use AI to go faster on parts they already understand, which is mostly fine. But I’ve watched experienced engineers accept AI suggestions in unfamiliar domains — auth flows, cryptographic handling, IPC patterns — and not slow down because the code reads well. Readable ≠ correct. The Stanford study is essentially measuring this gap.

The real liability, though, is the 30-to-90-day tail. Code churn shows up within two weeks (that’s the window GitClear measures). Security vulnerabilities often don’t surface until you hit real traffic patterns, edge cases, or a pentest. The technical debt from duplicated code takes months to become a maintenance nightmare, where the team can’t confidently change anything because the logic is scattered across four nearly identical functions.

By the time that bill arrives, the sprint that generated the code is ancient history.

This Isn’t Anti-AI

I want to be clear about this: I use AI coding tools. My team uses them. The productivity gains on well-understood, clearly-scoped tasks are real and I’m not pretending otherwise. Writing boilerplate, generating test stubs for known patterns, exploring unfamiliar APIs — AI is genuinely useful here.

The problem is the defaults. The default is to ship what gets generated. The default is not to run a separate security review on AI-assisted code. The default is not to search the existing codebase for duplication before accepting a suggestion. Nobody sets these defaults maliciously. They just happen because the tools are optimized for speed of generation, not quality of integration.

What I’ve started requiring in teams I advise: treat AI-generated code exactly like third-party library code. You review it. You don’t assume it’s correct just because it came from a capable source. You look for integration points that break your existing abstractions. You pay special attention to anything touching authentication, input handling, or external data.
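One way to make that policy hard to skip is a pre-merge check that routes risky changes to a security reviewer automatically. The Python sketch below uses the three categories named above; the keyword patterns, the `changes` mapping of file path to added lines, and the function name are all hypothetical. Keyword matching is deliberately crude: it will miss risky code and flag some harmless code, so treat it as a routing aid, not a security control.

```python
import re
from typing import Dict, List

# Hypothetical keyword patterns for the three categories; tune them to your
# own codebase, frameworks, and naming conventions.
SENSITIVE_PATTERNS = {
    "authentication": re.compile(r"auth|login|password|token|session", re.IGNORECASE),
    "input handling": re.compile(r"request\.|\bparams\b|json\.loads|pickle\.loads"),
    "external data": re.compile(r"\bexecute\(|urlopen|subprocess"),
}


def flag_for_security_review(changes: Dict[str, str]) -> Dict[str, List[str]]:
    """Map each changed file to the sensitive categories its path or added
    lines match; files that match nothing are left out."""
    flagged: Dict[str, List[str]] = {}
    for path, added_text in changes.items():
        hits = [
            category
            for category, pattern in SENSITIVE_PATTERNS.items()
            if pattern.search(path) or pattern.search(added_text)
        ]
        if hits:
            flagged[path] = hits
    return flagged


changes = {
    "services/auth/login.py": "def check(password): ...",
    "utils/strings.py": "def pad(s, n):\n    return s.ljust(n)\n",
    "api/search.py": 'cur.execute("SELECT * FROM t WHERE q = ?", (params["q"],))',
}
for path, categories in flag_for_security_review(changes).items():
    print(path, "->", ", ".join(categories))
```

Note that `api/search.py` is flagged even though its query is parameterized. That is intended: the check decides who looks at the change, not whether the change is safe.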

That friction is annoying. It partially offsets the speed gain. I think it’s worth it, because the alternative — faster velocity into a codebase that’s increasingly duplicated, increasingly untested in aggregate, increasingly vulnerable — isn’t actually faster over a 12-month horizon.

The Structural Question

Here’s what the GitClear data is pointing at that nobody wants to say directly: we’re in a phase where AI is making individual developer productivity numbers go up and codebase health metrics go down, simultaneously. Both things are true.

The industry is measuring the former. Quarterly OKRs count features shipped and PRs merged. Almost nobody is tracking the ratio of net-new logic to duplicated blocks, trending their code churn rate, or monitoring their test-to-code ratios. GitClear tracks these signals because surfacing them is what its niche product is built for. Most engineering orgs don’t have equivalent visibility.
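Some of that visibility is cheap to get. As one example, the Python sketch below computes the test-to-code ratio mentioned above by counting non-blank, non-comment lines in test files versus everything else. The pytest-style naming convention used to spot test files, the function names, and the restriction to Python sources are assumptions; the useful signal is how the ratio moves from quarter to quarter, not its absolute value.

```python
from pathlib import Path, PurePath
from typing import Iterable, Iterator, Tuple


def is_test_file(rel_path: PurePath) -> bool:
    # Assumed convention: test_*.py, *_test.py, or anything under a tests/ directory.
    return (
        rel_path.name.startswith("test_")
        or rel_path.stem.endswith("_test")
        or "tests" in rel_path.parts[:-1]
    )


def count_code_lines(text: str) -> int:
    # Non-blank lines that aren't whole-line '#' comments.
    return sum(1 for line in text.splitlines()
               if line.strip() and not line.lstrip().startswith("#"))


def measure_test_code_ratio(files: Iterable[Tuple[PurePath, str]]) -> float:
    """Lines of test code per line of non-test code (0.0 if there is no code)."""
    test_lines = code_lines = 0
    for rel_path, text in files:
        n = count_code_lines(text)
        if is_test_file(rel_path):
            test_lines += n
        else:
            code_lines += n
    return test_lines / code_lines if code_lines else 0.0


def read_python_files(root: Path) -> Iterator[Tuple[PurePath, str]]:
    # Paths are made relative so a 'tests' directory above `root` doesn't count.
    for path in root.rglob("*.py"):
        yield path.relative_to(root), path.read_text(encoding="utf-8", errors="ignore")


sample = [
    (PurePath("app/billing.py"), "import math\n\n# helpers\ndef f():\n    return 1\n"),
    (PurePath("tests/test_billing.py"), "def check():\n    assert True\n"),
]
print(measure_test_code_ratio(sample))
```

On a real repository you might call `measure_test_code_ratio(read_python_files(Path("path/to/repo")))` at each release tag (the path is a placeholder) and chart the result alongside churn and duplication.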

I think the companies that figure out how to instrument code quality in the AI era — not just velocity — are going to have a durable advantage in 18 to 24 months. The ones that don’t are going to discover around month 18 that they need a rewrite, and the rewrite will take six months, and they’ll do it again with AI and recreate the same problems.

The bill always comes. The question is whether you’re surprised when it does.

Anshad Ameenza
