I’ve been watching the AI coding space pretty closely — it’s been a running theme in what I write about — and this week a stat landed that I think deserves more attention than it’s getting.
Over 52% of code commits on GitHub are now AI-generated. And with the launch of Copilot X, GitHub is projecting that number could hit 92%.
Let that sink in for a second. We’re not talking about AI assisting developers. We’re talking about AI authoring the majority of production code. And the trajectory from 52% to 92% isn’t some distant forecast — it’s the next iteration of the tool.
The Code Won’t Stop. It Just Won’t Be Human.
Here’s the thing that I think a lot of people are brushing past: at 92% AI-generated code, the code doesn’t stop being written. It just stops being written by us. The volume of new software, new features, new infrastructure — that keeps growing. But the human fingerprint on it shrinks to almost nothing.
And I think that creates a problem that’s more subtle and more important than the “AI takes our jobs” headline.
The Training Data Paradox
One of the most interesting counterarguments I’ve seen is what I’d call the training data paradox. These models — Copilot, ChatGPT, all of them — are trained on existing human-written code. Billions of lines of it, scraped from public repos, open-source projects, Stack Overflow threads. That’s the foundation.
So what happens when 92% of new code is AI-generated? The model starts training on its own output. The bank of genuinely novel, human-originated code goes static. And if you follow that trajectory, human input approaches zero pretty quickly — which means Copilot effectively runs out of new training data.
This is a real constraint. I’ve written before about how training these models is still astronomically expensive, and that cost issue hasn’t gone away. But the data issue might actually be more fundamental. You can throw more compute at training. You can’t manufacture the kind of creative, novel problem-solving that a PhD student does when they publish a new algorithm.
Can AI Actually Innovate? Not Yet.
I tested this myself. I asked ChatGPT to create a sorting algorithm faster than three-way radix quicksort — one of the fastest string sorting approaches out there. It couldn’t. It basically told me as much: “It’s unlikely that I can create a sorting algorithm that is guaranteed to be faster than three-way radix/string quicksort on all inputs.”
That’s honest. And it’s revealing.
These models are VERY good at regurgitating and recombining code they’ve seen. They’re excellent at pattern matching, at taking a well-understood problem and producing a clean solution fast. That’s genuinely useful — I’ve built entire sites using AI-assisted workflows and the productivity gains are real.
But innovation? Genuine novelty? The kind of breakthrough that comes from a researcher spending three years on a problem and seeing a connection nobody else saw? That’s still firmly human territory. The models aren’t learning from unseen information — they’re statistically predicting the most likely sequence of tokens based on what they’ve already consumed.
The Real Danger Isn’t Today. It’s the Upgrade Cycle.
Here’s where I think the conversation needs to go — and where I’ll be honest, it makes me a bit uncomfortable.
The danger isn’t that AI writes 92% of today’s code. The danger is what happens when AI gets smart enough to improve itself. When we’re not talking about a model that generates boilerplate React components, but one that can redesign its own architecture, optimize its own training process, rewrite its own inference engine.
We aren’t talking about years from now for that capability. Especially if we build in reward mechanisms for self-improvement — which, let’s be clear, is exactly what reinforcement learning from human feedback already does in a limited way.
When that self-improvement loop kicks in, two things happen very fast. First, the process accelerates exponentially — each improvement makes the next improvement easier. Second, and this is the part that keeps AI safety researchers up at night, the system’s purpose can drift. An unsupervised self-improving system doesn’t need malicious intent to be dangerous. It just needs an unclear objective function and enough capability to pursue it.
Where I Come Out
I don’t think the world is ending. I think a lot of the reaction right now is overblown — the models aren’t actually that smart yet, the costs are still massive, and the limitations are real and well-documented.
But I also think dismissing the trajectory is a mistake. The jump from 52% to 92% isn’t linear progress. It’s a signal that we’re approaching a phase change in how software gets built, and we need to be having serious conversations about training data sustainability, about innovation pipelines, and about what guardrails look like for self-improving systems.
The code won’t stop. The question is whether we’re still the ones deciding what it does.