Because AI models learn from existing, community-wide code — public repositories, Q&A sites, countless snippets from developers of every skill level — they inevitably absorb whatever flaws, antipatterns, and outdated habits exist in that collective mass. And let’s be honest: most of the world’s code isn’t written by senior engineers at well-run companies. It’s written on side projects, experiments, school assignments, weekend ideas, and half-maintained libraries. Naturally, the distribution leans toward “good enough,” not exceptional.
If you imagine a graph with code quality on the x-axis and volume of training data on the y-axis, you get a lopsided bell curve: a big hump of mediocre code sitting a little below the midpoint, a modest left tail of genuinely bad code, and a much thinner right tail containing the truly great, high-quality code. A statistical model trained on that distribution gravitates toward the center — the common patterns — not the best ones. And this is crucial: LLMs don’t know why a pattern exists; they only know it’s frequent. Frequent is not the same as correct, elegant, or secure.
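To put rough numbers on that intuition (the percentages below are invented purely for illustration), a model that samples output in proportion to how often it saw each kind of code will, by construction, reproduce the mediocre middle far more often than the excellent tail:

```python
# Back-of-the-envelope sketch of the skewed quality distribution described
# above. The weights are made up; the point is only that frequency-weighted
# sampling reproduces the fat middle, not the thin tail of excellent code.
import random

random.seed(7)

quality_weights = {"bad": 0.20, "mediocre": 0.70, "excellent": 0.10}

draws = random.choices(
    population=list(quality_weights),
    weights=list(quality_weights.values()),
    k=10_000,
)

for bucket in quality_weights:
    share = draws.count(bucket) / len(draws)
    print(f"{bucket:>9}: {share:.1%} of sampled output")
```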
Ask an LLM to write a simple REST API handler and you’ll often get the same familiar patterns: outdated error handling, missing input validation, or overly permissive defaults. Not because the model is “wrong”, but because that’s the statistical average of what it has seen. And here’s the part people keep missing: if most code in the wild cuts corners, the model will too — confidently.
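To make that concrete, here is a hedged sketch of the kind of “statistically average” handler you might get back. It uses Flask purely for illustration; the endpoint and storage are invented, and the comments call out the familiar shortcuts.

```python
# Illustrative only: a "statistically average" Flask handler of the kind an
# LLM often produces. The endpoint and in-memory store are invented.
from flask import Flask, request, jsonify

app = Flask(__name__)
users = {}  # toy in-memory store standing in for a real database


@app.route("/users", methods=["POST"])
def create_user():
    data = request.get_json()  # no schema or type validation: any JSON shape is accepted
    try:
        user_id = len(users) + 1
        users[user_id] = data  # stored verbatim, unvalidated fields and all
        return jsonify({"id": user_id, **data}), 201
    except Exception:  # blanket catch: every failure becomes the same vague 500
        return jsonify({"error": "something went wrong"}), 500


@app.after_request
def add_cors_headers(response):
    response.headers["Access-Control-Allow-Origin"] = "*"  # overly permissive default
    return response


if __name__ == "__main__":
    app.run(debug=True)  # debug mode left on, another common default
```

None of these shortcuts is exotic; each one is common precisely because it appears constantly in public code — which is exactly the point.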
On top of that, the situation risks getting worse over time. As more AI-generated code finds its way into public repositories, the next generation of models starts training on the outputs of the previous one. Researchers call the result of this feedback loop “model collapse”: each round erodes quality and diversity a little more, reinforcing the very patterns we want to move past. Innovation usually comes from deliberately breaking patterns, but a statistical model has no incentive to deviate. Its entire objective is to conform.
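You can watch the mechanism in a toy simulation. The sketch below is not a model of any real training pipeline; it just fits a trivial “model” (a mean and a standard deviation) to a small sample of the previous generation’s output and generates the next dataset from it. Over repeated rounds the spread tends to shrink, which is the tail-erasing effect described above.

```python
# Toy "model collapse" loop: each generation is trained only on the previous
# generation's output. The "model" is just a fitted mean and stdev, and the
# deliberately small sample size makes the drift easy to see. Purely
# illustrative; no real training dynamics are modelled.
import random
import statistics

random.seed(1)

SAMPLES_PER_GENERATION = 25

# generation 0: the original, human-written "quality" distribution
data = [random.gauss(0.0, 1.0) for _ in range(SAMPLES_PER_GENERATION)]

for generation in range(61):
    mu = statistics.fmean(data)
    sigma = statistics.stdev(data)
    if generation % 10 == 0:
        print(f"generation {generation:2d}: mean={mu:+.2f}, stdev={sigma:.2f}")
    # the next generation trains purely on what the previous one produced
    data = [random.gauss(mu, sigma) for _ in range(SAMPLES_PER_GENERATION)]
```

Different seeds give different numbers, but the long-run tendency is the same: the variance that carries the rare, excellent outliers gets squeezed out, and the average is all that remains.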
In practice, this means AI-generated code often feels predictable, familiar, and safe — not because it’s optimal, but because it’s statistically average. Instead of pushing forward new ideas, new architectures, or new abstractions, LLMs risk amplifying the ecosystem’s existing weaknesses and cementing yesterday’s solutions as today’s defaults.