Clean Code, Compounding Debt: The Architectural Cost of AI-Generated Pull Requests

For most of the last two decades, technical debt described a known and largely tractable set of problems. Dependencies drift out of support. Libraries reach end of life and stop receiving patches. Known vulnerabilities collect in the software bill of materials as transitive dependencies pull in packages with open CVEs. Test coverage erodes. Dead code and duplicated logic spread.

These forms of debt share a useful property. They are local and machine-detectable. Software composition analysis flags an outdated package and the CVE attached to it. Static analysis finds the untested branch and the unreachable function. A version bump is mostly mechanical. This is the work AI coding assistants handle well, and the category where the productivity gains are real.

The debt that does not appear in a scanner is the one that is growing. Gartner projects that 80 percent of all technical debt will be architectural by 2027. Architectural debt does not sit in a file where a linter can reach it. It lives in the structure: cyclic dependencies between modules, excessive coupling across service boundaries, layering violations, and components whose responsibilities no longer match the design they were built to implement.

Code That Compiles and Does Not Fit

An AI agent works inside a narrow context window. It sees the files it is handed and little of the system around them. It does not hold the dependency graph of the wider codebase, the established service boundaries, or the conventions a team converged on over years. So it produces code that compiles, passes the existing tests, and reads cleanly in a diff, while cutting across the architecture in ways the diff does not show.

A familiar pattern illustrates the effect. Five teams each prompt an agent to add authentication to a service. Each agent, with no visibility into the shared identity library, generates its own wrapper around the same OAuth flow. Every implementation passes review on its own merits. The result is five token-validation paths where there should be one: five places to patch when the flow changes and five subtly different failure modes. No single pull request was defective. The coupling and duplication landed at the system level, where no file-scoped review was looking.
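One way to surface this kind of system-level duplication is to keep an inventory of which component implements which capability and compare it against the designated owner. The sketch below assumes such an inventory exists, for example from code search or a service catalog. The service names, the capability label, and the `find_duplicated_capabilities` helper are hypothetical, chosen to mirror the authentication example.

```python
from collections import defaultdict

# Hypothetical: the component that is supposed to own each capability.
SANCTIONED_OWNER = {"oauth_token_validation": "identity-lib"}

# Hypothetical inventory of (component, capability) pairs.
IMPLEMENTATIONS = [
    ("identity-lib", "oauth_token_validation"),
    ("billing", "oauth_token_validation"),
    ("orders", "oauth_token_validation"),
    ("search", "oauth_token_validation"),
    ("shipping", "oauth_token_validation"),
    ("reports", "oauth_token_validation"),
    ("billing", "invoice_rendering"),
]


def find_duplicated_capabilities(implementations, sanctioned_owner):
    """Return {capability: sorted components that reimplement it}."""
    implementers = defaultdict(set)
    for component, capability in implementations:
        implementers[capability].add(component)

    findings = {}
    for capability, components in implementers.items():
        owner = sanctioned_owner.get(capability)
        if owner is not None:
            # Anyone other than the declared owner is a duplicate.
            extras = components - {owner}
        elif len(components) > 1:
            # No declared owner: several implementers is still a finding.
            extras = components
        else:
            extras = set()
        if extras:
            findings[capability] = sorted(extras)
    return findings


DUPLICATES = find_duplicated_capabilities(IMPLEMENTATIONS, SANCTIONED_OWNER)
```

With the sample data, `DUPLICATES` lists the five services under `oauth_token_validation` and omits `invoice_rendering`, which has a single implementer. The hard part in practice is building the inventory, not the comparison.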

The Rate of Compounding, Not the Volume

Volume is the obvious worry with generated code. The rate of compounding is the more serious one. Sonar reports that architectural debt accumulates roughly 2.8 times faster than code-level debt, and that AI adoption raises the overall rate of technical debt accumulation by 30 to 41 percent.

Architectural debt compounds because it is structural. A cyclic dependency introduced this week becomes a constraint on every change that touches those modules next week. A leaked abstraction gets imported by the next service, and the one after that. An outdated dependency can be upgraded in place by a tool. A coupling problem can only be resolved by changing the structure, which means coordinated change across every component that now depends on it.
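A cyclic dependency is one of the few architectural problems that is straightforward to detect mechanically once the module graph is available. The following is a minimal sketch using depth-first search. It assumes the graph has already been extracted as a mapping from each module to the modules it imports; the module names and the `find_cycle` helper are hypothetical.

```python
def find_cycle(graph):
    """Return one dependency cycle as a list of modules, or None.

    graph maps each module to an iterable of the modules it imports.
    """
    UNVISITED, IN_PROGRESS, DONE = 0, 1, 2
    state = {}
    path = []  # modules on the current DFS path, in visit order

    def visit(module):
        state[module] = IN_PROGRESS
        path.append(module)
        for dep in graph.get(module, ()):
            dep_state = state.get(dep, UNVISITED)
            if dep_state == IN_PROGRESS:
                # dep is already on the current path: that closes a cycle.
                return path[path.index(dep):] + [dep]
            if dep_state == UNVISITED:
                found = visit(dep)
                if found:
                    return found
        path.pop()
        state[module] = DONE
        return None

    for module in graph:
        if state.get(module, UNVISITED) == UNVISITED:
            cycle = visit(module)
            if cycle:
                return cycle
    return None


# Hypothetical module graph with one cycle.
SAMPLE_GRAPH = {
    "orders": ["billing"],
    "billing": ["notifications"],
    "notifications": ["orders"],
    "catalog": [],
}
```

For `SAMPLE_GRAPH`, `find_cycle` returns `["orders", "billing", "notifications", "orders"]`. The recursion is fine for a sketch; a very deep graph would call for an iterative traversal.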

Gartner attributes much of this to automation bias. Under delivery pressure, developers accept AI output on surface signals: it compiles, the tests are green, the diff looks reasonable. A Sonar survey found that 96 percent of developers do not fully trust AI-generated code, while only 48 percent consistently verify it. Green tests confirm that the code does what the tests check. They say nothing about whether it belongs in the architecture.

The distinction governs where an engineering organization should spend effort. AI is effective at retiring the debt the industry has tracked for twenty years: unsupported libraries, open CVEs, thin coverage. It is equally effective at generating the debt that dependency scanners and static analysis do not catch. Optimizing for generation speed without matching investment in architectural control trades a category of debt that tooling already manages for one that it largely does not.

Tracking Decisions, Not Just Code

Tracking architectural decisions as they are made is a recognized countermeasure to this drift. Architecture Decision Records, or ADRs, capture each significant structural choice as a short, version-controlled markdown file committed alongside the code: the decision, the context that forced it, the options considered, and the consequences accepted. The practice, popularized by Michael Nygard, is deliberately lightweight, often a single page per decision.
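Because an ADR is a plain markdown file, its completeness can be checked like any other artifact in the repository. The sketch below assumes a team convention in which each record has one heading per element listed above. The heading names, the sample record, and the `missing_sections` helper are hypothetical, not a standard.

```python
import re

# Hypothetical team convention for required ADR headings.
REQUIRED_SECTIONS = ("Context", "Options Considered", "Decision", "Consequences")

# Matches markdown ATX headings such as "## Context".
HEADING = re.compile(r"^#{1,6}[ \t]+(.+?)[ \t]*$", re.MULTILINE)


def missing_sections(adr_text, required=REQUIRED_SECTIONS):
    """Return the required headings that do not appear in the ADR text."""
    present = {m.group(1).lower() for m in HEADING.finditer(adr_text)}
    return [name for name in required if name.lower() not in present]


# Hypothetical record, deliberately missing the options that were considered.
SAMPLE_ADR = """\
# 7. Use the shared identity library for token validation

## Context
Several services need to validate OAuth tokens.

## Decision
All services call the shared identity library.

## Consequences
There is one validation path to patch when the flow changes.
"""
```

Here `missing_sections(SAMPLE_ADR)` returns `["Options Considered"]`. A check like this only confirms that a record is complete in form; whether the reasoning is sound remains a review question.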

The records give both human reviewers and AI agents a reference for intent. When the rationale for a service boundary, a shared identity library, or a data-ownership rule lives in the repository, a review checklist or an agent prompt can point to it, and a change that contradicts a recorded decision can be flagged rather than merged as plausible new code.

Recorded decisions also become enforceable. Architecture fitness functions, automated checks that fail a build when a disallowed dependency, a cycle, or a layering violation appears, turn an ADR from documentation into a gate. Tools such as ArchUnit on the JVM and dependency-cruiser in JavaScript codebases run these checks in continuous integration. Records alone do not stop drift. They make it detectable, which is the prerequisite for stopping it.
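The idea behind a fitness function can be shown without any particular tool. What follows is a standalone sketch, not the ArchUnit or dependency-cruiser API. It assumes that import edges have already been extracted and that a module's layer is the first segment of its dotted name; the layer names and helper functions are hypothetical.

```python
# Hypothetical layer order, top first. A module may import from its own
# layer or from a layer below it, never from a layer above it.
LAYERS = ["web", "service", "repository"]


def layer_of(module):
    """Take the layer from the first name segment: 'web.orders' -> 'web'."""
    return module.split(".", 1)[0]


def layering_violations(imports, layers=LAYERS):
    """Return the (importer, imported) edges that point to a higher layer."""
    rank = {name: index for index, name in enumerate(layers)}
    violations = []
    for importer, imported in imports:
        src, dst = layer_of(importer), layer_of(imported)
        if src not in rank or dst not in rank:
            continue  # modules outside the declared layers are not judged
        if rank[dst] < rank[src]:
            violations.append((importer, imported))
    return violations


def check(imports):
    """Raise on any violation so that a CI step exits non-zero."""
    violations = layering_violations(imports)
    if violations:
        lines = [f"{a} must not import {b}" for a, b in violations]
        raise AssertionError("layering violations:\n" + "\n".join(lines))


# Hypothetical import edges; the last one reaches upward from the bottom layer.
SAMPLE_IMPORTS = [
    ("web.orders", "service.orders"),
    ("service.orders", "repository.orders"),
    ("repository.orders", "web.session"),
]
```

Calling `check(SAMPLE_IMPORTS)` raises because `repository.orders` imports `web.session`. The rule itself, the contents of `LAYERS`, is exactly the part a person has to decide and record.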

What Governance Does Not Solve

Architectural governance is not a procurement decision. Three limits are worth stating plainly.

  • Tooling needs a target. Dependency analysis and architecture observability surface drift, but they cannot infer an intended architecture for an organization that has never defined one. The fitness function enforces a rule that a person still has to write.
  • The bottleneck moves to reviewers. Catching architectural mismatch requires engineers who hold the system context an agent lacks. That capacity is scarce, and it does not scale by adding more agents.
  • The cost is deferred. Architectural debt is cheap to introduce and expensive to remove. Remediating a deep structural defect costs far more than fixing a localized coding error, and the bill tends to arrive when the organization most needs to move quickly.

Governing Architecture Before Code

AI changes the composition of technical debt without reducing the obligation to manage it. The portion that scanners catch is shrinking as a share of the problem. The portion that depends on structure and intent is growing. Several steps follow from that shift.

  • Define and document the intended architecture, including allowed dependencies and service boundaries, so drift can be measured against a stated baseline.
  • Record significant architectural decisions as ADRs committed with the code, so intent travels with the repository rather than residing in the memory of whoever built it.
  • Enforce structure with architecture fitness functions in continuous integration, failing builds on disallowed dependencies, cycles, and layering violations.
  • Treat architectural review as a gate distinct from code review, staffed by engineers with system-level context, and grant AI-generated pull requests no exemption from it.
  • Track architectural metrics over time, including coupling, dependency cycles, and capability duplication, rather than waiting for an incident to expose them.
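For the coupling metric in the last step, a common starting point is fan-in, fan-out, and Robert C. Martin's instability ratio, defined as fan-out divided by the sum of fan-in and fan-out. The sketch below assumes a module graph in the form of a mapping from each module to the modules it depends on; the module names and the `coupling_metrics` helper are hypothetical.

```python
def coupling_metrics(graph):
    """Return {module: {'fan_in', 'fan_out', 'instability'}} for a graph.

    graph maps each module to the modules it depends on.
    instability = fan_out / (fan_in + fan_out), or 0.0 for isolated modules.
    """
    modules = set(graph)
    for deps in graph.values():
        modules.update(deps)

    fan_in = {m: 0 for m in modules}
    fan_out = {m: 0 for m in modules}
    for module, deps in graph.items():
        unique_deps = set(deps) - {module}  # ignore duplicates and self-imports
        fan_out[module] = len(unique_deps)
        for dep in unique_deps:
            fan_in[dep] += 1

    metrics = {}
    for m in sorted(modules):
        total = fan_in[m] + fan_out[m]
        metrics[m] = {
            "fan_in": fan_in[m],
            "fan_out": fan_out[m],
            "instability": fan_out[m] / total if total else 0.0,
        }
    return metrics


# Hypothetical graph: two services that both depend on a shared library.
SAMPLE_GRAPH = {
    "orders": ["identity", "billing"],
    "billing": ["identity"],
    "identity": [],
}
```

For `SAMPLE_GRAPH`, `identity` has an instability of 0.0, `billing` 0.5, and `orders` 1.0. A single snapshot says little; recording these values per commit or per release is what turns them into a trend that can be reviewed.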

An engineering organization that evaluates AI-generated code only on whether it passes will accumulate the most expensive debt it can carry, one merge at a time.

About the author

Lucas Hendrich
CTO at Forte Group
