Two Context Windows: Managing AI and Human Attention in Agentic Delivery

Most engineering conversations about context focus on the model. How large is the context window? How many tokens does a task consume? How do you keep the agent from drifting halfway through a long session? These are legitimate concerns, and teams have developed real discipline around them.

There is a second context window that receives far less attention, and it is the more expensive one. It belongs to the human who has to read what the agent produced.

When an agent opens a two-thousand-line pull request that touches a dozen files, the reviewer has nowhere to begin. They skim, they trust the tests, they approve. The remedy is to cap the size before the agent writes a single line and to demand a split plan first. The advice is sound. It is also one instance of a larger principle that engineering leaders should internalize now.

Both context windows are finite budgets. The model has a token limit. The reviewer has an attention limit. Agentic delivery succeeds when teams manage both deliberately, and it degrades when teams optimize one while ignoring the other.


The Model Context Window Is a Budget You Already Manage

Teams have learned to be economical with what they feed the agent, and the techniques are worth naming because they map directly onto the human side.

Scope the task with a specification. A short spec defines the boundary of the work before the agent begins. It tells the model what belongs in scope and what does not, which prevents the kind of sprawling output that no one can review.

Practice progressive disclosure. Load the focused task rather than the entire codebase. An agent that reads only the files it needs produces tighter, more relevant changes and consumes fewer tokens doing it.

Persist the rules in a harness. Constraints that live only in a chat prompt do not survive a fresh context. A project skill file or an agent configuration file keeps the rules, including the size cap, in force across sessions. This is the difference between a guideline you repeat and a constraint the system enforces.

Reset context before it rots. A long-running session accumulates noise. Starting clean, with the spec and the relevant files reintroduced, produces better output than dragging a saturated context forward.

These practices share a single logic. The model performs better when it operates against a bounded, well-defined slice of the problem rather than the whole thing at once. The same logic governs the reviewer.


The Human Context Window Is the Budget You Are Ignoring

Reviewer attention does not scale the way model throughput does. The agent can generate code all day. A senior engineer cannot review all day. When the volume of generated code rises and the review capacity stays fixed, something has to give, and what gives is review quality.

The data is not subtle. A 2025 CodeRabbit analysis of 470 open-source pull requests found that AI co-authored changes carried roughly 1.7 times as many issues per change as human-only changes, with critical issues up around 40 percent and major issues up around 70 percent. Salesforce reported that AI-assisted coding pushed its average pull request past one thousand lines and twenty files, that review latency climbed, and that review time on the largest changes began to plateau. When diffs keep growing and review time stops growing with them, that is not efficiency. It is a sign that reviewers can no longer hold their focus across the change. Several countermeasures help.

Keep pull requests small and self-contained. A SmartBear study of 2,500 pull requests linked smaller changes to fewer defects, and Swarmia found that teams holding pull requests near fifty lines shipped meaningfully more code than teams that routinely exceeded two hundred. Small is not slower. Small is faster, because a reviewer can finish reading a small diff without losing the thread.
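A size cap works better as a constraint the system enforces than as a reminder. One way to enforce it is a small check in continuous integration. The sketch below is a minimal, hypothetical example. It assumes the diff summary comes from `git diff --numstat`, which prints one `added<TAB>deleted<TAB>path` line per file and uses `-` for binary files. It also uses an illustrative cap of 200 changed lines. The names `parse_numstat`, `check_size`, `FileStat`, and `MAX_CHANGED_LINES` are made up for this example.

```python
# Hypothetical CI gate: flag a pull request whose diff exceeds an assumed size cap.
from dataclasses import dataclass

MAX_CHANGED_LINES = 200  # illustrative cap; tune it to the team's review capacity


@dataclass
class FileStat:
    path: str
    added: int
    deleted: int


def parse_numstat(report: str) -> list[FileStat]:
    """Parse `git diff --numstat` output: added<TAB>deleted<TAB>path per line."""
    stats = []
    for line in report.splitlines():
        if not line.strip():
            continue  # skip blank lines
        added, deleted, path = line.split("\t", 2)
        # Binary files report "-" for both counts; treat them as zero changed lines.
        stats.append(FileStat(
            path=path,
            added=0 if added == "-" else int(added),
            deleted=0 if deleted == "-" else int(deleted),
        ))
    return stats


def check_size(report: str, cap: int = MAX_CHANGED_LINES) -> tuple[bool, int]:
    """Return (within_cap, total_changed_lines) for a numstat report."""
    total = sum(s.added + s.deleted for s in parse_numstat(report))
    return total <= cap, total


report = "120\t30\tsrc/orders.py\n40\t0\ttests/test_orders.py\n-\t-\tdocs/flow.png\n"
print(check_size(report))  # (True, 190)
```

In a pipeline, the report could come from something like `git diff --numstat origin/main...HEAD`, and the job would fail whenever `check_size` returns `False`. Added plus deleted lines is only a rough proxy for reviewability, so a team may also want an explicit, reviewed override for mechanical changes.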

Separate structural change from behavioral change. When a refactor and a new feature arrive in the same pull request, the reviewer cannot tell whether a behavior change was intended or fell out of the restructuring as a side effect. Keep them in separate changes, even when the agent wants to bundle them.

Slice vertically. Each change should cut through the feature so that it compiles, passes its tests, and deploys on its own. A specification makes this natural, because it defines the boundary within which the agent can split the implementation into shippable increments.
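The split plan is requested before the agent writes any code, so it can be checked mechanically against the size cap and the rule that structural and behavioral changes stay separate. The following is a hypothetical sketch, not a feature of any particular agent. It assumes the agent returns its plan as a list of slices. Each slice carries an estimated line count and flags that mark structural change, behavioral change, or both. The `Slice` type, `plan_problems`, the 200-line cap, and the example slices are illustrative.

```python
# Hypothetical validator for an agent's split plan, run before any code is written.
from dataclasses import dataclass

MAX_SLICE_LINES = 200  # assumed per-slice cap


@dataclass
class Slice:
    title: str
    estimated_lines: int
    structural: bool  # refactors, renames, moves: no intended behavior change
    behavioral: bool  # changes to what the system actually does


def plan_problems(plan: list[Slice], cap: int = MAX_SLICE_LINES) -> list[str]:
    """Return human-readable problems; an empty list means the plan passes."""
    problems = []
    for s in plan:
        if s.estimated_lines > cap:
            problems.append(f"{s.title}: ~{s.estimated_lines} lines exceeds cap of {cap}")
        if s.structural and s.behavioral:
            problems.append(f"{s.title}: mixes structural and behavioral change")
    return problems


plan = [
    Slice("Extract pricing module", 150, structural=True, behavioral=False),
    Slice("Add discount rule behind toggle", 120, structural=False, behavioral=True),
    Slice("Wire discount into checkout and clean up", 260, structural=True, behavioral=True),
]
for problem in plan_problems(plan):
    print(problem)
```

In this example the third slice fails twice: it is too large, and it bundles a cleanup with new behavior. It should be split before work begins. The estimates are only as reliable as the agent's forecast, so the cap still needs to be enforced on the real diff.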

Use feature toggles for work that is ready to merge but not ready to ship. Each increment merges to the main branch behind the toggle, and the capability activates when the pieces are in place. This keeps branches short-lived and avoids the merge conflicts that accumulate on long-running feature branches.
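A feature toggle can be as simple as a conditional around the new code path. For illustration only, the sketch below assumes toggles are read from environment variables named `FEATURE_<NAME>`. Production teams often use a configuration service or a dedicated flag library instead. The discount logic and every function name here are hypothetical.

```python
# Minimal feature-toggle sketch: merged code stays dark until the flag is turned on.
import os


def is_enabled(flag: str, default: bool = False) -> bool:
    """Read a toggle from an environment variable such as FEATURE_NEW_DISCOUNTS=1."""
    value = os.environ.get(f"FEATURE_{flag.upper()}")
    if value is None:
        return default
    return value.strip().lower() in {"1", "true", "yes", "on"}


def legacy_price(total: float) -> float:
    # Current production behavior.
    return total


def discounted_price(total: float) -> float:
    # New behavior, merged to main in an earlier increment but not yet live.
    return round(total * 0.9, 2)


def checkout_total(total: float) -> float:
    # The toggle, not the merge, decides which path customers see.
    if is_enabled("new_discounts"):
        return discounted_price(total)
    return legacy_price(total)
```

With `FEATURE_NEW_DISCOUNTS` unset, `checkout_total(100.0)` takes the legacy path and returns `100.0`. Setting the variable to `1` routes the same call to the new code, so each increment can merge without changing what users see.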

Fix the review service level agreement before you push smaller changes onto reviewers. If a team takes two days to review a fifty-line change, smaller pull requests will not help. The workflow assumes that review happens in hours, not days.
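Fixing the review service level agreement starts with measuring it. The sketch below is a hypothetical check that lists pull requests whose wait for a first review exceeded a target. The four-hour target, the `PullRequest` record, and the timestamps are assumptions for illustration. The article itself only says review should happen in hours, not days.

```python
# Hypothetical review-latency check against an assumed first-review SLA.
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

REVIEW_SLA = timedelta(hours=4)  # assumed target for time to first review


@dataclass
class PullRequest:
    number: int
    opened_at: datetime
    first_review_at: Optional[datetime] = None  # None means still waiting


def time_to_first_review(pr: PullRequest, now: datetime) -> timedelta:
    """Wall-clock time until the first review, or until now if none has happened."""
    end = pr.first_review_at if pr.first_review_at is not None else now
    return end - pr.opened_at


def sla_breaches(prs: list[PullRequest], now: datetime,
                 sla: timedelta = REVIEW_SLA) -> list[int]:
    """Return the numbers of pull requests whose wait exceeded the SLA."""
    return [pr.number for pr in prs if time_to_first_review(pr, now) > sla]


UTC = timezone.utc
now = datetime(2025, 6, 2, 17, 0, tzinfo=UTC)
prs = [
    PullRequest(101, datetime(2025, 6, 2, 9, 0, tzinfo=UTC),
                datetime(2025, 6, 2, 10, 30, tzinfo=UTC)),  # reviewed in 1.5 hours
    PullRequest(102, datetime(2025, 6, 2, 11, 0, tzinfo=UTC)),  # still waiting after 6 hours
    PullRequest(103, datetime(2025, 5, 30, 15, 0, tzinfo=UTC),
                datetime(2025, 6, 2, 9, 0, tzinfo=UTC)),  # waited about 2.75 days
]
print(sla_breaches(prs, now))  # [102, 103]
```

Wall-clock time is a simplification. Pull request 103 was opened on a Friday afternoon and reviewed on Monday morning, and a real measurement would probably count working hours only. Even a rough breach list shows whether a team is ready to absorb a larger number of smaller changes.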

The Two Budgets Reinforce Each Other

The connection between the two context windows is the part most teams miss. A focused task given to the agent tends to produce a reviewable diff. A reviewable diff is, almost by definition, the output of a focused task. The spec that bounds the work for the model also bounds it for the reviewer. When leaders treat context as a single discipline applied to both machine and human, the two reinforce one another. When they optimize only for the model, they generate more code faster and quietly transfer the cost to the people who have to understand it.

Constraints and Practical Limitations

This approach is not free, and it does not fit every situation cleanly.

Some agents resist splitting. Even with an explicit instruction, certain agents will attempt to ship everything in one change. The rule often has to be repeated or, better, moved into a persistent configuration file. That way it survives a context reset, and the team does not depend on a reviewer to catch each violation.

Some features resist clean slicing. Tightly coupled work does not always decompose into independent increments. Forcing an artificial split can confuse reviewers more than a single coherent change would. Leaders should be honest about that limit rather than treating the size cap as an absolute law.

A size cap is a proxy, not the goal. A mechanical rename across fifty files can produce a long diff that still takes seconds to review. The objective is reviewability, not a line count, and the two do not always move together.

The discipline is organizational, not technical. None of this works without a functioning review process behind it. A team that cannot meet its review commitments will not be rescued by smaller pull requests. The constraint is human capacity and process maturity, and no tool restores reviewer attention once it is depleted.

Engineering leaders now manage two context windows. The one belonging to the model gets most of the attention. The one belonging to the reviewer gets ignored, even though it is far harder to replenish. Managing both comes down to a few practices:

  • Treat reviewer attention as the scarcest resource in the pipeline, because it is.
  • Bound the work given to the agent with a specification and a size cap before any code is written.
  • Keep structural and behavioral changes in separate, self-contained increments.
  • Move the rules into a persistent harness so they survive a fresh context.
  • Repair the review process before asking it to absorb more, smaller changes.


The agent does not care how large the pull request is, and it never will. The discipline of context engineering has to come from the people who do.

About the author

Lucas Hendrich
CTO at Forte Group
