

Most conversations about AI in engineering start with code generation. Fair enough. Copilots and coding assistants have become normal in many teams. But when we started experimenting with AI internally, coding turned out to be one of many interesting places to look.

Opportunities also showed up earlier in the process: requirements, user stories, test cases, documentation, release notes. All the things that surround development.
A colleague recently presented an experiment where AI agents help drive these tasks across the entire development lifecycle. The goal was simple: see how far we could push automation before human review becomes the limiting factor.
The results were interesting — and so were the lessons learned along the way.
Anyone who has worked with agile teams knows the routine.
A feature idea appears. Someone turns it into a user story. Someone else breaks it down into tasks. QA prepares test cases. Documentation gets updated later — sometimes much later.
None of this feels optional. The process exists for good reasons.
Still, a lot of time goes into repetitive formatting and rewriting. That work adds up, and it's exactly the kind of work AI is well-suited to assist with — not replace.
Rather than asking one AI model to do everything, the team broke the workflow into smaller pieces. Each stage of the lifecycle gets its own agent, following instruction files written in markdown that describe how the agent behaves, what context it reads, and what output format to produce.
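For concreteness, a hypothetical instruction file for the user story agent might look like the sketch below. The headings, tool references, and output rules are assumptions for illustration, not the team's actual files.

```markdown
# User Story Agent (hypothetical example)

## Role
Turn an approved feature description into a user story with acceptance
criteria, edge cases, and non-functional requirements.

## Context to read
- The feature description selected by the product owner
- Existing Jira tickets in the target project (to avoid duplicates)
- Confluence pages linked from the relevant epic

## Output format
- Title in the form "As a <role>, I want <capability> so that <benefit>"
- Acceptance criteria as a Given/When/Then list
- A short "Out of scope" section

## Rules
- Never create tickets directly; produce a draft for human review.
- If a similar ticket already exists, reference it instead of duplicating it.
```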
The pipeline looks roughly like this:
Feature Idea → Feature Agent → User Story Agent → Jira Agent → Technical Analysis Agent → Test Case Agent → Documentation Agent → Release Notes Agent
The agents connect to tools the team already uses — Jira, Confluence, Git repositories, and test management systems. Once the system has access to those sources, it can read existing tickets, scan documentation, and create draft records automatically.
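Under the hood, this kind of staging does not require heavy machinery. Conceptually it is a chain of calls in which each agent loads its own instruction file and hands a draft to the next stage. The Python sketch below illustrates the idea; every file name and the call_llm helper are invented for illustration, not part of the actual setup.

```python
from pathlib import Path

def call_llm(system: str, user: str) -> str:
    """Placeholder for whatever model API the team actually uses; returning a
    stub string keeps the sketch runnable without credentials."""
    return f"[draft produced under the given instructions]\n{user}"

def run_agent(instruction_file: str, context: str) -> str:
    """An agent is a markdown instruction file applied to some context."""
    instructions = Path(instruction_file).read_text()
    return call_llm(system=instructions, user=context)

def run_pipeline(feature_idea: str) -> dict:
    """Chain the lifecycle stages; every output is a draft awaiting human review.
    Stages that write to Jira or Confluence would add tool calls here."""
    feature = run_agent("agents/feature.md", feature_idea)
    story = run_agent("agents/user_story.md", feature)
    analysis = run_agent("agents/technical_analysis.md", story)
    tests = run_agent("agents/test_cases.md", story)
    docs = run_agent("agents/documentation.md", story)
    return {"feature": feature, "story": story, "analysis": analysis,
            "tests": tests, "docs": docs}
```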
The experiment begins with a short, intentionally vague prompt — for example, "We need a dashboard that shows project metrics and allows export." The feature agent produces structured alternatives with trade-off explanations. A product owner selects a direction, and the next agent takes over.
The story agent converts the selected feature into a structured user story with acceptance criteria, edge cases, and non-functional requirements. Importantly, it also checks Jira for similar existing tickets before generating anything new — a small but practical detail that prevents duplicate stories from cluttering large backlogs.
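To make the duplicate check concrete: against a Jira instance it can be as small as one JQL search before anything is created. The sketch below uses Jira's standard issue search endpoint; the instance URL, credentials, project key, and matching strategy are all assumptions.

```python
import requests

JIRA_URL = "https://example.atlassian.net"   # assumed instance URL
AUTH = ("bot@example.com", "api-token")      # assumed API token auth

def find_similar_stories(summary_keywords: str, project: str = "PROJ") -> list[dict]:
    """Search existing tickets whose summary resembles the proposed story,
    so the agent can reference them instead of creating duplicates."""
    jql = f'project = {project} AND summary ~ "{summary_keywords}" ORDER BY created DESC'
    resp = requests.get(
        f"{JIRA_URL}/rest/api/2/search",
        params={"jql": jql, "fields": "summary,status", "maxResults": 10},
        auth=AUTH,
        timeout=30,
    )
    resp.raise_for_status()
    return [
        {"key": issue["key"], "summary": issue["fields"]["summary"]}
        for issue in resp.json().get("issues", [])
    ]
```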
From there, the Jira agent handles the mechanical work: creating the epic, attaching stories, and populating required fields. The technical analysis agent reviews the story, scans the repository if access is available, and produces a short note covering implementation paths, dependencies, and a rough story-point estimate. In internal tests, those estimates aligned with the team's own estimates roughly nine times out of ten — useful as a second perspective during backlog refinement, not as a replacement for design discussions.
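The ticket creation itself is ordinary REST mechanics. A sketch against Jira's issue creation endpoint follows; the project key, field set, and authentication details are assumptions, and epic linking in particular varies between Jira configurations.

```python
import requests

JIRA_URL = "https://example.atlassian.net"   # assumed instance URL
AUTH = ("bot@example.com", "api-token")      # assumed API token auth

def create_story_draft(project_key: str, title: str, description: str) -> str:
    """Create a story ticket and return its key. The ticket is still a draft
    in the process sense: a human reviews it before it enters planning."""
    payload = {
        "fields": {
            "project": {"key": project_key},
            "summary": title,
            "description": description,
            "issuetype": {"name": "Story"},
        }
    }
    resp = requests.post(f"{JIRA_URL}/rest/api/2/issue", json=payload,
                         auth=AUTH, timeout=30)
    resp.raise_for_status()
    return resp.json()["key"]
```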
Testing was one of the more discussed areas of the experiment — and one of the most important to frame correctly.
The test case agent reads acceptance criteria and generates an initial set of test cases. Engineers found it useful specifically because it surfaced edge cases they hadn't considered yet. That's real value.
But it's worth being direct about what "generated test cases" means in practice. What the agent produces are draft test case outlines — descriptions of scenarios and expected results. They are a starting point, not executable test coverage. A QA engineer still needs to review, refine, and in many cases substantially rewrite them. The agent accelerates the first pass; it does not complete the work.
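One way to picture what the agent hands over is a structured scenario description rather than test code. The shape below is a hypothetical illustration of such a draft, reusing the dashboard example from earlier; the field names are invented, not the team's actual schema.

```python
from dataclasses import dataclass

@dataclass
class DraftTestCase:
    """A reviewable outline, not an executable test: a QA engineer still
    refines the steps, adds data, and decides what gets automated."""
    title: str
    preconditions: list[str]
    steps: list[str]
    expected_result: str
    covers_acceptance_criterion: str   # traceability back to the story
    needs_review: bool = True          # every draft starts unreviewed

example = DraftTestCase(
    title="Export dashboard metrics as CSV when the date range has no data",
    preconditions=["User has viewer access",
                   "Selected date range contains no records"],
    steps=["Open the project metrics dashboard",
           "Select an empty date range",
           "Click Export"],
    expected_result="An empty CSV with headers downloads; no error is shown",
    covers_acceptance_criterion="AC-3: export works for empty result sets",
)
```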
Documentation usually falls behind development — everyone intends to update it, but deadlines tend to win. The documentation agent reads new user stories, compares them against existing Confluence pages, and proposes updates. With one confirmation, new content appears in Confluence. It reduces friction; it doesn't eliminate the need for a human to verify accuracy.
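The Confluence side is similarly mechanical once a human has confirmed the proposed text. A sketch against the standard Confluence content API is below; the page ID, instance URL, and the assumption that the body arrives as storage-format HTML are all illustrative.

```python
import requests

CONFLUENCE_URL = "https://example.atlassian.net/wiki"   # assumed instance URL
AUTH = ("bot@example.com", "api-token")                  # assumed API token auth

def publish_confirmed_update(page_id: str, new_body_html: str) -> None:
    """Fetch the current page version, then publish the confirmed body.
    Runs only after a human has approved the proposed content."""
    resp = requests.get(f"{CONFLUENCE_URL}/rest/api/content/{page_id}",
                        params={"expand": "version"}, auth=AUTH, timeout=30)
    resp.raise_for_status()
    page = resp.json()

    payload = {
        "id": page_id,
        "type": "page",
        "title": page["title"],
        "version": {"number": page["version"]["number"] + 1},
        "body": {"storage": {"value": new_body_html,
                             "representation": "storage"}},
    }
    resp = requests.put(f"{CONFLUENCE_URL}/rest/api/content/{page_id}",
                        json=payload, auth=AUTH, timeout=30)
    resp.raise_for_status()
```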
Release notes follow a similar pattern. The agent scans Jira tickets associated with a release version and composes a draft summary. It also flags tickets that remain unfinished — a simple but genuinely useful warning that helps teams catch problems earlier. As with everything else in the pipeline, the output is a draft that requires human review before it goes anywhere.
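The unfinished-ticket warning reduces to a version-scoped query. The JQL below is a plausible form of that check, not the team's actual query; the instance URL, credentials, and project key are again assumptions.

```python
import requests

JIRA_URL = "https://example.atlassian.net"   # assumed instance URL
AUTH = ("bot@example.com", "api-token")      # assumed API token auth

def unfinished_in_release(version: str, project: str = "PROJ") -> list[str]:
    """Return keys of tickets tagged with the release version that are not done,
    so the draft release notes can flag them for the team."""
    jql = (f'project = {project} AND fixVersion = "{version}" '
           f'AND statusCategory != Done')
    resp = requests.get(
        f"{JIRA_URL}/rest/api/2/search",
        params={"jql": jql, "fields": "summary,status", "maxResults": 50},
        auth=AUTH,
        timeout=30,
    )
    resp.raise_for_status()
    return [issue["key"] for issue in resp.json().get("issues", [])]
```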
Precise numbers depend heavily on the project, the quality of existing documentation, and how well the instruction files have been tuned. The estimates below reflect time to produce a reviewable first draft — not finished, production-ready output.
Manual workflow:

| Task | Typical time |
| --- | --- |
| User story creation | 20–30 minutes |
| Test case writing | 30–60 minutes |
| Documentation update | ~20 minutes |
| Release notes | ~15 minutes |

AI-assisted workflow (times include human review of the draft):

| Task | Typical time |
| --- | --- |
| User story draft | 3–5 minutes |
| Test cases | 10–15 minutes |
| Documentation summary | ~10 minutes |
| Release notes | ~5 minutes |
The mechanical work decreases significantly. The human judgment required to validate, refine, and approve the output does not disappear — it shifts.
After several weeks of experimentation, a few honest lessons emerged.
Context is everything. The system performs far better when it has access to well-maintained documentation and source code. If your existing docs are inconsistent or outdated, the agents will reflect that.
Instruction files take real investment to tune. The first versions rarely behave the way teams expect. Getting them to a stable, predictable state takes iteration — often more than teams anticipate going in. This is not a one-time setup cost; it requires ongoing maintenance as processes and codebases evolve.
Integration and governance add overhead. Connecting agents to Jira, Confluence, and Git repositories involves more than API credentials. Access controls, data sensitivity, output review workflows, and change management all need to be accounted for. Teams that underestimate this phase tend to stall before reaching production-grade behavior.
Human review is not optional. AI produces drafts. Every output in this pipeline requires a qualified human to assess before it becomes part of the record. The goal is to reduce the time humans spend on mechanical work — not to remove humans from the loop.
One engineer summarized it well:
"AI won't decide what product to build. Humans still own that. What AI does well is filling the gaps around the work."
Product managers still define the feature. Developers still make the architecture decisions. QA engineers still own test coverage. What changes is how much time those people spend on formatting, reformatting, and administrative scaffolding before they get to the work that actually requires their judgment.
Most headlines focus on AI writing code. That part matters. But the layers of work surrounding development — requirements, backlogs, tests, documentation, releases — consume significant time and rarely get the same attention.
The experiment described here suggests that AI can meaningfully reduce the friction in those areas. Not by automating them away, but by handling the first draft so that skilled people can spend their time on review, refinement, and decisions rather than on mechanical setup.
The teams that figure out how to integrate this well — with the right governance, the right instruction design, and honest expectations about what the output requires — will have a real advantage. Getting there takes more work than a demo suggests. But the direction is worth pursuing.