AI Coding Agents Need XP

Agentic Coding Needs a Spine

29.05.2026, By Stephan Schwab

Agentic coding did not make old software habits obsolete. It made them more visible. When a machine can produce code at absurd speed, the disciplines that generations of developers learned the hard way matter even more: small steps, fast feedback, clear responsibilities, and one source of truth for each rule. The agent is new. The physics are not.

If you do not work in software day to day, the core problem is simple enough: the machine writes code quickly, but quick is not the same thing as coherent. Plenty of software fails for the same reason a badly organized kitchen fails. Knives in one drawer, forks in another, spices everywhere except where cooking happens, and three half-open bags of rice because nobody knew there was already rice in the house. The work continues. The waste multiplies.

Software gets that same chaos when anyone, human or machine, copies the same rule into five files, invents a new class for every whim, or refactors code just because it feels intellectually itchy. The result may still run. It just becomes expensive to change.

XP Still Works Because Physics Still Works

"Faster code generation does not repeal feedback loops."

Extreme Programming was built for a world where software changes constantly and developers are wrong several times a day. That world did not go away because a large language model learned to autocomplete whole files.

The XP habits that mattered in 2000 matter even more with agents in 2026:

  • Write a failing test first.
  • Make the smallest change that turns red into green.
  • Refactor only while tests stay green.
  • Integrate constantly instead of saving surprises for later.

None of this is glamorous. Good practice rarely is. Good practice mostly looks like refusing to lie to yourself.

Agentic coding raises the stakes because the model can produce a lot of plausible nonsense before a human has even finished their coffee. That is why tests beat instructions. The habit matters because the feedback loop matters. If you remove the loop, you are left with speed and hope, which is how expensive mistakes get mass-produced.

DRY Means One Rule, One Home

"Duplication is not just ugly. It is how one bug becomes a subscription plan."

DRY gets oversimplified into a slogan about reducing keystrokes. That misses the point. The point is not typing less. The point is making sure one business rule does not quietly fork into several competing versions.

If your discount logic lives in a controller, a background job, a checkout service, and a reporting script, you do not have reuse. You have four future disagreements waiting for a release weekend.

This is exactly the kind of mess agents create when discipline is weak. The model sees similar code in two places and happily pastes a third version because that solves the immediate task. It is not malicious. It is opportunistic. Fast output without design pressure always drifts toward duplication.

That is why the old rule still matters:

  • Reuse an existing domain object or helper before creating a new one.
  • If a business rule appears twice, extract it before the task is considered done.
  • Do not copy validation, pricing, authorization, or mapping logic across modules.

That is the human job when working with an agent. It is not romanticizing craftsmanship while waving through avoidable duplication. The old habit still wins: keep the rule in one place, then defend that place every time the code changes.
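A minimal Python sketch of "one rule, one home" might look like the following. The discount threshold, the rate, and every function name here are hypothetical and chosen only for illustration. The point is that checkout and reporting both ask the same function instead of carrying their own copy of the rule.

```python
from decimal import Decimal

# Hypothetical rule: orders of 100.00 or more get 10% off.
# Threshold and rate are invented for this example.
DISCOUNT_THRESHOLD = Decimal("100.00")
DISCOUNT_RATE = Decimal("0.10")


def discounted_total(subtotal: Decimal) -> Decimal:
    """The single home of the discount rule."""
    if subtotal >= DISCOUNT_THRESHOLD:
        return (subtotal * (1 - DISCOUNT_RATE)).quantize(Decimal("0.01"))
    return subtotal.quantize(Decimal("0.01"))


def checkout_total(subtotal: Decimal) -> Decimal:
    # Checkout asks the rule; it does not re-implement it.
    return discounted_total(subtotal)


def report_line(order_id: str, subtotal: Decimal) -> str:
    # Reporting calls the same function, so the two cannot disagree.
    return f"{order_id}: {discounted_total(subtotal)}"


print(checkout_total(Decimal("200.00")))      # 180.00
print(report_line("A-1", Decimal("200.00")))  # A-1: 180.00
```

If the policy changes to 15%, one constant changes and every caller follows. An agent asked to touch pricing then has exactly one place to look.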

For readers outside day-to-day software work, DRY is just this: when a company policy changes, you want one place to update it, not seven places to forget it.

OOP Is About Responsibility, Not Inheritance Cosplay

"Good objects own behavior. Bad objects are storage bins with a fake mustache."

Object-oriented programming also gets abused by people who confuse vocabulary with design. A class hierarchy is not architecture. It is often just a longer route to regret.

Good OOP, especially in agentic coding, means a few plain things:

  • Each object or module should have one clear reason to change.
  • Behavior should live close to the data it depends on.
  • Public interfaces should stay small.
  • Composition should win over inheritance unless inheritance is obviously simpler.

That matters because agents love inventing abstractions. Give them a vague prompt and they will gladly produce BaseManagerFactoryAdapter as if satire were a design pattern.

The habit that keeps this sane is older than the current wave of tools: prefer small objects with clear responsibilities, extend an existing abstraction only when the surrounding code already uses that pattern, and do not introduce a new layer unless the current one is failing.

The practical test is brutally simple: if a human developer cannot explain where a rule lives and why, the agent will not keep it coherent either.
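As a rough sketch of those guidelines in Python: the parcel and shipping classes below are hypothetical, invented only to show composition in place of a class hierarchy, behavior kept next to its data, and a public interface of one method per object.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Protocol


class ShippingPolicy(Protocol):
    """Anything that can price a shipment by weight."""

    def cost_for(self, weight_kg: Decimal) -> Decimal: ...


@dataclass(frozen=True)
class FlatRate:
    fee: Decimal

    def cost_for(self, weight_kg: Decimal) -> Decimal:
        return self.fee


@dataclass(frozen=True)
class PerKilo:
    rate: Decimal

    def cost_for(self, weight_kg: Decimal) -> Decimal:
        return self.rate * weight_kg


@dataclass(frozen=True)
class Parcel:
    weight_kg: Decimal
    shipping: ShippingPolicy  # composed, not inherited

    def shipping_cost(self) -> Decimal:
        # Behavior lives next to the data it depends on.
        return self.shipping.cost_for(self.weight_kg)


print(Parcel(Decimal("2"), PerKilo(Decimal("3.50"))).shipping_cost())  # 7.00
print(Parcel(Decimal("2"), FlatRate(Decimal("5.00"))).shipping_cost())  # 5.00
```

Each class has one reason to change, and anyone can say where the pricing rule lives: in the policy object, nowhere else.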

TDD Gives the Agent a Metronome

"Without red, green means nothing."

A lot of teams say they want the model to do TDD when what they actually mean is, “please also write some tests so this looks respectable.” That is not TDD. That is paperwork with assertions.

If you want an agent to behave like a disciplined developer, the development loop has to stay intact: the test comes first, and it has to fail before anything is allowed to pass.

The loop is old and still undefeated:

  1. Read the existing tests and code before changing anything.
  2. Add or update one failing test that describes the next behavior.
  3. Implement the smallest change that makes that test pass.
  4. Run the relevant tests.
  5. Refactor only after green.

This gives the model a rhythm. It narrows the search space. It turns the task from “write something impressive” into “satisfy this specific contract without breaking the rest.” Models need that discipline even more than humans do, because they are very good at sounding correct while being structurally wrong.
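One way to picture a single turn of that loop, as a small Python sketch: the `initials` behavior and all names below are hypothetical. The same test runs against the code before and after the change, so the red state is observed rather than assumed.

```python
from typing import Callable


def spec(initials: Callable[[str], str]) -> None:
    """One test describing the next behavior (step 2)."""
    assert initials("ada lovelace") == "AL"


def initials_before(full_name: str) -> str:
    # State of the code when the test is written: nothing implemented yet.
    raise NotImplementedError


def initials_after(full_name: str) -> str:
    # Step 3: the smallest change that satisfies the test.
    return "".join(word[0].upper() for word in full_name.split())


def status(impl: Callable[[str], str]) -> str:
    """Run the spec against an implementation and report red or green."""
    try:
        spec(impl)
    except (AssertionError, NotImplementedError):
        return "red"
    return "green"


print(status(initials_before))  # red: the test can fail, so passing means something
print(status(initials_after))   # green
```

In a real repository the before and after states are two commits rather than two functions, and a test runner does the reporting. The order is what matters: red is seen first, then green.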

If you want the broader case against freestyle prompting, it is short: vibe coding is not software development. It is improvisation with better autocomplete.

What Good Agent Instructions Actually Say

"Tell the model how to work. Let tests and tools decide whether it succeeded."

Most bad instruction files share one flaw: they try to predict every coding decision in prose. That is unwinnable. The file swells, contradictions appear, and the agent cherry-picks whatever token pattern looks easiest in context.

The better approach is smaller and harsher. Your instruction file should mostly cover:

  • Workflow: test first, smallest change, green before refactor.
  • Design guardrails: keep code DRY, prefer small objects, reuse existing abstractions.
  • Safety rules: do not weaken tests, do not invent dependencies casually, ask when requirements are ambiguous.
  • Completion rules: tests pass, lint passes, diff stays focused.

That is enough to shape behavior without pretending markdown can replace judgment.

In other words, instruction files are for habits. Tests are for proof. Linters are for consistency. Keep each tool doing its own job.
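The completion rules are the part that tooling can check most directly. A hedged Python sketch, assuming the test and lint results arrive from whatever runners the repo already uses: the `ChangeReport` type and the five-file threshold for a "focused" diff are made up for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChangeReport:
    tests_passed: bool
    lint_passed: bool
    files_changed: int


# Hypothetical threshold for "diff stays focused"; pick what fits your repo.
MAX_FILES_CHANGED = 5


def unmet_completion_rules(report: ChangeReport) -> list[str]:
    """Return the completion rules a change still violates (empty means done)."""
    unmet = []
    if not report.tests_passed:
        unmet.append("tests pass")
    if not report.lint_passed:
        unmet.append("lint passes")
    if report.files_changed > MAX_FILES_CHANGED:
        unmet.append("diff stays focused")
    return unmet


print(unmet_completion_rules(ChangeReport(True, True, 2)))   # []
print(unmet_completion_rules(ChangeReport(False, True, 9)))  # ['tests pass', 'diff stays focused']
```

The instruction file states the rule in one line. A check like this decides whether the rule was met.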

A Better Prompting Habit for Teams and Solo Developers

"The best agent prompt is often a one-line task plus a repo that already tells the truth."

There is a tempting fantasy around agentic coding: write a clever master prompt once, then let the machine carry the craft for you. That fantasy survives because the first few demos usually work.

Then reality arrives. The model duplicates logic. It changes an interface nobody meant to touch. It adds a helper that almost matches the existing helper but not quite. Suddenly the team is writing prompt law instead of software.

The cure is not more drama in the prompt. The cure is a repo that tells the truth:

  • Tests explain what the system must do.
  • Instruction files explain how the agent should behave while changing it.
  • Existing code shows what patterns are already alive.

Once those three line up, prompting gets simpler. You can say, “Add VAT handling for Swiss invoices,” and trust the agent to enter the codebase through the front door instead of smashing a window.
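What "tests explain what the system must do" could look like for that one-line task, as a sketch only: the function name is hypothetical, and the 8.1% figure is assumed here as the Swiss standard rate and should be confirmed against current rules before anyone relies on it.

```python
from decimal import Decimal, ROUND_HALF_UP

# Assumption: 8.1% as the Swiss standard VAT rate; verify before use.
SWISS_STANDARD_VAT = Decimal("0.081")


def vat_amount(net: Decimal, rate: Decimal = SWISS_STANDARD_VAT) -> Decimal:
    """VAT on a net amount, rounded to cents."""
    return (net * rate).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


def test_swiss_invoice_carries_standard_vat():
    # The test is the requirement: 100.00 net carries 8.10 VAT.
    assert vat_amount(Decimal("100.00")) == Decimal("8.10")


def test_vat_is_rounded_to_cents():
    # 19.99 * 0.081 = 1.61919, which rounds to 1.62.
    assert vat_amount(Decimal("19.99")) == Decimal("1.62")
```

A runner such as pytest would collect the `test_` functions. An agent that finds these in the repo knows what "VAT handling" means without three paragraphs of prompt.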

That matters for solo developers too. A solo repo can still become a swamp. In fact it often becomes a swamp faster, because there is nobody else around to complain before the reeds are shoulder-high.

Hosted Example Files

The examples below are intentionally short. They do not try to serialize the entire craft of software development into markdown. They establish discipline, then hand enforcement back to tests and tooling.

Steal them. Trim them. Make them fit your stack. But keep the center of gravity where it belongs: tiny feedback loops, clear responsibilities, and less duplicated stupidity.
