AI Won't Refactor Your Web Component

The Branches Never Stop Growing

19.06.2026, By Stephan Schwab

AI coding tools are excellent at continuing a pattern. They are much weaker at deciding when the pattern should end. I built a multi-mode web component with agentic coding and watched the models extend it cheerfully into a hostage situation: mode selection, prompt handling, option rendering, workflow branching, and special-case behavior for image, video, and email tasks, all crammed into one body that used to be simple. No model stopped to say the architecture was wrong. That part still required human judgment. The tools did not fail. They did exactly what they are designed to do. The problem is that people keep expecting continuation engines to volunteer the refactor, and they don't. Design judgment is not being automated. It is being quietly handed back to whoever is paying attention.

The setup was not exotic.

No giant framework. No dependency carnival. Just web components, deliberately chosen to keep the surface area under control and avoid dragging in half the JavaScript ecosystem for a problem that did not need it.

That choice aged well. The component did not.

The more features I asked for, the more the models kept doing what these tools are very good at: extending the local pattern in front of them. New mode? Add another branch. New option? Add another conditional. New workflow? Thread another state through the same component and hope nobody notices the smell.

They were not being stupid. They were being statistically obedient.

That is the part many people still refuse to understand. Claude Code, Codex, and similar tools are strong at continuing the work. They are much weaker at deciding that the current shape of the work is wrong.

Local Success Is Not System Design

"AI tools are excellent at extending a pattern that already exists. That is not the same thing as deciding the pattern should end."

A growing component can look productive for quite a while.

Each change works locally. The demo survives. The new button appears. The new mode does the thing. The diff looks plausible. The branch count rises politely in the background.

Then one day the component is no longer a component. It is a hostage situation.

That was happening to neo-chat-box. The component had become the place where too many decisions met each other:

mode selection
prompt handling
option rendering
feature-specific UI
workflow branching
shared chat behavior
special-case behavior for image, video, and email tasks

None of those concerns is absurd by itself. Shoving all of them into the same body is how you grow a very modern monster.

The models did not protest. Why would they? The repository showed them a component that already handled multiple modes, so they kept reinforcing that shape. If you ask for image controls in a multi-mode chat box, the tool does not naturally ask whether those controls belong in a specialized child component. It usually asks how quickly it can add the next conditional.

That is continuation. Not design.
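
A minimal sketch of that shape, in TypeScript. The names here (`BoxState`, `renderControls`, the control identifiers) are hypothetical and not taken from the real neo-chat-box source, and the DOM is left out so that the branching is the only thing on display.

```typescript
// Hypothetical illustration; not the actual neo-chat-box code.
type Mode = "chat" | "image" | "video" | "email";

interface BoxState {
  mode: Mode;
  imageSize?: string;    // only meaningful in image mode
  videoSeconds?: number; // only meaningful in video mode
  emailSubject?: string; // only meaningful in email mode
}

// One function owns every mode. Each new feature lands as another branch.
function renderControls(state: BoxState): string[] {
  const controls = ["prompt-input", "send-button"]; // shared chat behavior
  if (state.mode === "image") {
    controls.push("size-picker");
  } else if (state.mode === "video") {
    controls.push("duration-slider");
  } else if (state.mode === "email") {
    controls.push("subject-field", "recipient-field");
  }
  return controls;
}
```

Asked for a fifth mode, the locally cheapest edit is one more `else if` and one more optional field on the state. Nothing in this code signals that the boundary is wrong, which is why a tool that continues the existing shape keeps extending it.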

The Refactor Started With Questions

"The turning point was not a better prompt. It was asking better design questions."

The way out was not another grand specification. It was a sequence of targeted questions.

What is genuinely shared across all modes?

What changes only because the user is creating an image instead of sending a plain message?

Which pieces represent stable chat behavior, and which pieces are really mode-specific configuration UIs pretending to be chat behavior?

Where is the component branching because the business concept differs, not because the rendering detail differs?

Which decisions should be visible in the UI as explicit composition rather than hidden in conditionals?

That investigation led to the obvious answer that the models had not volunteered on their own. The generic chat shell should stay generic. The mode-specific behavior should move into specific components. The user should select mode-specific options through dedicated UI elements instead of forcing one swollen component to impersonate four products badly.

So the refactor direction became composition.

A regular chat can stay regular chat.

An image workflow can expose image-specific choices through a focused component.

A video workflow can do the same for video.

An email workflow can surface the controls and constraints that belong to email work instead of pretending those are just another branch of a generic conversation box.

Same app. Fewer lies.
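
One way to picture that composition, as a sketch under stated assumptions rather than the actual refactor: a generic shell that owns shared chat behavior and delegates everything mode-specific to a small panel interface. `ModePanel`, `ChatShell`, `ImagePanel`, `EmailPanel`, the control names, and the default image size are all hypothetical, and the DOM is again omitted.

```typescript
// Hypothetical sketch of the composition; names are illustrative only.
interface ModePanel {
  readonly mode: string;
  controls(): string[];               // mode-specific UI controls
  options(): Record<string, unknown>; // mode-specific request options
}

class ImagePanel implements ModePanel {
  readonly mode = "image";
  size = "1024x1024"; // assumed default, for illustration only
  controls(): string[] { return ["size-picker"]; }
  options(): Record<string, unknown> { return { size: this.size }; }
}

class EmailPanel implements ModePanel {
  readonly mode = "email";
  subject = "";
  controls(): string[] { return ["subject-field", "recipient-field"]; }
  options(): Record<string, unknown> { return { subject: this.subject }; }
}

// The shell knows about shared chat behavior and nothing else.
class ChatShell {
  private panel: ModePanel | undefined;

  constructor(panel?: ModePanel) {
    this.panel = panel;
  }

  controls(): string[] {
    const shared = ["prompt-input", "send-button"];
    return this.panel ? shared.concat(this.panel.controls()) : shared;
  }

  buildRequest(prompt: string): Record<string, unknown> {
    const mode = this.panel ? this.panel.mode : "chat";
    const extra = this.panel ? this.panel.options() : {};
    return { mode, prompt, ...extra };
  }
}

const plainChat = new ChatShell();                // regular chat stays regular chat
const imageChat = new ChatShell(new ImagePanel());
console.log(plainChat.controls());                // shared controls only
console.log(imageChat.buildRequest("a red bicycle"));
```

In this sketch a new mode means a new class that implements `ModePanel`. The shell does not change, and that is the property the branch-per-mode version could not offer.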

This Is Why the Demos Mislead People

"Fast output creates the illusion that judgment has been automated. Usually it has just been postponed."

A lot of people now watch demo videos and conclude that human design judgment is on borrowed time.

The demos are persuasive because the tools are real. They can inspect a repository, generate code quickly, thread changes through several files, and recover from obvious mistakes. That part is no longer hypothetical.

The mistaken leap comes right after that.

People see the fluent output and infer that the machine will also recognize when the structure itself needs to change. They assume it will volunteer the uncomfortable refactor, reject the convenient branch, and simplify the design before complexity hardens into maintenance cost.

Usually it will not.

Anthropic’s own guidance on building effective agents is much more sober than the public fantasy. They argue for simple, composable patterns over unnecessary framework complexity, warn about compounding errors in autonomous agents, and say coding agents work especially well where solutions are verifiable through automated tests. They add the sentence that matters most here: human review remains crucial for ensuring solutions align with broader system requirements.

That is not a minor footnote. That is the whole argument.

Broader system requirements are exactly where the trouble lives:

which abstraction should exist
which responsibility should move
which branch is a signal that the boundary is wrong
which apparent convenience will make the next ten changes worse

Those are not autocomplete problems.

The Models Are Optimized to Continue

OpenAI’s paper on why language models hallucinate makes the adjacent point from the model side. These systems are still rewarded too often for confident guessing instead of calibrated restraint. The practical consequence in coding is familiar: if you leave the door open for a plausible continuation, the model will often choose continuation over hesitation.

That does not always show up as a fabricated API call. Sometimes it shows up as fabricated confidence in a design direction.

The component already has four modes? Fine, let us add a fifth branch.

The class already owns UI rendering, state transitions, workflow options, and mode-specific behavior? Fine, let us keep all of that in one place and make the names longer.

The tool is not evil. It is not lazy. It is just not carrying the cost of the future structure the way an experienced developer does.

If you have spent years cleaning up systems that got “just one more branch” added to them for two quarters straight, you can feel the damage early. The tool usually cannot. Or more precisely, it will not act on that feeling unless you force the conversation there.

The Data Is Not Kind to Naive Optimism

"If AI output automatically improved software, maintainability metrics would already be getting cleaner. They are not."

The belief that code generation naturally leads to cleaner systems was always lazy. External evidence has become less patient with it.

GitClear’s report “Coding on Copilot: 2023 Data Suggests Downward Pressure on Code Quality” analyzed roughly 153 million changed lines of code and found rising churn, more added and copy-pasted code, and less evidence of reuse. Their summary is blunt enough on its own: AI-heavy code changes increasingly resembled the work of an itinerant contributor rather than a careful long-term maintainer.

That does not mean AI coding tools are useless. It means speed is real and design discipline is still required.

The messy truth is more interesting than the hype.

These tools can help experienced developers move much faster. They can also industrialize mediocre structure at shocking speed if nobody is actively steering architecture, boundaries, and tests.

My neo-chat-box example is exactly that in miniature. The assistant was useful for implementing changes. It was not spontaneously protecting the component from accumulated design debt.

Follow the Hiring, Not the Hype

"If AI labs truly believed generated code had replaced judgment-heavy software work, they would not be staffing enterprise foundations, deployment, compliance, and business systems so aggressively."

This part is worth watching because it cuts through the marketing fog.

The labs selling coding magic are also hiring large numbers of software developers for deeply judgment-heavy business and enterprise work.

Anthropic has advertised a Software Engineer, Enterprise Foundations role for its Claude for Work initiative. The description is not about typing speed. It is about enterprise-grade features, security, compliance, scalability, identity management, governance, role-based access control, and industry-specific solutions for healthcare, finance, and education.

OpenAI’s career listings now read the same way. Their public careers page includes roles across Forward Deployed Engineering, B2B Applications, Internal Applications, Finance, and enterprise deployment work. A published Backend Software Engineer, Enterprise AI Platform role described work on secure and compliant systems, enterprise data access, authentication, reliability, and customer-managed infrastructure so that large organizations can actually run agents safely in production.

That is the tell.

The frontier labs are not staffing as if software development has collapsed into prompting. They are staffing as if the valuable work is moving into the hard middle where business constraints, security, enterprise identity, deployment reality, compliance, and product judgment all collide.

Because that is exactly what is happening.

What I Actually Needed From the AI

I did not need the model to replace judgment. I needed it to help me apply judgment faster.

That is a much saner expectation.

Once the design questions were clear, the tool became useful again:

compare current responsibilities inside the component
isolate what is truly shared
identify the seams for specialized child components
sketch alternative compositions
help move rendering logic into focused elements
keep the refactor moving without rewriting the whole app by hand

That is strong leverage.

But notice the ordering.

The leverage arrived after the design judgment, not instead of it.

The valuable human contribution was recognizing that the problem was no longer “add another feature” but “stop one component from pretending to be four different products.” Once that decision had an owner, the AI could accelerate execution.

Without that decision, it would have kept smiling and adding branches.

Better Questions for Agentic Coding

"If you want better output from coding agents, ask questions that expose structure, not just questions that demand features."

When a component starts swelling under AI-assisted development, these questions help more than another heroic prompt:

Which responsibilities are shared, and which only coexist because history shoved them together?
Which branches represent different business concepts rather than minor UI variation?
What would the smallest stable shell component look like?
Which mode-specific options deserve their own focused component?
What state belongs to the shell, and what state should move behind a specialized interface?
What future feature becomes easier if we split this now?
What test would fail if we kept the wrong boundary in place?

The last question matters because it turns design taste into evidence.

A good refactor is not just prettier. It should make later change cheaper, testing clearer, and mode-specific behavior less entangled.
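
The last question in the list can be made concrete with a small check. This is a hedged sketch, not a test from the real project: `PanelSpec`, `findBoundaryLeaks`, and the control names are hypothetical. It assumes each mode panel declares the controls it owns and reports any of them that the shell still renders itself.

```typescript
// Hypothetical boundary check; control names are illustrative only.
interface PanelSpec {
  mode: string;
  controls: string[];
}

// Returns every control the shell renders that a mode panel claims to own.
// An empty result means the shell/panel boundary is holding.
function findBoundaryLeaks(shellControls: string[], panelSpecs: PanelSpec[]): string[] {
  const owned = new Set<string>();
  for (const spec of panelSpecs) {
    for (const control of spec.controls) {
      owned.add(control);
    }
  }
  return shellControls.filter((control) => owned.has(control));
}

const panelSpecs: PanelSpec[] = [
  { mode: "image", controls: ["size-picker"] },
  { mode: "video", controls: ["duration-slider"] },
];

// A clean shell passes; a shell that still renders image controls does not.
const cleanLeaks = findBoundaryLeaks(["prompt-input", "send-button"], panelSpecs);
const swollenLeaks = findBoundaryLeaks(
  ["prompt-input", "send-button", "size-picker"],
  panelSpecs,
);
console.log(cleanLeaks.length === 0); // the clean shell has no leaks
console.log(swollenLeaks);            // the leaked control: size-picker
```

A check like this fails the moment someone, human or agent, adds a mode-specific control back into the shell. The wrong boundary then shows up in a test run instead of depending on somebody noticing it in review.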

The Hard Part Still Has an Owner

The most damaging AI myth in software right now is not that the models are useless.

It is that good demos and early wins prove human judgment is becoming optional.

They do not.

They prove the clerical parts of software development are getting cheaper. They prove that continuation is faster. They prove that one capable developer can now move much more quickly from idea to implementation.

They do not prove that the machine will decide when the architecture has started lying.

My web component did not need a more enthusiastic assistant. It needed someone willing to say the current shape was wrong, ask targeted questions, and choose composition over branch accumulation.

That is why Claude Code and Codex are powerful tools and still not substitutes for judgment.

They can help build the thing.

Someone still has to decide what the thing should be.

