Maintenance Is a Deletion Question: Keeping an AI Harness Audit-Ready in Pharma

In December 2025, the engineering team at the software company Vercel did something that sounds like a mistake. They had spent months building an in-house AI assistant — nicknamed d0 — that let anyone on the team get answers from their company data just by asking a question in plain language: no analyst, no spreadsheet, no waiting. To make it reliable, they had given it more than fifteen specialized tools — small built-in helpers that each handled one narrow step. Then they deleted almost all of them.

A helper to find the right data table, a helper to double-check each step, a helper to format the result — gone. Instead of marching the AI through fifteen narrow steps, they gave it direct access to the data and let it work through the material itself, the way an experienced analyst would.

It didn’t get worse. It got dramatically better: it ran 3.5× faster, its success rate rose from 80% to 100%, and it became about a third cheaper to run. Their own conclusion: “We were constraining reasoning because we didn’t trust the model to reason.” When they stopped making choices for it, the AI made better ones.

I’ve spent three articles in this series arguing that you should build structure around AI agents — Skills, harnesses, automatic checks, small specialized models. All of that still holds. But Vercel’s result points at the part nobody puts in the launch deck:

Building an AI harness is an addition problem. Maintaining one is a deletion problem.

Wait — Didn’t This Series Just Argue for Smaller Models?

If you read the previous article, a question might be nagging at you. There I argued that regulated AI usually wins with smaller, specialized models — yet Vercel reached for a very capable model and gave it more room, not less. The opposite advice?

Not really. Both are the same discipline — right-sizing — at two different layers. That article was about right-sizing the model: don’t crack a walnut with a sledgehammer. This one is about right-sizing the scaffolding around it: don’t wrap a capable agent in fifteen tools it has already outgrown. Use the least structure that still does the job reliably, and remove the rest. And because the model keeps getting better, “the least structure that still works” keeps shrinking. That is why maintenance is a deletion question.

The Instinct Is to Add. The Discipline Is to Subtract.

Think about how any agent gets built. You start with one task. It works, so you add a tool. An edge case appears, so you add a guardrail. Then a second data source, a memory file, an exception, another check. Every addition feels like progress — and the agent quietly becomes harder to trust, slower, and more work to keep checked.

This is uncomfortable in pharma specifically, because adding a safeguard feels safe and removing one feels risky. Vercel’s numbers say otherwise: every tool you add is also one more thing that can fail, one more thing that can drift, one more thing to verify. Past a point, safeguards don’t reduce risk. They manufacture it.

There’s a deeper reason deletion isn’t optional, and it’s genuinely new. The model underneath your harness doesn’t sit still — it keeps getting better at using tools, reasoning across steps, and knowing when not to act. So the structure you built around last year’s model can become wrong with a single update:

A guardrail that protected you from a weak model becomes a cage that holds back a capable one.
A tool that propped up a clumsy model now just confuses a stronger one.
A rigid workflow that forced structure onto an unreliable agent becomes drag once the model can handle the work itself.

We expect software to break when it gets worse. The strange part of agents is that they break when the model gets better — quietly. The agent doesn’t fail loudly; it keeps running, now either underused or overreaching, taking twenty plausible actions a human has to unwind. A harness that only ever grows is a harness rotting in slow motion.

In Pharma, Deletion Already Has a Name

Here’s the good news, and it’s the same as every article in this series: you already own the machinery for this. Your gut may treat removal as the risky option, but the way pharma works has a calm, proven answer for taking things away safely. You don’t have to invent it.

Think about how pharma already treats an important computer system. It’s never simply “done.” You confirm it works, you put it to use, and you re-check it on a fixed schedule — a periodic review — asking whether it still does its job, whether the risk has changed, and whether anything should be added, fixed, or retired. When something meaningful changes, you handle that change deliberately and re-check what it touched. No acronyms required. That habit has been aimed at facilities, equipment, and documents for decades — and almost never, so far, at an AI agent.

Looking after your AI harness	The everyday pharma habit it matches
Re-checking the harness on a schedule and after every model update	The scheduled re-check you run on important systems
A model update that might change how the agent behaves	A change you think through before you accept it
Deciding what to keep, tighten, or cut	A risk-based call — weigh what could go wrong, then act
Removing a tool or step you no longer need	Trimming the scope — a normal review outcome, not a failure
Standing the agent down for good	Retiring the system

And that risk-based call cuts both ways. A permission that was harmless for a weak model — “let it update the record,” “let it send the draft” — may be too broad for one that now acts confidently; tighten or remove it. A restriction that made sense for an unreliable model — “only summarize, never compare” — may now prevent nothing real while forcing reviewers to do by hand what the model does reliably; the reason for the rule is gone, so delete the rule. You’re not being reckless. You’re removing safeguards whose risk no longer applies — a sentence you can say to an auditor with a straight face.

What to Delete: A Review Checklist

Run this at every review. For a Medical Affairs harness — say, a medical-inquiry or PSUR-support workflow — it gets concrete fast.

Tools the model has outgrown. Did you build a “find the right SmPC section” helper the model no longer needs because it navigates the document on its own? That’s exactly the kind of helper Vercel deleted. Fewer tools, fewer things that can go wrong. Cut it.
Guardrails that became cages. Is an “only summarize, never compare” rule sending every comparison to a human who now just rubber-stamps it? Re-check the risk. If it’s gone, lift the restriction.
Stale sources. A superseded SmPC version, a retired SOP, an old standard-response library — a stale source is more dangerous to an agent than to you, because it doesn’t know it’s stale and keeps producing convincing work from it. Replace or remove.
Redundant checks. Is a built-in check repeating one an earlier step now does reliably? Collapse it. Two checks doing one job is two to maintain.
The whole agent. Has the model improved enough that it should be rebuilt — or the business changed enough that it should be retired? Retiring it is a legitimate, sometimes correct, outcome.
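
The first four checklist items can be turned into a simple inventory pass that nominates candidates for the review meeting. What follows is a minimal sketch, not a validated tool: the HarnessItem fields, the 90-day usage window, and the 2% override threshold are all hypothetical and would need to match whatever your own harness actually logs. The fifth item, rebuilding or retiring the whole agent, is a judgment that stays outside the code.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Optional, Tuple


@dataclass
class HarnessItem:
    """One entry in a (hypothetical) harness inventory."""
    name: str
    kind: str                                    # "tool", "guardrail", "source", or "check"
    last_used: Optional[date] = None             # tools: last time the agent called it
    human_override_rate: Optional[float] = None  # guardrails: share of escalations a reviewer changed
    superseded: bool = False                     # sources: a newer version exists
    duplicates: Optional[str] = None             # checks: earlier step doing the same job


def deletion_candidates(
    items: List[HarnessItem],
    today: date,
    unused_after_days: int = 90,        # assumed threshold, tune to your own risk view
    rubber_stamp_below: float = 0.02,   # assumed threshold, tune to your own risk view
) -> List[Tuple[str, str]]:
    """Return (name, reason) pairs to discuss at the review.

    This only nominates candidates. The keep-or-cut call stays with people.
    """
    cutoff = today - timedelta(days=unused_after_days)
    found = []
    for item in items:
        if item.kind == "tool":
            if item.last_used is None or item.last_used < cutoff:
                found.append((item.name, "tool the model may have outgrown"))
        elif item.kind == "guardrail":
            rate = item.human_override_rate
            if rate is not None and rate < rubber_stamp_below:
                found.append((item.name, "guardrail that may have become a cage"))
        elif item.kind == "source":
            if item.superseded:
                found.append((item.name, "stale source"))
        elif item.kind == "check":
            if item.duplicates:
                found.append((item.name, f"redundant with {item.duplicates}"))
    return found


# Illustrative inventory; every name and date here is made up.
inventory = [
    HarnessItem("find_smpc_section", "tool", last_used=date(2025, 6, 1)),
    HarnessItem("never_compare_rule", "guardrail", human_override_rate=0.0),
    HarnessItem("smpc_v3", "source", superseded=True),
    HarnessItem("format_check", "check"),
]
candidates = deletion_candidates(inventory, today=date(2026, 1, 15))
```

The function deliberately returns reasons rather than decisions, so the output is an agenda for the review, not a substitute for it.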

This is where the regulated world has the advantage, not the handicap. One failure mode is the harness that only ever grows. The opposite one, common in tech, is reckless deletion: someone pulls a guardrail on a Friday, nobody writes down why, and weeks later no one can reconstruct whether it was safe. Pharma already has the third path: delete on purpose, with a reason, on the record. You write down why it’s safe, re-check what it touched, and the record shows not just what was removed but why it was safe to remove it. That’s not bureaucracy. It’s the only way to cut aggressively and stay ready for an inspection.
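
One way to make “on purpose, with a reason, on the record” concrete is a removal record that cannot be closed until both the rationale and the re-check are documented. This is a sketch under assumptions: the RemovalRecord fields and the close rule are illustrative and do not describe any particular quality system.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List


@dataclass
class RemovalRecord:
    """Hypothetical record of one deliberate removal from the harness."""
    item: str                  # what was removed
    rationale: str             # why the original risk no longer applies
    approved_by: str
    removed_on: date
    rechecked_steps: List[str] = field(default_factory=list)  # steps re-verified afterwards
    closed: bool = False

    def close(self) -> None:
        """Close the record only if the reason and the re-check are both on file."""
        if not self.rationale.strip():
            raise ValueError(f"{self.item}: no rationale recorded")
        if not self.rechecked_steps:
            raise ValueError(f"{self.item}: nothing was re-checked after removal")
        self.closed = True


# Illustrative values only.
record = RemovalRecord(
    item="find_smpc_section",
    rationale="Model locates SmPC sections unaided on the current evaluation set.",
    approved_by="harness owner",
    removed_on=date(2026, 1, 15),
)

try:
    record.close()             # refused: no re-check on file yet
except ValueError as err:
    refusal = str(err)

record.rechecked_steps.append("medical-inquiry draft: SmPC citation step")
record.close()                 # accepted: reason and re-check are both present
```

The design choice is that the Friday-afternoon removal simply cannot reach a closed state, which is the difference between reckless deletion and a documented one.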

Getting Started

Put the harness on a review schedule — quarterly, plus a required review every time the model is upgraded. A model update is your signal to review.
Make “what can we remove?” a first-class question at each review, not an afterthought.
Write one line on the risk for every candidate deletion — keep or cut, and why.
Handle removals deliberately: record the reason and re-check the step it touched, as you would for any important system.
Track “safeguards removed” as a health metric. A harness whose tool count only ever climbs isn’t maturing. It’s quietly accumulating risk.
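
The review trigger and the “safeguards removed” metric from the steps above can be sketched in a few lines. Everything here is an assumption made for illustration: the 91-day interval stands in for “quarterly”, the model identifiers are placeholders, and the “added” and “removed” keys are a made-up shape for a review log.

```python
from datetime import date, timedelta
from typing import Dict, List, Tuple


def review_due(
    last_review: date,
    today: date,
    model_at_last_review: str,
    current_model: str,
    interval_days: int = 91,   # assumed stand-in for "quarterly"
) -> Tuple[bool, str]:
    """A review is due when the model has changed or the interval has elapsed."""
    if current_model != model_at_last_review:
        return True, "model changed since last review"
    if today - last_review >= timedelta(days=interval_days):
        return True, "scheduled review interval elapsed"
    return False, "no review trigger"


def harness_health(review_history: List[Dict[str, int]]) -> Dict[str, object]:
    """Summarise additions versus removals across past reviews.

    Each entry is a dict with hypothetical keys "added" and "removed",
    counting tools, guardrails, and checks.
    """
    added = sum(r["added"] for r in review_history)
    removed = sum(r["removed"] for r in review_history)
    return {
        "added": added,
        "removed": removed,
        "net_growth": added - removed,
        "only_grows": added > 0 and removed == 0,  # the warning sign described above
    }


# Illustrative values only.
due, why = review_due(
    last_review=date(2025, 11, 1),
    today=date(2025, 12, 10),
    model_at_last_review="model-v1",
    current_model="model-v2",
)
health = harness_health([
    {"added": 3, "removed": 0},
    {"added": 2, "removed": 0},
])
```

In this example the review is triggered by the model change even though only a few weeks have passed, and the health summary flags a harness that has grown at every review without a single removal.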

The Pharma Advantage, Again

The thread through this series has been the same: the disciplines pharma complains about turn out to be exactly what reliable AI requires. SOPs became Skills. Quality thinking became harness engineering. Decades of reviewed documents became training data. And now the least glamorous habit of all — the scheduled re-check, the deliberate handling of change, the willingness to retire what no longer earns its keep — is the one that keeps an AI harness alive over time.

Tech is busy rediscovering that the best agents have the fewest tools. You’ve been retiring SOPs, trimming systems you no longer need, and sunsetting products on purpose for years. You already know how to remove things safely — and that is precisely the skill the next phase of agentic AI demands.

Because maintenance was never really a question of what to add. It’s a question of what you’re finally ready to delete.

This is Part 4 of a series on AI-driven SOPs in pharma. Part 1: SOPs for AI — How Pharma’s Most Underrated Skill Became the Key to Agentic Workflows · Part 2: The AI Harness — How to Make Agentic SOPs Audit-Proof in Pharma · Part 3: Nutcrackers, Not Sledgehammers: Why Small Models Belong in Your AI Harness

Source: Andrew Qu, “We Removed 80% of Our Agent’s Tools”, Vercel Engineering Blog, December 2025.

This article was co-authored with Anthropic’s Claude Opus 4.8 model. The ideas, domain expertise, and editorial direction are mine — the AI helped structure, draft, and refine the text.