# The Model Was Never the Problem

*Published 2026-06-23* | Author: chris-hart

tl;dr: The story repeats almost everywhere right now: an agent demos beautifully, a budget gets approved, and six weeks later it is a tab nobody opens. The reflex is to blame the model, but the model was almost never the problem. The hard part was never getting the agent to think. It is getting the agent a job, a loop, and permission to act on a real customer.

There's a now-familiar story playing out at a lot of companies, and it goes the same way almost every time.

Someone runs a demo. The agent is genuinely impressive. It reads the data, it reasons out loud, it drafts something a person would have spent an hour on. The room nods. A budget gets approved. Six weeks later the same agent is a tab nobody opens, a Slack channel that went quiet, a line item somebody's quietly trying to figure out how to kill before the next board deck.

If you run growth or revenue, you've watched a version of this. Maybe you greenlit it. I'm not writing this to pile on, because I think the reflex everyone is reaching for (the model wasn't good enough, the tech isn't ready, we were early) is the wrong lesson. The model was almost never the problem.

What the numbers are actually measuring

You've seen the stats. I won't rebuild the slide everyone's been passing around for a quarter. The short version: MIT's study found 95% of enterprise GenAI pilots delivered no measurable P&L impact, Gartner expects over 40% of agentic projects to get scrapped by 2027, and the 86% "stuck in pilot purgatory" line has become its own little genre of LinkedIn post.

Here's the thing those numbers don't say, even though everyone quotes them like they do. They're not a measurement of how smart the models are. The models in those failed pilots are, for the most part, the same models running in the pilots that worked. What's really being measured is something more subtle and less flattering: whether the company that bought the agent could actually put it to work.

The same reporting, if you read past the headline numbers, points right at it. The most common reason these projects die isn't a reasoning failure. It's that more than 86% of enterprises needed to upgrade the plumbing first: identity, access control, audit logging, the boring connective tissue that decides whether an agent is allowed to do anything other than talk. Gartner's own framing is that the projects get scrapped not because the models fail but because organizations can't operationalize them.

So the "agents are overhyped" read is, I think, exactly backwards. The agents mostly worked. The deployments didn't. And once you say it that plainly, the interesting question stops being "is the AI good enough?" and becomes "what does an agent actually need to make it out of the demo and into the business?" From where I sit, in go-to-market specifically, it comes down to three things, and not one of them is intelligence.

One: it needs a job

The pilots that die tend to start with an exciting sentence: "imagine an agent that could do anything." That sentence is a trap.

An agent that can do anything has, in practice, been given nothing. No owner, no number it's responsible for, no definition of done. It becomes a very expensive intern who's enthusiastic about everything and accountable for nothing. When the quarter ends and someone asks what the agent actually impacted, there's no answer, because it was never pointed at a job small enough to have a result.

The agents that survive own one job end to end. Not just a vague direction to help with sales. A real and definable job: turn the anonymous intent already crossing your product and your site into pipeline, this week, attributable, so the rep knows who to call and why. That's narrow enough that you can tell in thirty days whether it's working. Narrow enough that someone on the team owns it. Narrow enough to actually finish.

I'd go further. In go-to-market, do-anything agents haven't just underperformed, they've done damage. The clearest example is the first wave of fully automated outbound. Everyone pointed the same models at the same tired playbook, and buyers learned the shape of an AI-written email in about a week. The outbound AI SDR has been described, fairly, as an unmitigated disaster: generic messaging at volume, degraded deliverability, prospects trained to ignore you. That wasn't a model problem either. It was a job problem. The agent was told to send more, which is not a job, it's a way to burn your domain reputation efficiently.

Two: it needs a loop, not a dashboard

Here's the failure mode I find most expensive, because it hides as success.

A lot of go-to-market tooling, the agentic kind included, stops at insight. It tells you who the visitor probably was. It surfaces an account that's showing intent. It lights up a dashboard. And then it hands the actual work back to you, which means back to a stack of seven tools and a Zapier flow your team rebuilds every quarter and stops trusting by the second month.

That gap, between knowing and doing, is where most of the value leaks out. Knowing a buyer is in-market is worth almost nothing if the response shows up days later after three handoffs. The old lead-response research everyone in this field has read makes the point: speed to action is most of the game, and getting a response out tomorrow is usually the same as losing the deal.

So the agents that reach production don't stop at the insight. They close the loop. See the signal, score it against who you actually sell to, and run the play in the same minute, sometimes routing to a human, sometimes acting inside a workflow you approved ahead of time. The output isn't a probability score for someone to interpret later. It's a person and a play, already in motion. That's a different category from a dashboard, and it's the difference between an agent that informs the work and an agent that does it.
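The loop described above can be sketched in a few lines. This is a minimal illustration, not a real product's logic: the signal kinds, the scoring points, the employee range, and the play name are all hypothetical assumptions chosen to show the shape (signal in, decision out, no dashboard in between).

```python
from dataclasses import dataclass

# Every name and threshold here is an illustrative assumption.
HIGH_INTENT_KINDS = {"pricing_page_visit", "demo_request"}
# Plays a human approved ahead of time, keyed by signal kind.
PRE_APPROVED_PLAYS = {"pricing_page_visit": "send_pricing_followup"}


@dataclass(frozen=True)
class Signal:
    account: str
    kind: str
    employees: int


def score(signal: Signal) -> int:
    """Score a signal against a toy ideal-customer profile (0 to 100)."""
    points = 0
    if signal.kind in HIGH_INTENT_KINDS:
        points += 60
    if 50 <= signal.employees <= 5000:
        points += 40
    return points


def handle(signal: Signal) -> dict:
    """Turn one signal into a decision right away: ignore, act, or route."""
    fit = score(signal)
    if fit < 60:
        return {"account": signal.account, "action": "ignore", "score": fit}
    play = PRE_APPROVED_PLAYS.get(signal.kind)
    if fit == 100 and play is not None:
        # Strong fit and a pre-approved play exists: act inside the workflow.
        return {"account": signal.account, "action": "run_play",
                "play": play, "score": fit}
    # Everything else in-market goes to a human with the context attached.
    return {"account": signal.account, "action": "route_to_rep", "score": fit}
```

The point of the sketch is the return value: every branch ends in an action, never in a score left for someone to interpret later.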

This is also why so many insight products churn. They were never wired to the action, so the customer had to supply the action, and the customer was busy. An agent that only watches is just another thing to watch.

Three: it needs permission to act

This is the one that gets waved off as compliance, filed under "legal will handle it," and it's the one that actually decides whether an agent ships.

The moment an agent stops drafting suggestions and starts doing things with real buyers (sending the message, enrolling the sequence, changing the record, personalizing what a stranger sees), the dynamics change. A chat window that says something wrong is embarrassing. An agent that takes action on the wrong prospect, in the wrong region, against a policy nobody can see, is a different kind of exposure, and everybody senior in the building knows it. That's why pilots that touch customers so often stall right before launch. Not because the model misbehaved, but because nobody could answer one question: what exactly will the agent do, to whom, and how do you stop it if things go sideways?

The reflex is to treat that question as a tax. Slow the agent down, add a review step, make it ask permission for everything, and watch the speed you bought evaporate. I think that's the wrong instinct. The teams getting agents into production are the ones treating the answer to that question as a feature. The audit trail isn't paperwork bolted on at the end. It's the work record: every action the agent took, the signals that triggered it, the person who approved the play, the policy that governed it, and a single switch to pause it by segment or by region without taking down the rest. Build that in and the agent can move fast, because moving fast is no longer the scary part. You can see what it's doing, so you can let it do more.
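As a rough sketch of what "the audit trail is the work record" might look like in code, here is a minimal version. The class names, field names, and the pause mechanism are hypothetical assumptions for illustration, not a description of any particular product. Every attempted action writes a record with its triggering signals, approver, and governing policy, and a pause by region or segment blocks only the matching actions.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class WorkRecord:
    """One entry in the work record (field names are illustrative)."""
    action: str
    prospect: str
    region: str
    segment: str
    signals: tuple
    approved_by: str
    policy: str
    status: str  # "executed" or "paused"
    at: datetime


class GovernedAgent:
    def __init__(self):
        self.paused = set()  # entries like ("region", "eu")
        self.log = []        # append-only list of WorkRecord

    def pause(self, scope: str, value: str) -> None:
        """Pause one region or segment without touching the rest."""
        self.paused.add((scope, value))

    def resume(self, scope: str, value: str) -> None:
        self.paused.discard((scope, value))

    def act(self, action, prospect, region, segment,
            signals, approved_by, policy) -> WorkRecord:
        blocked = (("region", region) in self.paused
                   or ("segment", segment) in self.paused)
        record = WorkRecord(
            action=action, prospect=prospect, region=region, segment=segment,
            signals=tuple(signals), approved_by=approved_by, policy=policy,
            status="paused" if blocked else "executed",
            at=datetime.now(timezone.utc),
        )
        # Blocked attempts are logged too, so the record shows what was stopped.
        self.log.append(record)
        return record
```

In this sketch a call like `agent.pause("region", "eu")` stops EU actions while everything else keeps running, and the log still shows each attempt that was held back.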

This isn't a hypothetical concern you can defer, either. McKinsey reports only about 1 in 3 enterprises are governance-ready for the autonomous agents they're already running, which is another way of saying two-thirds are about to hit this wall whether or not it's in the demo. And the regulatory ground is moving under everyone's feet. The EU just reached a provisional deal to push its high-risk (Annex III) obligations from August 2026 out to December 2027, which sounds like relief, but mostly just means the timeline you were planning around moved again. Building the work record in from the start is how you stop caring which way the date slides next.

To be clear, this is one leg of three, not the whole story. I've watched the compliance-first pitch land flat plenty of times, because governance on its own doesn't convert anyone. It's the thing that lets the other two work. A job gives the agent something to finish. A loop lets it finish in time to matter. Permission to act is what lets it finish for a real customer without the whole company holding its breath.

The go-to-market version of this

Put those three together and you can predict, pretty reliably, which agents survive contact with a real revenue team.

Three gates to production: what gets an agent into production.

A pilot (demos well) -> a job (one owned outcome, narrow enough to finish) -> a loop (signal to action, not a dashboard) -> permission (act on a real buyer, with the work record).

Miss any one and the agent stays a demo. Clear all three and it reaches production.

The ones that die are horizontal, they produce insight and hand back the work, and they act on customers through plumbing nobody can see into. The ones that live own a narrow revenue job, run a closed loop from signal to action, and act on rails the operator defined and can pause. Which is, not coincidentally, the opposite of the spray-and-pray outbound that gave the whole category a bad year. The market already corrected on the most egregious version of this: the money and the real deployments have moved toward working high-intent demand with judgment in the loop, and away from volume for its own sake. Buyers got smart faster than the tools did.

None of this requires a smarter model. It requires deciding what the agent is for, wiring it all the way through to the action, and giving it a place to act from where everyone can see what it did. That's a product and an operating decision, not a research breakthrough. Which is good news, because it means the production gap is something you can actually close, not something you have to wait for the next model release to fix.

Where we're placing our bet

I'll be straight about the lens we're looking through. At Civic we build agents, and we've spent this stretch narrowing our focus, away from the do-anything posture and toward exactly the shape I've described here: an agent that owns one go-to-market job, runs it as a closed loop from signal to action, and carries the work record that lets it act on real buyers without anyone flinching. We're not ready to put a name and a date on it in this post. But the thesis isn't a secret, and it's the same one I'd give you whether or not we were building it: the next year of go-to-market won't be won by whoever has the cleverest agent. It'll be won by whoever gets a useful one into production and keeps it there.

The companies still treating this as a model problem will spend another year running demos that wow the room and die in six weeks. The ones treating it as a job-loop-and-trust problem will start shipping at scale. I know which side I'd rather be on.

If you're a growth leader staring at a pilot that demoed beautifully and went dark, or weighing whether to start one at all, I'd genuinely like to hear how it's going. No pitch. We're learning in public on this, and the operators living it have taught me more than any report has. You can reach the team through civic.com.

Sources and further reading

Source: https://www.civic.com/field-notes/the-model-was-never-the-problem