Still Running Agentic Pilots? Here’s What 5 Companies Did to Ship AI Agents to Production


Five companies. Five decisions. One pattern separating the teams that shipped AI agents from the ones still refining their demos.

Dmitry Sheynin

June 5, 2026 7 min read

Most AI agent pilots don’t fail. They just don’t end. The demo works. The agent handles the scripted queries, hits its marks, impresses the steering committee. The team gets another sprint. Then another. Someone on the data team flags a gap in the knowledge base. QA finds an edge case no one anticipated. Legal wants one more review of the escalation language. The agent is almost ready. It’s been almost ready for three months.

Meanwhile, other teams have already shipped. It’s not because they had better technology. Rather, they had a clearer picture of what “done” looked like before they started building. They decided what the agent needed to finish, not just answer. They set quality bars before they wrote agent logic. They asked whether the data was ready before asking whether the agent was.

The difference, in every case, came down to a decision made early. Here’s how five companies took their Agentforce deployment from pilot to production.

1. The metric you set for your agent shapes the agent you build.

Florida Prepaid manages long-term college savings accounts that often span decades. Every service interaction carries real weight. When they decided to bring Agentforce Voice into their call center, chief information technology officer Ashley Falls set a strategic direction that shaped everything that followed: “We did not approach Agentforce Voice as a call-deflection tool. We approached it as a way to redesign the service model responsibly, using AI for routine questions, preserving human capacity for higher-value conversations and building the guardrails needed to scale with trust.”

That framing had real implications. A deflection goal would have pushed them toward handling as many calls as possible, as fast as possible. A service redesign goal pushed them toward defining which calls the agent should handle and which calls needed a live representative. They scoped the agent to public Salesforce Knowledge articles only and built explicit escalation paths for anything sensitive. Cancellations trigger a transfer to a representative, with an AI-generated summary handed off for context.
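As a rough illustration of that scoping decision, the routing rule can be sketched in a few lines of Python. This is not Florida Prepaid’s implementation or any Agentforce API. The `Call` fields, the intent label, the `summarize` stand-in, and the rule that anything the public articles cannot answer goes to a representative are all assumptions made for the example.

```python
from dataclasses import dataclass

# Hypothetical intent labels. The article only states that cancellations
# are transferred to a representative.
ESCALATE_INTENTS = {"cancellation"}


@dataclass
class Call:
    intent: str                      # assumed output of an upstream intent classifier
    transcript: str                  # what the caller has said so far
    answerable_from_public_kb: bool  # assumed flag: a public article covers the question


def summarize(transcript: str, limit: int = 120) -> str:
    """Stand-in for an AI-generated summary: collapse whitespace and truncate."""
    text = " ".join(transcript.split())
    return text if len(text) <= limit else text[: limit - 3] + "..."


def route(call: Call) -> dict:
    """Decide who handles the call and what context travels with it."""
    # Sensitive intents always go to a representative, with a summary attached.
    if call.intent in ESCALATE_INTENTS:
        return {"handler": "human", "summary": summarize(call.transcript)}
    # Assumption: questions outside the public knowledge scope are also handed off.
    if not call.answerable_from_public_kb:
        return {"handler": "human", "summary": summarize(call.transcript)}
    return {"handler": "agent", "summary": None}
```

The point of the sketch is the ordering: the escalation checks run before the agent is allowed to answer, so scope is enforced by the routing rule rather than left to the agent’s judgment.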

The results don’t look like traditional deflection numbers. AI-handled calls average about 1.6 minutes, while human-handled calls average 10.9 minutes. That gap is intentional, reflecting the fact that reps are now fielding the calls that genuinely need them. About 35% of the chat team was redeployed into proactive, revenue-generating roles. Agentforce Voice now handles 75% of business-hours calls and 100% of after-hours calls. Before launch, after-hours callers had no clear path forward other than calling back during business hours.

What got them to production: The goal you set determines the guardrails you build, the scope you choose, and the calls the agent can handle on its own. Florida Prepaid decided what the agent was for before they decided what it could do. Most teams do it the other way around.

2. Build the quality gate before you build the agent.

Adecco staffs two million associates daily and partners with more than 100,000 clients worldwide. Their recruiters were only reaching about 10% of candidates, because roughly two-thirds of candidate interactions were happening outside of business hours when no one was available to respond. The math was simple: if you want to prescreen every candidate, you need an agent that works when your recruiters don’t.

But that agent is only as accurate as the job description it’s working from. Before Agentforce screens a single candidate, every posting gets a completeness rating and discrimination check. Postings that don’t clear the bar go back to the recruiter for cleanup first. Without an accurate and complete record, the agent can’t ask the right questions, evaluate the answers, or hand off anything useful to the recruiter.
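A minimal sketch of such a gate, in Python, might look like the following. The required fields, the flagged-term list, and the threshold are hypothetical, and the keyword match is a crude stand-in for a real discrimination check. The article does not describe how Adecco computes its completeness rating.

```python
# Hypothetical required fields, flagged terms, and threshold, for illustration only.
REQUIRED_FIELDS = ("title", "location", "pay", "duties")
FLAGGED_TERMS = ("young", "native speaker")
MIN_COMPLETENESS = 1.0


def completeness(posting: dict) -> float:
    """Fraction of required fields that are present and non-empty."""
    filled = sum(1 for field in REQUIRED_FIELDS if str(posting.get(field, "")).strip())
    return filled / len(REQUIRED_FIELDS)


def flagged_terms(posting: dict) -> list:
    """Crude keyword scan standing in for a real discrimination check."""
    text = " ".join(str(value) for value in posting.values()).lower()
    return [term for term in FLAGGED_TERMS if term in text]


def gate(posting: dict) -> dict:
    """Return the posting's status: ready for screening, or back to the recruiter."""
    score = completeness(posting)
    flags = flagged_terms(posting)
    if score < MIN_COMPLETENESS or flags:
        return {"status": "return_to_recruiter", "completeness": score, "flags": flags}
    return {"status": "ready_for_screening", "completeness": score, "flags": flags}
```

Running the gate before any screening conversation means an incomplete or flagged posting never reaches the agent in the first place.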

With Agentforce, Adecco’s prescreening coverage went from 10% to 100%. “Agentforce has completely changed the way our recruiters connect with top talent around the world,” says Niki Turner-Harding, SVP and country head at Adecco UK and Ireland.

What got them to production: Adecco didn’t ask whether their agent could handle a candidate conversation. They asked whether the data feeding that conversation was ready. The quality check was a prerequisite, not an afterthought. That’s what made 100% coverage possible at scale.

3. Your agent needs conditional logic for the things that can’t go wrong.

Grant Roberson is the sole Agentforce admin at Datasite, an M&A platform where accuracy is non-negotiable. A wrong answer in a deal workflow doesn’t just frustrate users; it can derail a transaction. Roberson was an early Agentforce adopter and had a front-row seat to the platform’s evolution. In the early days, when everything ran purely on natural language instructions, agents wouldn’t always produce a clean answer. Behavior could be hard to predict and even harder to fix.

The problem was variance. Roberson’s fix was to find the decisions that couldn’t tolerate that variance and replace them with conditional logic: if this is true, run this, no interpretation required. Using Agent Script, he pinned those paths to code that executes the same way every time. He applied the same thinking to escalation: before the agent could route a conversation to a human, the customer had to pose a formal question. Without that gate, users would immediately request a human, skipping the agent before it had a chance to help.
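The escalation gate can be illustrated with a short Python sketch. This is not Agent Script and not Datasite’s code. The phrase list, the `is_formal_question` heuristic, and the returned action names are assumptions chosen to show the idea of a deterministic branch that runs the same way every time.

```python
# Hypothetical phrases that signal a request for a person.
HUMAN_REQUEST_PHRASES = ("human", "representative")


def is_formal_question(message: str) -> bool:
    """Crude stand-in: ends with a question mark and has at least four words."""
    text = message.strip()
    return text.endswith("?") and len(text.split()) >= 4


class Conversation:
    """Tracks whether the customer has posed a question yet."""

    def __init__(self) -> None:
        self.question_posed = False

    def handle(self, message: str) -> str:
        lowered = message.lower()
        # Deterministic gate: a handoff is allowed only after a question was posed.
        if any(phrase in lowered for phrase in HUMAN_REQUEST_PHRASES):
            return "escalate" if self.question_posed else "ask_for_question"
        if is_formal_question(message):
            self.question_posed = True
            return "answer"
        return "clarify"
```

A short usage example, with made-up messages:

```python
chat = Conversation()
chat.handle("Get me a human")                               # "ask_for_question"
chat.handle("How do I invite a reviewer to my data room?")  # "answer"
chat.handle("I still need a human")                         # "escalate"
```

Because the gate is an ordinary conditional rather than an instruction the model interprets, the same input always produces the same branch.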

The results were immediate. Conversation failure rate dropped from roughly 33% to about 0.5%. Deflection climbed from the low-60% range to a sustained average of 82%, running between 81% and 84% week to week. CSAT held at 4.8 out of 5, matching the performance of the live human support team. “It’s probably one of the bigger joys,” Roberson says, “to know that what you think is going to happen actually happened.”

What got them to production: Natural language instructions are fast to write and flexible by design. The flexibility is also the risk. Roberson didn’t replace all his instructions. He replaced only the ones where unpredictable behavior had real consequences. That distinction is the whole game.

4. Your first agent is the hardest. Build it so the next one takes half as long.

Indeed runs Agentforce across four production use cases: a service agent for employer troubleshooting on the web portal, an IT help agent in Slack, an SDR agent for outbound lead qualification and meeting scheduling, and a dual-purpose sales and service agent. Building their latest agent took a fraction of the time the first one did because the infrastructure was designed to be reusable.

Instead of configuring agents through a browser UI, Indeed’s engineers build through APIs and a CLI. Agent Script is authored in code, versioned like code and deployed through the same API-driven workflows as everything else. Every agent wires into the same pre-processed Data 360 layer, where employer profiles, account status, flagged job metadata, and support history are already structured and enriched before the agent touches them. When a new use case comes up, the foundation is already there.

“The first agent took months,” said Oliver Bodden, Indeed’s senior technical product manager. “The latest agent took weeks. That’s the compounding effect.”

What got them to production: The first agent is always the hardest because you’re building the infrastructure and the agent at the same time. Indeed treated that infrastructure as an investment, not overhead. Every agent they’ve shipped since has gotten to production faster because of it.

5. Your call center reps will break your agent faster than your QA team.

SharkNinja launches about 25 new products a year. Their agents need to handle setup questions, troubleshoot problems, and field follow-ups from customers who’ve just opened the box. The team is diligent about QA testing, but they quickly found that real customers have a tendency to ask things the product team never anticipated.

Their fix was to stop treating testing as a pre-launch phase and start treating it as an ongoing operation. Call center reps now spend a few minutes each day in what SharkNinja calls “attack the bot” sessions, asking the toughest questions, attempting to trip it up, and testing edge cases no QA script would catch. The company also runs weekly conversation engineering meetings and structured learning sessions after every major iteration. The Digital Concierge has now handled more than 250,000 conversations.

“The ‘attack the bot’ sessions with call center reps surfaced more edge cases in two weeks than our QA scripts caught in two months,” said Carolin Duerkop, SharkNinja’s technology transformation partner.

What got them to production: QA scripts test what you anticipated. Frontline reps find what you didn’t. The people closest to the customer are your best stress test.

The gap between pilot and production isn’t usually a technology problem. It’s a decision problem. Most of the choices that get an agent to production can be made before you write a line of agent logic. The companies that ship make them early. The ones still in pilot are usually still deferring them.

To learn more about Agentforce best practices, visit here.

Dmitry Sheynin Senior Manager, Product Marketing

Dmitry Sheynin is a senior product marketing manager for Agentforce, focused on enterprise AI and autonomous agents. He's obsessed with customer success and loves to dive into the weeds.
