Key Takeaways
- Treat AI agents like employees — give them job descriptions, KPIs, and regular performance reviews.
- Build in quality checkpoints at the riskiest workflow moments to catch errors before they compound.
- Meaningful agent improvement requires subject matter expert feedback, not just thumbs up or down from end users.
When one sales development representative at Asymbl is doing his job right, everyone on the team calls him Teddy.
When he goofs up, he becomes Theodore.
And when he really misses the mark, his colleagues do what any parent would do: use his full name, Theodore Frank.
But Teddy doesn’t have parents, not in the traditional sense, because he’s an AI agent. The name escalations are a good-humored part of the ongoing management process to ensure that Teddy and Asymbl’s other agents perform their duties effectively — and that they maintain their effectiveness over time.
“We like to say Teddy has a soul,” said Brandon Metcalf, CEO and Founder of Asymbl, a workforce orchestration company that uses technology to harmonize human team members and digital workers. “And the soul is the identity and the operating principles that guide him into doing his job.”
When Asymbl hit a growth spurt in 2025, a single human sales development representative (SDR) was drowning in hundreds of prospects across the company’s three business lines. Hiring more humans would have been prohibitively expensive. Teddy was the solution, built on Agentforce and plugged directly into the company’s Agentforce Sales instance so he could read lead scores, update opportunities, and send initial scheduling emails.
Asymbl currently employs about 170 humans and 200 digital workers. Those agents are treated like members of the team, digital workers with job descriptions, performance reviews, documented KPIs, and coaching from human colleagues. That close supervision is paying off. In 2025, digital labor generated $5 million in productivity impact for Asymbl, and this year the number is projected to land between $11 million and $13 million.
Asymbl is succeeding in an arena where many companies are falling short: not just launching agents but consistently coaching and monitoring them. Deloitte’s recent State of AI in the Enterprise report found that while 85% of companies expect to customize AI agents for their businesses, only 21% have a mature governance model for the agents they deploy. But it’s that kind of governance that fuels growth into full-fledged Agentic Enterprises.
The Management Process
To successfully manage an agent, you first have to clearly define what it’s supposed to do. Having a clear expectation of what good looks like — a job description, essentially — is what makes the crucial oversight and maintenance part of the process possible.
Teddy’s job, for instance, is to scan emails that come in from prospects, qualify them against the ideal customer profile, correctly route them to one of Asymbl’s three business lines, and convince qualified prospects to meet with a human.
Polly, the company’s people operations agent, fields questions from employees on topics ranging from benefits to bereavement. Her job is to handle those conversations the way a thoughtful HR partner would, which means recognizing what the employee is actually asking and responding with appropriate empathy before delivering information.
After an agent’s job is well defined, the next step is to schedule regular performance reviews.
At Asymbl, for instance, human sales reps sit down every week and review Teddy’s email communications line by line. Those sessions double as coaching for Teddy and as training for the humans, who pick up new messaging angles from the approaches he tries. Specifically, Teddy is coached on showing empathy, using the right tone, and knowing when to push a prospect toward a human salesperson. (Polly gets coached on two questions every week: Did she get the right answer, and did she get the tone, clarity, and judgment the moment called for?)
The same rhythm applies to other agents. Every Monday, Metcalf meets with another digital worker named Bradley, who oversees a constellation of roughly 50 agents that produces a 10- to 15-page weekly CEO briefing pulled from Salesforce, Slackbot, finance systems, and Google Drive.
Most of the time, the work from Bradley and his “team” of other digital workers is solid. But not long ago, Bradley turned in a briefing that failed to pull in new data analysis.
This miss surprised Metcalf. Then he realized the only way to prevent Bradley from making the same mistake again was to review his performance and let him know where he stumbled. “Performance reviews are an important part of everything we do,” he said. “That mindset is what makes us successful.”
The tools for conducting these reviews have become more sophisticated as the digital workforce has grown. Earlier this year, Salesforce released a customizable score and evaluation tool that lets companies define their own business metrics and run large language models over agent sessions to classify tone, brand adherence, and customer sentiment automatically.
Asymbl now uses those tools religiously.
When an agent misses a goal, the response plays out like a performance improvement plan. A manager analyzes the failed session, identifies poor patterns, outlines how to perform tasks better, and tweaks the Agent Script, the instructions that govern exactly how the agent reasons and acts, so the failure doesn’t happen again.
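Salesforce hasn’t published the internals of its scoring tool, but the review loop described above can be sketched generically. Everything below — the metric names, thresholds, and `Session` shape — is a hypothetical illustration of the pattern, not any vendor’s API:

```python
from dataclasses import dataclass

# Hypothetical session record; in practice the scores would come from an
# LLM-based evaluator classifying tone, brand adherence, sentiment, etc.
@dataclass
class Session:
    agent: str
    transcript: str
    scores: dict  # metric name -> score in [0.0, 1.0]

# Business-defined pass thresholds, one per metric (illustrative values).
THRESHOLDS = {"tone": 0.7, "brand_adherence": 0.8, "resolution": 0.9}

def flag_for_review(sessions):
    """Return (session, failed_metrics) pairs for any session that misses
    a threshold, so a manager can analyze the failure and adjust the
    agent's script accordingly."""
    flagged = []
    for s in sessions:
        misses = [m for m, t in THRESHOLDS.items() if s.scores.get(m, 0.0) < t]
        if misses:
            flagged.append((s, misses))
    return flagged
```

The flagged sessions are what a weekly review would walk through, metric by metric, before anyone touches the agent’s script.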
The Checkpoints
When Bradley turned in that insufficient briefing, Asymbl’s response was to build quality assurance steps, staffed by human and additional digital auditors, into his workflow so that data pulls, analysis, and page styling all get verified before the report lands on Metcalf’s desk.
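A minimal sketch of that pattern, with step names borrowed from Bradley’s briefing workflow (the pipeline shape and validators are assumptions for illustration, not Asymbl’s actual implementation):

```python
# Minimal checkpoint pattern: each risky step is paired with a gate that
# validates its output before the workflow may continue.

class CheckpointError(Exception):
    """Raised when an intermediate result fails its quality gate."""

def run_with_checkpoints(steps):
    """steps: list of (name, fn, validator-or-None). Each fn receives the
    previous step's result; a validator returning False halts the pipeline
    so errors can't compound or reach an irreversible action."""
    result = None
    for name, fn, validate in steps:
        result = fn(result)
        if validate is not None and not validate(result):
            raise CheckpointError(f"checkpoint failed after step: {name}")
    return result

# Illustrative wiring for a briefing workflow:
briefing_steps = [
    ("pull_data", lambda _: [1, 2, 3], lambda rows: len(rows) > 0),
    ("analyze", lambda rows: sum(rows), lambda total: total > 0),
    ("style_report", lambda total: f"Total: {total}", None),
]
```

Because each gate runs before the next step consumes its output, a bad data pull is caught immediately instead of surfacing as a flawed finished report.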
Kathy Baxter, Principal AI Architect in Salesforce’s Office of Ethical and Humane Use, calls those moments checkpoints. Baxter says that agentic AI without checkpoints is automation without accountability. She compares well-designed agent workflows to a manufacturing line, where quality inspections happen at multiple stages rather than only at the end. She notes that checkpoints should cluster around the riskiest moments in a workflow: high-error tasks, subjective judgment calls, and irreversible actions.
She also warns that the cost of skipping checkpoints tends to show up later.
“If you don’t have enough checkpoints along the way, you’re going to end up accumulating basically invisible debt, or you could get cascading errors that only become visible at the very end when it might not be possible to repair,” Baxter explained.
She adds that measuring outcomes alone is not sufficient; how the agent reaches them matters, too. “An agent that achieves the right outcome with the wrong process is actually a liability,” she said.
In a customer support context, that means looking past the case resolution rate to the sources from which an agent drew, the reasoning behind its responses, and whether a closed case gets reopened later because the real issue was never addressed. Baxter argues that just as companies build compliance infrastructure to ensure humans follow the right procedures, they need to monitor whether their agents are doing the same.
“It’s a matter of running appropriate diagnostics to understand exactly where this is breaking,” Baxter said. “Is it the data? Is it the model? Is it the context?”
The ROI
Part of measuring how agents perform is measuring how they contribute to the bottom line. Asymbl tracks digital labor as its own category on the annual profit-and-loss statement — not folded into IT spend or software-as-a-service licensing. That accounting choice reflects the management philosophy: If agents are workers, they must show up on the books as labor instead of tools.
That framing changes the questions Metcalf asks about return. The first is about hiring: Which roles can a digital worker fill? The second is about productivity: How much faster and more effective does a human become when paired with a digital teammate?
The numbers Asymbl reports are the result of this discipline. Teddy alone has generated an ROI of 3,789%, handling more than a thousand leads a week while the human SDR focuses on strategic, relationship-driven outreach that actually needs a person.
Interestingly, the math for calculating ROI on digital workers unfolds differently from the math for calculating ROI on human headcount. Metcalf said that if Asymbl decided it needed 10 Teddys, it wouldn’t build nine more of them; it would simply turn up the capacity of the Teddy that exists today. The same human manager and the same agent identity would then deliver 10 times the output with little to no additional investment.
“In this instance, I’d still just be managing Teddy, but Teddy would be like Super Teddy, which would be incredible and exciting and fun,” he said.
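As back-of-the-envelope arithmetic, the two scaling models differ like this. The cost and output figures below are invented for illustration; only the 3,789% ROI figure comes from Asymbl:

```python
def roi_percent(value_generated, cost):
    """Standard ROI: net return over cost, expressed as a percentage."""
    return (value_generated - cost) / cost * 100

# e.g. roughly $38.89 of value per $1 of cost yields about the
# reported 3,789%.

def human_scale(output, cost, n):
    """Hiring n reps multiplies output and cost together."""
    return output * n, cost * n

def agent_scale(output, cost, n):
    """Turning up one agent's capacity multiplies output at (roughly)
    constant cost -- a simplification, since compute and license costs
    do grow somewhat with volume."""
    return output * n, cost
```

Under this simplified model, 10x the human team means 10x the labor cost, while “Super Teddy” delivers 10x the output against a nearly flat cost line — which is exactly why the two ROIs resist direct comparison.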
Some experts say this equivalency between digital and human labor creates measurement challenges.
Ben Grant, Managing Partner at Lambton Capital Partners in the U.K., recently told CIO magazine that until companies can quantify the ROI of digital workers in the same language that they’re using to quantify the ROI of human workers, there could be a disconnect.
“Traditional ROI wants clean input-output. AI doesn’t do that yet in most businesses,” Grant was quoted as saying. “The value shows up in time reclaimed, decisions made faster, and gaps being plugged before they become problems. Try putting that in a spreadsheet.”
The Peer Review
Agents need feedback from the humans who work with them, and thumbs-up and thumbs-down buttons won’t get the job done.
“We know that thumbs-up, thumbs-down feedback from users isn’t enough,” said Nancy Xu, VP of Product Management for AI at Salesforce. “When I’m using an agent for the first time, I’m there to get work done and get answers — I’m not there to teach the agent. That’s why experts and practitioners with domain expertise are critical to the development process. They help map out and critique how an agent should be performing within your organizational practices.”
Xu offers an example of the kind of nuance that expert feedback captures. An agent wants to escalate a sensitive conversation to a human service professional. For some use cases, you want the agent to first verify with the end user that they want to be escalated before handing the conversation off. In others, it’s a direct handoff. That interaction contains the sort of business-process subtlety that is unique to a particular company. Feedback from experts inside the company helps guide the agent as it evolves.
Additionally, sycophancy is dangerous for enterprise agents. “Sycophancy is when the agent tells users what they want to hear,” she said. “This isn’t great in a B2C context, and when it comes to B2B, regulated, or sensitive scenarios, it becomes even more problematic.” Xu said the answer lies in an agent development process that aligns agent behavior with organizational practices and guardrails, even if that means agents push back on users who press them to break the rules.
The Upshot
Ultimately, agents like Teddy, Bradley, and Polly will continue to evolve. By next year, Asymbl plans to roll out a digital specialist for software applications, another for consulting, plus coaching and engagement agents behind them. Each will show up with a job description, a manager, a review cadence, and a line on the profit-and-loss statement. Eventually each of these new agents will have a nickname, too.
These digital workers won’t always perform perfectly. The next Teddy undoubtedly will be called Theodore Frank every once in a while. But the sooner companies move beyond prompting agents and into managing them, the sooner the hybrid workforce will become a reality.