
Much of today’s AI conversation focuses on the far-off promise of artificial general intelligence (AGI). But some of its most transformative principles — like reasoning, adaptability, and autonomy — are already taking root inside the enterprise today. While AGI may conjure images of superintelligent machines surpassing human capability, businesses aren’t waiting around for that technology to arrive. They’re applying these foundational concepts now to solve real-world challenges at scale.

At Salesforce, we call these “boring breakthroughs” — not because they’re unremarkable, but because they’re quietly capable, reliably scalable, and built to endure. They’re so seamless, some might take them for granted.

These breakthroughs are ushering in an entirely new category I recently introduced: Enterprise General Intelligence (EGI), AI designed not for science fiction, but for the everyday realities of modern business.

The growing need for EGI in business

We define EGI as purpose-built AI agents for business, optimized not just for capability but for consistency. EGI interprets context, understands data relationships, aligns with operational goals, and autonomously executes workflows to deliver outcomes without human intervention.

By capability, we mean not just how well your AI understands enterprise data and context, but also its ability to handle complex tasks, reason through challenges, and adapt incrementally — building on what it’s already learned to take trusted actions on a user’s behalf.

Your EGI is probably ahead of the game if you’re using technologies like Data Cloud, Salesforce’s hyperscale data engine; Retrieval-Augmented Generation (RAG), which acts as an agent’s memory, transforming unstructured text into searchable formats; and the Atlas Reasoning Engine, which was incubated at Salesforce AI Research, and acts as the brain for Agentforce, the intelligent activation layer of the Salesforce Platform.

These tools give AI real-time access to business knowledge, allowing AI agents to understand nuanced relationships, reason through complex workflows, and take informed action across systems. 
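To make the retrieval idea concrete, here is a minimal sketch of the RAG pattern described above: unstructured documents are turned into a searchable representation, and the most relevant ones are retrieved to ground an agent’s response. This is a toy illustration only — real systems like Data Cloud use learned embeddings and vector indexes, not the bag-of-words similarity assumed here, and the document texts are invented.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy "embedding": a bag-of-words term-frequency vector.
    # Production RAG systems use learned dense embeddings instead.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse term-frequency vectors.
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    # Rank stored documents by similarity to the query and return the
    # top k, which would be prepended to the LLM prompt as context.
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

docs = [
    "Case 1042: customer reported a billing error on the premium plan.",
    "Knowledge article: how to reset a password in the admin console.",
    "Case 0998: shipment delayed due to customs inspection.",
]
print(retrieve("billing error premium", docs, k=1))
```

The design point carries over to real deployments: retrieval quality, not model size, often determines whether the agent’s answer is grounded in the business’s actual data.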

But capability alone isn’t enough. Consistency earns trust. To be enterprise-ready, agents must operate reliably in complex scenarios and integrate seamlessly with existing systems. That’s why rigorous, simulated testing is essential. By stress-testing agent behavior in realistic business environments, companies can identify edge cases, refine performance, and ensure dependable operation before deployment. 

This foundation of trust transforms EGI from a promising concept into a mission-critical solution. Achieving that requires effective evaluation frameworks, enterprise-grade guardrails, and trusted toxicity detection systems. With reinforcement learning and features like Salesforce’s Trust Layer continuously monitoring and improving model behavior, EGI systems don’t just perform — they perform reliably in high-stakes business environments.

Trust is built with consistency, not guesswork

Such attention to detail might seem excessive — until you consider what’s at stake. A sub-par restaurant suggestion or an outdated stat in a school paper might be forgiven in a consumer AI setting. But in the enterprise, if an AI agent gets it wrong, the result can be missed revenue, broken processes, and damaged customer relationships — all of which can be catastrophic for the business.

While many Large Language Models (LLMs) continue to break performance records on increasingly complex benchmarks, they still struggle with simpler tasks that humans handle with ease. This gap in performance is what we call “jagged intelligence” — small inconsistencies in an AI agent’s ability to execute basic tasks. This type of inconsistency can undermine trust and highlight the disconnect between raw intelligence and reliable outcomes for businesses.

Again, that’s why rigorous testing isn’t optional within an EGI world. No airline would launch an aircraft without stress-testing it for extreme conditions. No global bank would let an agent approve high-risk transactions without first testing them across hundreds of regulatory scenarios. And no healthcare network would let an agent summarize patient notes without first verifying it can interpret clinical shorthand, regional terminology, and cross-specialty nuances. 

Put simply: businesses cannot afford to deploy AI agents without fully evaluating them under real-world complexity. Before going live, agents must be tested in simulation environments that reflect the nuances of enterprise operations. These controlled spaces push agents to their limits and surface issues well before they would ever have an opportunity to affect business outcomes. When it comes to delivering trusted EGI, trial-and-error is not a strategy. It’s a risk businesses can never afford to take.
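The simulation testing described above can be sketched as a simple harness: run an agent against a battery of scenarios with acceptance criteria, and report which ones fail before anything ships. The agent, scenario names, and routing logic below are all hypothetical stand-ins, not any actual Salesforce testing tool; the point is the pattern of surfacing edge cases pre-deployment.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Scenario:
    name: str
    user_input: str
    expected: str  # acceptance criterion for this simulated interaction

def run_simulation(agent: Callable[[str], str],
                   scenarios: list[Scenario]) -> dict:
    # Execute every scenario and collect the failures: these are the
    # edge cases to fix before the agent ever touches a real customer.
    failures = [s.name for s in scenarios if agent(s.user_input) != s.expected]
    return {
        "total": len(scenarios),
        "passed": len(scenarios) - len(failures),
        "failures": failures,
    }

# A deliberately naive "agent" that routes refund requests by keyword.
def toy_agent(user_input: str) -> str:
    return "escalate" if "refund" in user_input.lower() else "resolve"

scenarios = [
    Scenario("plain refund", "I want a refund", "escalate"),
    Scenario("split wording", "I want a re fund", "escalate"),  # surfaces a gap
    Scenario("password help", "reset my password", "resolve"),
]
report = run_simulation(toy_agent, scenarios)
print(report)  # the "split wording" case fails, exposing brittle matching
```

Even this toy harness demonstrates the value: the keyword-matching agent passes the obvious cases and fails the slightly reworded one — exactly the kind of jagged behavior that only systematic simulation catches before go-live.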

A new framework for testing agents in simulated environments

That’s the thinking behind CRMArena, a new benchmark simulation developed by the Salesforce AI Research team to test agentic behavior in realistic CRM scenarios. This initial environment replicates tasks across three key personas: service agents, analysts, and managers. The objective is to evaluate whether current models are truly ready for enterprise use, and early results indicate they aren’t. Even with guided prompting, agents succeed less than 65% of the time at function-calling for these personas’ use cases.
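A per-persona success rate like the one cited above can be computed by comparing each function call an agent emits against the call the benchmark expects. The records, function names, and exact-match scoring below are illustrative assumptions for a minimal sketch — CRMArena’s actual task schema and grading are more elaborate.

```python
from collections import defaultdict

# Hypothetical evaluation records: (persona, agent's function call,
# gold function call). Names and arguments are invented for illustration.
results = [
    ("service_agent", "lookup_case(id=1042)", "lookup_case(id=1042)"),
    ("service_agent", "close_case(id=1042)", "escalate_case(id=1042)"),
    ("analyst", "run_report(q='churn')", "run_report(q='churn')"),
    ("manager", "get_kpi('csat')", "get_kpi('nps')"),
]

def success_rates(records):
    # Tally exact-match function-calling accuracy per persona.
    hits, totals = defaultdict(int), defaultdict(int)
    for persona, predicted, gold in records:
        totals[persona] += 1
        hits[persona] += predicted == gold
    return {p: hits[p] / totals[p] for p in totals}

print(success_rates(results))
```

Breaking accuracy out by persona, rather than reporting one aggregate number, is what reveals where an agent is enterprise-ready and where it still falls short.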

These findings underscore the need for advanced model capabilities beyond general-purpose LLMs, with systems that are purpose-built for business. They also highlight the essential role of simulated agentic testing environments in refining and validating AI agents before they engage with real customers or influence business outcomes. CRMArena provides a critical foundation, shedding light on where AI agents need improvement before they can be trusted in real-world deployment.

This represents a pivotal step toward a future of sophisticated agent testing environments, paving the way for even more advanced platforms that drive continuous AI innovation and ensure enterprise readiness at scale.

Scaling trusted AI for enterprise-grade reliability

For CEOs, CIOs, and IT leaders looking to pioneer trusted agents at scale, it’s more important than ever to have a clear understanding of the evaluation tools and benchmarks that will ensure enterprise-grade reliability. As businesses across industries move toward EGI, the need for systems that can deliver consistent performance and adapt to dynamic environments is paramount. The future of enterprise AI lies not just in agents’ capabilities, but in their proven performance under real-world conditions and high-pressure business scenarios.

By adopting advanced frameworks and embracing emerging testing environments, businesses can confidently scale AI operations that drive innovation, enhance decision-making, and protect customer trust. The journey to EGI is underway — it’s time to ensure your organization is ready to lead it.
