
Much of today’s AI conversation focuses on the far-off promise of artificial general intelligence (AGI). But some of its most transformative principles — like reasoning, adaptability, and autonomy — are already taking root inside the enterprise today. While AGI may conjure images of superintelligent machines surpassing human capability, businesses aren’t waiting around for that technology to arrive. They’re applying these foundational concepts now to solve real-world challenges at scale.

At Salesforce, we call these “boring breakthroughs” — not because they’re unremarkable, but because they’re quietly capable, reliably scalable, and built to endure. They’re so seamless, some might take them for granted.

These breakthroughs are ushering in an entirely new category I recently introduced: Enterprise General Intelligence (EGI), AI designed not for science fiction, but for the everyday realities of modern business.

The growing need for EGI in business

We define EGI as purpose-built AI agents for business, optimized not just for capability but for consistency. EGI interprets context, understands data relationships, aligns with operational goals, and autonomously executes workflows to deliver outcomes without human intervention.

By capability, we mean not just how well your AI understands enterprise data and context, but also its ability to handle complex tasks, reason through challenges, and adapt incrementally — building on what it’s already learned to take trusted actions on a user’s behalf.

Your EGI is probably ahead of the game if you’re using technologies like Data Cloud, Salesforce’s hyperscale data engine; Retrieval-Augmented Generation (RAG), which acts as an agent’s memory, transforming unstructured text into searchable formats; and the Atlas Reasoning Engine, which was incubated at Salesforce AI Research, and acts as the brain for Agentforce, the intelligent activation layer of the Salesforce Platform.

These tools give AI real-time access to business knowledge, allowing AI agents to understand nuanced relationships, reason through complex workflows, and take informed action across systems. 
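To make the retrieval idea concrete, here is a minimal sketch of the RAG pattern described above: unstructured documents are turned into a searchable representation, and the most relevant ones are retrieved to ground an agent’s response. This is a toy illustration only — real systems like Data Cloud use learned embeddings and vector indexes, not the bag-of-words similarity assumed here, and the document texts are invented.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy "embedding": a bag-of-words term-frequency vector.
    # Production RAG systems use learned dense embeddings instead.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse term-frequency vectors.
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    # Rank stored documents by similarity to the query and return the
    # top k, which would be prepended to the LLM prompt as context.
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

docs = [
    "Case 1042: customer reported a billing error on the premium plan.",
    "Knowledge article: how to reset a password in the admin console.",
    "Case 0998: shipment delayed due to customs inspection.",
]
print(retrieve("billing error premium", docs, k=1))
```

The design point carries over to real deployments: retrieval quality, not model size, often determines whether the agent’s answer is grounded in the business’s actual data.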

But capability alone isn’t enough. Consistency earns trust. To be enterprise-ready, agents must operate reliably in complex scenarios and integrate seamlessly with existing systems. That’s why rigorous, simulated testing is essential. By stress-testing agent behavior in realistic business environments, companies can identify edge cases, refine performance, and ensure dependable operation before deployment. 

This foundation of trust transforms EGI from a promising concept into a mission-critical solution. Achieving that requires effective evaluation frameworks, enterprise-grade guardrails, and trusted toxicity detection systems. With reinforcement learning and features like Salesforce’s Trust Layer continuously monitoring and improving model behavior, EGI systems don’t just perform — they perform reliably in high-stakes business environments.

Trust is built with consistency, not guesswork

Such attention to detail might seem excessive — until you consider what’s at stake. A sub-par restaurant suggestion or an outdated stat in a school paper might be forgiven in a consumer AI setting. But in the enterprise, if an AI agent gets it wrong, the result can be missed revenue, broken processes, and damaged customer relationships — all of which can be catastrophic for the business.

While many Large Language Models (LLMs) continue to break performance records on increasingly complex benchmarks, they still struggle with simpler tasks that humans handle with ease. This gap in performance is what we call “jagged intelligence” — small inconsistencies in an AI agent’s ability to execute basic tasks. This type of inconsistency can undermine trust and highlight the disconnect between raw intelligence and reliable outcomes for businesses.

Again, that’s why rigorous testing isn’t optional within an EGI world. No airline would launch an aircraft without stress-testing it for extreme conditions. No global bank would let an agent approve high-risk transactions without first testing them across hundreds of regulatory scenarios. And no healthcare network would let an agent summarize patient notes without first verifying it can interpret clinical shorthand, regional terminology, and cross-specialty nuances. 

Put simply: businesses cannot afford to deploy AI agents without fully evaluating them under real-world complexity. Before going live, agents must be tested in simulation environments that reflect the nuances of enterprise operations. These controlled spaces push agents to their limits and surface issues well before they would ever have an opportunity to affect business outcomes. When it comes to delivering trusted EGI, trial-and-error is not a strategy. It’s a risk businesses can never afford to take.
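The simulation testing described above can be sketched as a simple harness: run an agent against a battery of scenarios with acceptance criteria, and report which ones fail before anything ships. The agent, scenario names, and routing logic below are all hypothetical stand-ins, not any actual Salesforce testing tool; the point is the pattern of surfacing edge cases pre-deployment.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Scenario:
    name: str
    user_input: str
    expected: str  # acceptance criterion for this simulated interaction

def run_simulation(agent: Callable[[str], str],
                   scenarios: list[Scenario]) -> dict:
    # Execute every scenario and collect the failures: these are the
    # edge cases to fix before the agent ever touches a real customer.
    failures = [s.name for s in scenarios if agent(s.user_input) != s.expected]
    return {
        "total": len(scenarios),
        "passed": len(scenarios) - len(failures),
        "failures": failures,
    }

# A deliberately naive "agent" that routes refund requests by keyword.
def toy_agent(user_input: str) -> str:
    return "escalate" if "refund" in user_input.lower() else "resolve"

scenarios = [
    Scenario("plain refund", "I want a refund", "escalate"),
    Scenario("split wording", "I want a re fund", "escalate"),  # surfaces a gap
    Scenario("password help", "reset my password", "resolve"),
]
report = run_simulation(toy_agent, scenarios)
print(report)  # the "split wording" case fails, exposing brittle matching
```

Even this toy harness demonstrates the value: the keyword-matching agent passes the obvious cases and fails the slightly reworded one — exactly the kind of jagged behavior that only systematic simulation catches before go-live.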

A new framework for testing agents in simulated environments

That’s the thinking behind CRMArena, a new benchmark simulation developed by the Salesforce AI Research team to test agentic behavior in realistic CRM scenarios. This initial environment replicates tasks across three key personas: service agents, analysts, and managers. The objective is to evaluate whether current models are truly ready for enterprise use, and early results indicate they aren’t. Even with guided prompting, agents succeed less than 65% of the time at function-calling for these personas’ use cases.
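A per-persona success rate like the one cited above can be computed by comparing each function call an agent emits against the call the benchmark expects. The records, function names, and exact-match scoring below are illustrative assumptions for a minimal sketch — CRMArena’s actual task schema and grading are more elaborate.

```python
from collections import defaultdict

# Hypothetical evaluation records: (persona, agent's function call,
# gold function call). Names and arguments are invented for illustration.
results = [
    ("service_agent", "lookup_case(id=1042)", "lookup_case(id=1042)"),
    ("service_agent", "close_case(id=1042)", "escalate_case(id=1042)"),
    ("analyst", "run_report(q='churn')", "run_report(q='churn')"),
    ("manager", "get_kpi('csat')", "get_kpi('nps')"),
]

def success_rates(records):
    # Tally exact-match function-calling accuracy per persona.
    hits, totals = defaultdict(int), defaultdict(int)
    for persona, predicted, gold in records:
        totals[persona] += 1
        hits[persona] += predicted == gold
    return {p: hits[p] / totals[p] for p in totals}

print(success_rates(results))
```

Breaking accuracy out by persona, rather than reporting one aggregate number, is what reveals where an agent is enterprise-ready and where it still falls short.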

These findings underscore the need for advanced model capabilities beyond general-purpose LLMs, with systems that are purpose-built for business. They also highlight the essential role of simulated agentic testing environments in refining and validating AI agents before they engage with real customers or influence business outcomes. CRMArena provides a critical foundation, shedding light on where AI agents need improvement before they can be trusted in real-world deployment.

This represents a pivotal step toward a future of sophisticated agent testing environments, paving the way for even more advanced platforms that drive continuous AI innovation and ensure enterprise readiness at scale.

Scaling trusted AI for enterprise-grade reliability

For CEOs, CIOs, and IT leaders looking to pioneer trusted agents at scale, it’s more important than ever to have a clear understanding of the evaluation tools and benchmarks that will ensure enterprise-grade reliability. As businesses across industries move toward EGI, the need for systems that can deliver consistent performance and adapt to dynamic environments is paramount. The future of enterprise AI lies not just in agents’ capabilities, but in their proven performance under real-world conditions and high-pressure business scenarios.

By adopting advanced frameworks and embracing emerging testing environments, businesses can confidently scale AI operations that drive innovation, enhance decision-making, and protect customer trust. The journey to EGI is underway — it’s time to ensure your organization is ready to lead it.
