Salesforce on synthetic data’s indispensable role in enterprise AI

Key Takeaways

Synthetic data is vital for training enterprise AI agents, as generic LLMs lack the business context and proprietary data needed for complex enterprise environments.
By simulating real-world scenarios, synthetic data enables AI agents to understand business context, handle complex queries, adhere to rules, and scale effectively for accurate performance.
Salesforce is uniquely positioned to provide synthetic data solutions due to its deep understanding of business processes and structured CRM data, allowing for intelligently structured and contextually relevant data creation.

AI agents powered by generic LLMs are typically trained on massive amounts of public data. This gives them broad, general knowledge, but not the sharp, contextual intelligence needed to navigate the complexity of enterprise environments. In the enterprise, AI’s intelligence is jagged, and critical information is scattered across structured systems, proprietary formats, and sensitive data sets that generic LLMs aren’t built to handle at scale.

Generic LLMs have limited knowledge of business data and metadata because most of this information is proprietary and not publicly available on the internet. Yet such domain-specific knowledge and tools are crucial for task success: enterprise AI agents need to understand, reason over, and operate within complex, domain-rich environments.

That’s why synthetic data, artificially generated to mirror the structure, nuance, and sensitivity of real enterprise data without exposing any of it, isn’t just a nice-to-have; it’s essential to unlocking enterprise-grade AI. By replicating the statistical properties of actual business data, synthetic data enables safe, controlled, and privacy-compliant training and testing of AI agents. It bridges the gap between general intelligence and enterprise context, allowing agents to perform with fluency, accuracy, and accountability in complex environments. And because it mimics real-world conditions without the risk, companies can move faster and more safely than ever before.
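
As a rough illustration of what "replicating the statistical properties" can mean in its simplest form, the sketch below fits a small profile (the mean and standard deviation of a numeric field, plus category frequencies) and samples new records from that profile instead of copying real rows. Everything here is an assumption made for illustration: the field names, the `fit_profile` and `sample_synthetic` helpers, and the toy input values are hypothetical, and production synthetic-data pipelines are considerably more sophisticated than a per-field fit.

```python
import random
import statistics
from collections import Counter

# Hypothetical stand-in for real deal records; the values are invented for illustration.
REAL_DEALS = [
    {"stage": "Prospecting", "amount": 12000.0},
    {"stage": "Prospecting", "amount": 8000.0},
    {"stage": "Negotiation", "amount": 25000.0},
    {"stage": "Closed Won", "amount": 31000.0},
    {"stage": "Closed Won", "amount": 18000.0},
    {"stage": "Closed Lost", "amount": 9000.0},
]


def fit_profile(records):
    """Summarize the records as aggregate statistics only (no raw rows are kept)."""
    amounts = [r["amount"] for r in records]
    stage_counts = Counter(r["stage"] for r in records)
    return {
        "amount_mean": statistics.mean(amounts),
        "amount_stdev": statistics.stdev(amounts),
        "stages": list(stage_counts.keys()),
        "stage_weights": list(stage_counts.values()),
    }


def sample_synthetic(profile, n, seed=0):
    """Draw n synthetic records from the fitted profile."""
    rng = random.Random(seed)  # seeded so runs are reproducible
    records = []
    for _ in range(n):
        # Normal draw around the observed mean, clipped at zero.
        amount = max(0.0, rng.gauss(profile["amount_mean"], profile["amount_stdev"]))
        # Stage chosen in proportion to its observed frequency.
        stage = rng.choices(profile["stages"], weights=profile["stage_weights"], k=1)[0]
        records.append({"stage": stage, "amount": round(amount, 2)})
    return records


profile = fit_profile(REAL_DEALS)
synthetic_deals = sample_synthetic(profile, 1000, seed=42)
```

Sampling each field independently, as this sketch does, discards correlations between fields (for example, between stage and amount), which is one reason real generators model records jointly.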

Training AI agents: Beyond general knowledge

Just as a human professional gains expertise through real-world experience and specialized training, enterprise AI agents need to be immersed in realistic business scenarios. Imagine a financial advisor who has only read about finance but never advised a client. Their knowledge, while theoretically broad, would lack the practical application and nuanced understanding required for effective performance. Similarly, an AI agent trained solely on general internet data would struggle to navigate the complexities of a customer relationship management (CRM) system or a supply chain.

This is precisely where synthetic data shines. By populating a simulated environment with thousands (or even millions) of synthetic records, including accounts, leads, opportunities, and even simulated multi-turn customer conversations, AI agents can be trained to:

Understand business context: Learn specific terminology, workflows, and relationships within a particular industry or company.
Handle complex queries: Practice responding to intricate customer requests that require back-and-forth interactions and deep data retrieval.
Adhere to business rules: Be trained to respect validation rules, data hierarchies, and operational protocols unique to an organization.
Scale effectively: Be tested against datasets that accurately reflect the sheer volume and complexity of a large enterprise, preventing performance degradation in production environments.
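
To make the idea of a populated, rule-respecting simulated environment more concrete, here is a minimal sketch that generates synthetic accounts and opportunities and keeps only records that pass two example business rules. The record types, field names, ID formats, and rules are assumptions chosen for illustration; they are not taken from any Salesforce schema or from the tooling described in this article.

```python
import random
from dataclasses import dataclass


@dataclass
class Account:
    account_id: str
    industry: str


@dataclass
class Opportunity:
    opportunity_id: str
    account_id: str
    stage: str
    amount: float


# Illustrative value sets (assumed, not an official picklist).
INDUSTRIES = ["Healthcare", "Financial Services", "Retail"]
STAGES = ["Prospecting", "Negotiation", "Closed Won", "Closed Lost"]


def rule_violations(opp, known_account_ids):
    """Return the list of example business rules this opportunity breaks."""
    problems = []
    if opp.account_id not in known_account_ids:
        problems.append("opportunity must reference an existing account")
    if opp.stage == "Closed Won" and opp.amount <= 0:
        problems.append("closed-won opportunity needs a positive amount")
    return problems


def generate_environment(n_accounts, n_opportunities, seed=0):
    """Build linked synthetic accounts and opportunities that satisfy the rules."""
    rng = random.Random(seed)
    accounts = [
        Account(f"ACC-{i:04d}", rng.choice(INDUSTRIES)) for i in range(n_accounts)
    ]
    known_ids = {a.account_id for a in accounts}
    opportunities = []
    while len(opportunities) < n_opportunities:
        candidate = Opportunity(
            opportunity_id=f"OPP-{len(opportunities):05d}",
            account_id=rng.choice(accounts).account_id,  # preserves the relationship
            stage=rng.choice(STAGES),
            amount=round(rng.uniform(0, 50000), 2),
        )
        # Rejection step: discard candidates that break a rule and draw again.
        if not rule_violations(candidate, known_ids):
            opportunities.append(candidate)
    return accounts, opportunities


accounts, opportunities = generate_environment(n_accounts=100, n_opportunities=1000)
```

The same `rule_violations` check could also be used to score an agent's proposed record updates, so that the rules the data respects are the rules the agent is tested against.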

Benchmarking and optimization: Measuring success and building trust

Beyond initial training, synthetic data is crucial for benchmarking and optimizing AI agents. In a synthetic environment, companies can precisely measure an agent’s performance on various tasks and identify its strengths and weaknesses. A new paper from our AI Research team, CRMArena-Pro, evaluated top-performing LLMs using a generic agentic framework on complex CRM tasks in a realistic environment, but without context from the enterprise data and metadata. The results show that these generic LLM agents achieve only around a 58% success rate in single-turn scenarios (giving a direct answer without clarification steps), with performance degrading significantly to approximately 35% in multi-turn settings (where agents follow up with clarification questions).
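
For readers who want to see how a per-setting success rate like the ones above is tallied, the following sketch groups evaluation episodes by setting and divides successes by attempts. The episode log is a toy example invented for illustration, and the `success_rate_by_setting` helper and its field names are hypothetical; this is not the CRMArena-Pro evaluation harness and does not reproduce its results.

```python
from collections import defaultdict


def success_rate_by_setting(episodes):
    """Return {setting: fraction of successful episodes} for each setting seen."""
    attempts = defaultdict(int)
    successes = defaultdict(int)
    for episode in episodes:
        attempts[episode["setting"]] += 1
        successes[episode["setting"]] += int(episode["success"])
    return {setting: successes[setting] / attempts[setting] for setting in attempts}


# Toy episode log (invented values, not benchmark data).
EPISODES = [
    {"setting": "single_turn", "success": True},
    {"setting": "single_turn", "success": False},
    {"setting": "multi_turn", "success": True},
    {"setting": "multi_turn", "success": False},
    {"setting": "multi_turn", "success": False},
    {"setting": "multi_turn", "success": False},
]

rates = success_rate_by_setting(EPISODES)
for setting, rate in rates.items():
    print(f"{setting}: {rate:.0%}")
```

Reporting the two settings separately, rather than as one blended number, is what makes a drop between single-turn and multi-turn performance visible.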

The ability to thoroughly benchmark and validate AI agents using synthetic data is fundamental to building trust. Enterprise leaders are understandably cautious about deploying AI solutions that could mishandle sensitive customer data or make faulty business decisions. By demonstrating an agent’s reliable performance in a synthetic environment, businesses can gain confidence in its capabilities before deploying it with real data. This transparency and proven reliability are essential for widespread AI adoption within an organization.

Salesforce’s unique position in the synthetic data landscape

Salesforce is in a unique and highly advantageous position to lead the charge in providing synthetic data solutions for enterprise AI agents. This advantage stems from our unparalleled understanding of “jobs to be done” within businesses.

Deep knowledge of “jobs to be done”

For decades, Salesforce has been at the heart of how businesses operate, understanding the intricate jobs to be done across countless industries and functional areas. We understand sales processes, customer service interactions, marketing campaigns, and more, not just theoretically but from the millions of real-world customer instances our platform facilitates daily. This deep, granular knowledge of business processes, pain points, and desired outcomes is invaluable in generating truly realistic synthetic data.

Unlike general AI companies that primarily focus on language models trained on unstructured text, Salesforce’s expertise lies in structured CRM data and the context in which it operates. We know:

The typical structure of CRM records: What fields are common, how they relate, and the expected values within them.
Common business scenarios: The types of inquiries customers make, the actions sales representatives take, and the challenges service agents face.
Industry-specific nuances: The unique data points and workflows that differentiate a healthcare provider from a financial services firm.

This intrinsic understanding allows Salesforce to generate synthetic data that is not merely random, but intelligently structured and contextually relevant, mirroring the actual complexity of an enterprise’s operations. When a business needs an AI agent to perform like a seasoned professional, Salesforce can simulate the exact environment and data it needs to learn.

Driving the future of enterprise AI

Salesforce is uniquely positioned to lead this transformation. We understand the jagged landscape of generic LLMs and the complex nature of enterprise data because we’ve spent decades embedded in it. We’re not building agents in the abstract. We’re building agents that can reason through a lead lifecycle, handle a multi-turn support case, or orchestrate a marketing journey. And we’re doing it with the only training data that truly reflects the enterprise: synthetic data rooted in real business logic, grounded in decades of domain expertise, and reinforced by trust.

In short, the path to enterprise AI runs through synthetic data. It’s how we close the gap between generic intelligence and business-ready performance, and it’s how Salesforce is making that future a reality today.

Go deeper:

Learn more about why enterprise knowledge is the foundation of trustworthy AI
Dive deeper with a three-step framework for building trust
Learn why generic LLM agents fall short in enterprise environments

Jason Wu, Director of Salesforce AI Research

Leads the synthetic enterprise data work for Salesforce AI Research
