
Behind The Scenes: Building Agentforce for Salesforce

Agentforce has handled more than 1.5 million support requests and helped Salesforce nurture over 10,000 leads every week. [Image credit: Aleona Pollauf/Salesforce]

We sat down with Salesforce’s Director of IT for Product Management, Harini Woopalanchi, for an inside look at how internal teams at Salesforce securely build, test, and deploy AI agents.

Key Takeaways

This summary was created with AI and reviewed by an editor.

When it comes to building, testing, and deploying artificial intelligence (AI) agents, Salesforce Sandboxes are an essential tool for the Agentforce Lifecycle Management journey. This is a key focus for Harini Woopalanchi, our director of IT for product management, and her team. They work to ensure that Agentforce, the agentic layer of the Salesforce Platform, is developed and tested in the right environment, guaranteeing a smooth and reliable launch.

Agentforce powers help.salesforce.com, which has handled over 1.5 million support requests since launching last year. Agentforce also helps us nurture more than 10,000 leads per week.

In a recent conversation with Woopalanchi, we unpacked this Salesforce on Salesforce story, learning how her team used Salesforce Sandboxes as Customer Zero to develop, test, and deploy Agentforce.


Part 1: The power of sandboxes for development and testing

It’s so exciting to see these agents come to life! Salesforce chose to build these agents using our own products. Can you tell us why relying on Salesforce Sandboxes was a critical decision for this project?

Woopalanchi: Thank you! It’s been an incredible journey. The decision to “dogfood” our own products, especially Salesforce Sandboxes, was absolutely critical.

We were able to develop and test in an environment that was a true reflection of our production instance, but completely isolated. This meant we could iterate rapidly, experiment with new features, and push the boundaries of the agent’s capabilities without any risk to live customer operations. 

Furthermore, this isolated environment prevented the agent’s development and testing from disrupting our other application release cycles, ensuring a smooth and uninterrupted flow for all of our development projects. This gave us the confidence that what we were building would perform exactly as expected when deployed. It’s the ultimate proof point for our own technology: if it’s good enough for us, it’s certainly good enough for our customers.

How did using a full copy sandbox enable a more realistic development and testing environment compared to other sandbox types?

Woopalanchi: A full copy sandbox was a game-changer for testing these agents. Unlike partial or developer sandboxes, a full copy provides a complete replica of our production data, including all our customer records, historical transactions, and, crucially, our entire knowledge base. For an agent designed to answer complex questions like transaction lookups or guide users through detailed troubleshooting, having real-world data was indispensable.

Initially, we conducted agent experimentation in our existing QA sandboxes. However, as our use cases grew in complexity and involved multiple agents, a dedicated full copy sandbox became essential. This enabled us to test complex use cases with integrations to external systems in a more robust way.

Imagine trying to test if an agent can accurately find a specific transaction for a user without having millions of diverse transaction records to search through. Or validating if it can retrieve the correct knowledge article when faced with a nuanced query, without the full breadth and depth of our actual knowledge articles. A full copy sandbox allowed us to simulate these real-world scenarios with unparalleled accuracy, ensuring our agent wasn’t just theoretically correct, but practically robust and reliable. It helped us catch edge cases and performance bottlenecks that simply wouldn’t surface in a smaller, less representative environment.

How did User Acceptance Testing (UAT) in the full copy Sandbox help to ensure the agent performed reliably with representative data volumes before going live?

Woopalanchi: UAT in the full copy sandbox was the final, crucial validation step. With a complete replica of our production data, we could conduct UAT with internal teams and even a select group of pilot users, mimicking actual customer interactions. This wasn’t just about checking if the agent gave the right answer; it was about evaluating its performance under realistic load, and its ability to navigate through complex user journeys with the actual volume and variety of data it would encounter in production.

For example, we could test scenarios where a user might have a very long history of transactions, or where a knowledge article about product documentation might have many versions. This comprehensive UAT allowed us to fine-tune the agent’s logic, optimize data retrieval, and ensure scalability. It provided the ultimate assurance that when agents went live, they would not only be accurate but also perform reliably and efficiently for all our users, regardless of their specific data profile or query complexity.


Part 2: Agentforce Testing Center for agent intelligence and reliability

One of my favorite features at Salesforce is the Agentforce Testing Center, a no-code testing tool available right in your sandbox! How did the Testing Center specifically enable you to thoroughly prepare and test your AI agents?

Woopalanchi: The Agentforce Testing Center was indispensable for thoroughly validating our AI agents. For transaction lookups, the complexity multiplies due to the sheer volume and variety of data. The Testing Center, integrated with our full copy sandbox, let us run tests against representative data volumes. We could verify that the agent not only understood the request but also accurately queried Data Cloud, retrieved the correct transaction details, and presented them clearly to the user. This systematic, repeatable testing was key to ensuring both the precision and the security of these sensitive operations.

What kinds of test cases could you run in the Agentforce Testing Center that wouldn’t be possible or as efficient in a live environment?

Woopalanchi: There are several types of test cases that are either impossible or highly inefficient/risky to run in a live production environment, which the Agentforce Testing Center made not only possible but efficient.

  • High-volume stress testing: We could simulate up to 1,000 concurrent user requests to understand how the agent performs under extreme load. This is crucial for help.salesforce.com, given the global user base. You simply can’t do that in production without risking service disruption.
  • Negative testing and edge cases: We could intentionally feed the agent incorrect, ambiguous, or malicious inputs to see how it handles them. This helps identify vulnerabilities or unexpected fallback behaviors. For example, what if a user asks for a transaction lookup but provides a completely invalid ID? The Testing Center allowed us to test these “failure” scenarios safely.
  • Regression testing: Every time we made an update or added a new feature, we could run our entire suite of existing tests to ensure that the changes hadn’t inadvertently broken any previously working functionality. This automated regression testing saved immense time and prevented new bugs from reaching production.
  • Testing with synthetic data: For scenarios where specific real-world data might be sensitive or unavailable for broad testing, the Testing Center’s ability to generate synthetic data allowed us to create custom scenarios to thoroughly test new agent logic or integrations without privacy concerns.
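The test-case shapes above — regression suites plus negative and edge cases — can be sketched in code. What follows is a minimal, hypothetical harness, not the Testing Center’s API: `run_agent`, the topic and action names, and the test-case format are all invented for illustration.

```python
# Hypothetical sketch of an agent test suite. The Testing Center itself is
# no-code, but its test cases share this shape: utterance in, expected
# topic and actions out. `run_agent` stands in for whatever invokes an
# agent in a sandbox; it is an assumption, not a real Salesforce API.
from dataclasses import dataclass, field


@dataclass
class AgentResult:
    topic: str
    actions: list = field(default_factory=list)


def run_agent(utterance: str) -> AgentResult:
    # Placeholder agent: routes one known intent and falls back to
    # human escalation for anything it cannot handle (e.g. invalid IDs).
    if utterance.startswith("lookup transaction"):
        tx_id = utterance.rsplit(" ", 1)[-1]
        if tx_id.isdigit():
            return AgentResult("TransactionLookup", ["QueryDataCloud"])
        return AgentResult("Fallback", ["EscalateToHuman"])
    return AgentResult("Fallback", ["EscalateToHuman"])


# Regression suite: each case pairs an utterance with the expected topic
# and actions, including negative/edge cases with invalid input.
TEST_CASES = [
    ("lookup transaction 12345", "TransactionLookup", ["QueryDataCloud"]),
    ("lookup transaction abc!!", "Fallback", ["EscalateToHuman"]),  # negative test
    ("??", "Fallback", ["EscalateToHuman"]),  # ambiguous input
]


def run_suite() -> list:
    """Run every case and return the utterances that failed."""
    failures = []
    for utterance, topic, actions in TEST_CASES:
        result = run_agent(utterance)
        if (result.topic, result.actions) != (topic, actions):
            failures.append(utterance)
    return failures
```

Re-running a suite like this after every change is what makes the regression testing described above cheap enough to do continuously.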

How did the Agentforce Testing Center contribute to the overall reliability and trust in our AI agents at Salesforce?

Woopalanchi: The Agentforce Testing Center was foundational to building reliability and trust. In the world of AI, “hallucinations” or incorrect responses are a major concern. The structured testing framework, combined with detailed logging and monitoring within the Testing Center, allowed us to systematically identify and eliminate these issues. With Agent Builder and the Testing Center, we could see exactly why an agent responded the way it did, trace its decision-making path, assess response accuracy, and pinpoint where adjustments were needed.

This iterative process of testing, analyzing, and refining in a controlled environment meant that by the time the agents went live, we had high confidence in their accuracy and consistency. It transformed agent development from a speculative process into a data-driven, quality-assured one. For our customers, this translates directly into a more reliable and trustworthy self-service experience, which is paramount for a support portal.

Developing AI agents comes with unique challenges. How did you ensure correct responses and actions for queries?

Woopalanchi: This is where the Agentforce Testing Center truly shines in addressing the “black box” challenge often associated with AI. It provides deep observability into the agent’s internal workings. For every test run, we get detailed logs and audit trails that show:

  • Action execution: If the agent was supposed to perform an action (like initiating a password reset flow), the Testing Center verifies that the correct or expected topics and actions were called and executed.
  • Fallback behavior: How the agent responded when it couldn’t find a direct answer or encountered an unexpected input. Did it gracefully escalate to a human agent, or provide a helpful alternative?

This level of visibility allowed our team to debug effectively, understand where the agent’s reasoning might deviate, and continuously fine-tune its logic and knowledge base connections. It’s like having an X-ray vision into the AI’s “brain,” ensuring transparency and enabling us to guarantee that the agent consistently arrives at the correct answers and takes appropriate actions.
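To give a rough sense of what checking such a trace involves, here is a sketch against an invented JSON log format. The real Testing Center exposes its own logs and traces; `verify_trace` and every field name below are illustrative assumptions, not Salesforce schema.

```python
# Hypothetical audit-trail check: did the agent execute the expected
# actions, and if it fell back, did it escalate to a human? The JSON
# shape here is invented for illustration only.
import json


def verify_trace(trace_json: str, expected_actions: list) -> bool:
    """Return True if the trace shows exactly the expected action
    sequence and, on fallback, includes a human escalation."""
    trace = json.loads(trace_json)
    executed = [s["action"] for s in trace["steps"] if s["type"] == "action"]
    if executed != expected_actions:
        return False
    if trace.get("fallback"):
        return "EscalateToHuman" in executed
    return True


# A successful transaction lookup: the only action is the Data Cloud query.
ok_trace = json.dumps({
    "steps": [
        {"type": "topic", "name": "TransactionLookup"},
        {"type": "action", "action": "QueryDataCloud"},
    ],
    "fallback": False,
})
```

Automating checks like this over every test run is one way that the per-step visibility described above turns into repeatable quality gates rather than one-off debugging.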


Part 3: Deployment and what’s ahead

What is an enhancement you would like to see in Salesforce Sandboxes to make agent development and deployment easier?

Woopalanchi: I’m excited about the Command Center, an advanced observability platform from Salesforce. This tool is set to be a game-changer for us, offering a deep, data-driven look into how our AI agents are performing, right out of the box.

Powered by a unified Agent data model, it provides detailed session tracing for individual interactions and collects information at runtime. This will allow us to analyze everything from conversation quality to technical metrics, giving IT and business leaders an overview of where agents are deployed and identifying opportunities for improvement.

It’s exactly what we need to elevate our development and deployment process and understand how our agents are driving value.

Build on the Salesforce Platform

Since its deployment, Agentforce now handles everything from responding to routine inquiries and performing common tasks to opening service tickets that seamlessly escalate to human reps with the full case history and context.

To date, Agentforce has handled more than 1 million conversations, resolving over 75% of customer queries without human intervention.

Attending Dreamforce? Learn more in our session, How Salesforce Builds Agents Securely in Sandboxes, which will also be available on-demand on Salesforce+.

Take a closer look at Dreamforce

Want to see how Salesforce built an agentic enterprise — and how your company can become one too? Check out Salesforce+ during Dreamforce (Oct. 14-16) and get a virtual front row seat.

