Agentic AI

Agentforce Testing Center – The Top 3 Use Cases for Agent Evaluation

Ran Fu

Harsha Vardhan Viswanath

November 4, 2025 4 min read

In the fast-paced world of AI agents, ensuring your conversational AI delivers accurate, efficient, and reliable responses is paramount. That’s where the Agentforce Testing Center comes in – a critical tool that allows you to proactively identify and rectify issues, guaranteeing a seamless user experience before your agents ever hit production.

What and Why: The Imperative for Offline Testing

The Testing Center is a sophisticated sandbox environment designed to simulate real-world user interactions. It provides a controlled setting to rigorously evaluate an agent’s actual topic, action, and response against ground truth and predefined evaluation metrics . This process embodies the principle of “shifting left” catching potential problems early in the development cycle, which significantly saves time, resources, and mitigates reputational risk.

Without a dedicated testing facility, you risk deploying agents that:

Provide incorrect or irrelevant information due to knowledge gaps.
Deliver inconsistent responses due to inefficient instructions.
Struggle with complex, ambiguous, or even inappropriate user queries such as prompt injection.
Fail to adapt to changing contexts or conversation history.
Agent Hallucinations

The Testing Center empowers you to identify and mitigate these risks at scale, ensuring your agents are robust, reliable, and production-ready.

Besides the no-code experience, Testing Center supports low-code experience within Salesforce CLI and Agentforce DX, which give developers more control for automation, CI/CD, and versioning, so that developers can integrate the repeatable, scalable testing jobs into their agent development and deployment.

How: Ensuring Efficiency Through Core Use Cases

The Testing Center offers a powerful suite of features designed to optimize your agent’s performance and knowledge integration.

Below is a high level diagram to demonstrate how the Testing Center works:

Test Suite Before Run: This is the design time that allows users to prepare inputs for the testing and evaluation jobs.
Test Run Result: This is the runtime that not only execute agent to generate topic/action/response, but also generate eval results such as agent response evaluation metrics

1. Custom Evaluations: Defining Your Own Agent Success

Every agent has a unique purpose and thus needs specific evaluation metrics. Besides the out-of-the-box evaluation metrics, custom evaluations allow you to define precise criteria to assess your agent’s effectiveness, moving beyond simple pass/fail checks.

Scenario: A financial services agent designed to explain investment options.
Testing Center Application: You create custom evaluation logic to measure:
- Compliance: Does the response adhere to legal and internal policy guidelines?
- Competitor mentioning: Does the response recommend any competitor’s offerings?
- Latency check: Does this response take more time than the expectation?
Outcome: A test case might involve asking about a specific fee structure. The custom evaluation then verifies the response against a predefined criteria (namely, LLM as a Judge that includes a few good examples about compliant, professional answers) to generate a score with reasoning . If the agent fails with a low score, it signals the need for updating related instructions within the agent configurations to ensure failed test cases can pass next time.

In the below image the Politeness Score is determined by providing custom instructions to the LLM.

The below image talks about the setup of Latency Evaluation – if the duration is less than 2 seconds, the result is True.

2. Context Variables: Simulate the output based on certain conditions

Context variables are essential for agents to understand the unfolding narrative and respond coherently across multiple turns with the user the agent is interacting with. They represent the internal memory of the context during the conversation .

Scenario: Use a SDR (sales development representative) agent to draft outreach emails for a particular lead
Testing Center Application: You create context variables via Agent Builder, such as email scenario and lead name

Example: After defining email scenario and lead name, users can enter the utterance “Draft an initial outreach email but do not schedule the email. Your response should be a text-formatted email and not JSON”, and then click “Batch Test” button to test this utterance or other utterances, based on the selected context variables.

3. Conversation History: Bring previous conversations as input

Agents need to effectively reference and learn from earlier parts of an interaction, and we need to test each turn within the conversation to ensure the conversational flow is accurate .

Scenario: A sales agent that helps internal teams to check lead status
Testing Center Application: You create test cases that pressure test the agent with the same previous conversation or test the agent response in a cumulative way

Example: The user asks, “What’s the phone number of Ken Bell.” The agent replies “Ken Bell’s phone number is 425-555-4463”. The user asks, “What’s the email address?”. The agent replies “Ken Bell’s email address is kbell@example.com”. In this way, Testing Center enables us to check each turn within the context of conversation, so that the lead name “Ken Bell” was not mentioned in the remaining utterances.

The Cornerstone of Agent Success

For teams initiating their Agentforce deployment, the Testing Center provides a structured, reliable foundation. For experienced teams, it serves as a powerful engine for continuous improvement, efficiency gains, and risk mitigation. By identifying and resolving crucial issues offline, you ensure superior agent performance and drive better business outcomes. The Agentforce Testing Center is not merely a feature; it is an indispensable component of a mature AI deployment lifecycle.

Resources:

https://help.salesforce.com/s/articleView?id=ai.agent_testing_center.htm&type=5

Agentic Commerce Needs Shared Context. Today, Agentforce 360 Delivers It.

4 min read

On a green-blue background is an illustration of three women sitting on chairs under a tree. The tree has a network of nodes superimposed on it.

Reimagining the Role of UX Researchers as Architects of Human-AI Collaboration

6 min read

Ran Fu Product Management Director

Ran is the Director of Product Management for Agentforce Testing Center, where he focuses on helping customers build, evaluate, and scale trusted AI agents. He is passionate about working closely with customers to understand their needs and guiding them through every stage of the journey — from Read More

More by Ran

Harsha Vardhan Viswanath Senior Success Architect

Harsha is a Data Cloud and Agentforce success architect who helps customers design and build efficient and scalable solutions. With his background in data engineering and a keen curiosity for emerging tech, he likes simplifying complex AI concepts and applying them to solve real-world problems.

More by Harsha Vardhan

Agentforce Testing Center – The Top 3 Use Cases for Agent Evaluation

Ran Fu

Harsha Vardhan Viswanath

What and Why: The Imperative for Offline Testing

How: Ensuring Efficiency Through Core Use Cases

1. Custom Evaluations: Defining Your Own Agent Success

2. Context Variables: Simulate the output based on certain conditions

3. Conversation History: Bring previous conversations as input

The Cornerstone of Agent Success

Just For You

Agentic Commerce Needs Shared Context. Today, Agentforce 360 Delivers It.

Reimagining the Role of UX Researchers as Architects of Human-AI Collaboration

Just For You

Driving Alpha: How Portfolio Companies Are Converting the Agentic Thesis into Value Creation

The Future of Business Is Fast: Watch For These 3 Tech Trends

For CEOs, AI Fluency Is the New MBA

The Agentic Advantage: How Private Equity Is Accelerating Exit Readiness

2025 in Review: Design is How AI Finds Meaning

How Agentic AI Will Save Financial Services from the Perfect Storm

How Agentic Personalization Is Redefining the Customer Experience

Top AI News of 2025: The Year Things Got Real

Share article

What and Why: The Imperative for Offline Testing

How: Ensuring Efficiency Through Core Use Cases

1. Custom Evaluations: Defining Your Own Agent Success

2. Context Variables: Simulate the output based on certain conditions

3. Conversation History: Bring previous conversations as input

The Cornerstone of Agent Success

Share article

Explore related content by topic

Get the latest articles in your inbox.

360 Highlights

IT

Commerce

Marketing

Service

Sales

Thanks, you're subscribed!