
Agentforce Testing Center – The Top 3 Use Cases for Agent Evaluation

In the fast-paced world of AI agents, ensuring that your conversational AI delivers accurate, efficient, and reliable responses is paramount. That’s where the Agentforce Testing Center comes in: a critical tool that lets you proactively identify and fix issues, so you can deliver a seamless user experience and catch problems before your agents ever reach production.

What and Why: The Imperative for Offline Testing

The Testing Center is a sophisticated sandbox environment designed to simulate real-world user interactions. It provides a controlled setting to rigorously evaluate an agent’s actual topic, action, and response against ground truth and predefined evaluation metrics. This process embodies the principle of “shifting left”: catching potential problems early in the development cycle, which saves significant time and resources and mitigates reputational risk.

Without a dedicated testing facility, you risk deploying agents that:

  • Provide incorrect or irrelevant information due to knowledge gaps.
  • Deliver inconsistent responses due to unclear or ineffective instructions.
  • Struggle with complex, ambiguous, or even adversarial user queries, such as prompt injection attempts.
  • Fail to adapt to changing contexts or conversation history.
  • Hallucinate, producing plausible-sounding but fabricated answers.

The Testing Center empowers you to identify and mitigate these risks at scale, ensuring your agents are robust, reliable, and production-ready.

In addition to the no-code experience, the Testing Center supports a low-code experience through Salesforce CLI and Agentforce DX. These tools give developers more control over automation, CI/CD, and versioning, so they can integrate repeatable, scalable testing jobs into their agent development and deployment workflows.

How: Ensuring Efficiency Through Core Use Cases

The Testing Center offers a powerful suite of features designed to optimize your agent’s performance and knowledge integration.

Below is a high-level diagram that shows how the Testing Center works:

  1. Test Suite Before Run: The design-time stage, where users prepare the inputs for testing and evaluation jobs.
  2. Test Run Result: The runtime stage, which executes the agent to generate its topic, actions, and response, and also produces evaluation results such as agent response evaluation metrics.
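To make the split between design time and runtime concrete, here is a minimal Python sketch of a test case (prepared before the run) and the comparison of the agent's actual output against that ground truth (produced during the run). It is not the Testing Center's internal data model; the class names, fields, and the topic and action names (`Order_Inquiry`, `Get_Order_Status`) are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class TestCase:
    """Design-time input: an utterance plus its ground-truth expectations (hypothetical schema)."""
    utterance: str
    expected_topic: str
    expected_actions: list[str] = field(default_factory=list)
    expected_response: str = ""  # scored separately, e.g. by an LLM judge


@dataclass
class AgentOutput:
    """Runtime output captured when the agent is executed on a test case."""
    topic: str
    actions: list[str]
    response: str


def evaluate(case: TestCase, output: AgentOutput) -> dict[str, bool]:
    """Compare the actual topic and actions against ground truth.

    Response quality is left to separate metrics, so this sketch
    only checks the deterministic parts of the run.
    """
    return {
        "topic_match": output.topic == case.expected_topic,
        # Order-sensitive comparison of the action sequence.
        "actions_match": output.actions == case.expected_actions,
    }


# Hypothetical example values.
case = TestCase(
    utterance="What's the status of my order?",
    expected_topic="Order_Inquiry",
    expected_actions=["Get_Order_Status"],
)
actual = AgentOutput(
    topic="Order_Inquiry",
    actions=["Get_Order_Status"],
    response="Your order shipped yesterday.",
)
print(evaluate(case, actual))  # {'topic_match': True, 'actions_match': True}
```

Keeping expectations and actual outputs in separate structures mirrors the two stages above: the test suite can be versioned on its own, and each run only has to produce an `AgentOutput`.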

1. Custom Evaluations: Defining Your Own Agent Success

Every agent has a unique purpose and thus needs specific evaluation metrics. Besides the out-of-the-box evaluation metrics, custom evaluations allow you to define precise criteria to assess your agent’s effectiveness, moving beyond simple pass/fail checks.

  • Scenario: A financial services agent designed to explain investment options.
  • Testing Center Application: You create custom evaluation logic to measure:
    • Compliance: Does the response adhere to legal and internal policy guidelines?
    • Competitor mentions: Does the response recommend any competitor’s offerings?
    • Latency check: Does the response take longer than expected?
  • Outcome: A test case might ask about a specific fee structure. The custom evaluation then checks the response against predefined criteria (here, an LLM as a Judge prompt that includes a few examples of compliant, professional answers) and generates a score with reasoning. A low score signals that the related instructions in the agent configuration need updating so that the failing test cases pass on the next run.
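Not every custom criterion needs an LLM. The competitor-mention criterion, for example, can be approximated with a deterministic check. The sketch below is a simple keyword-matching stand-in, not how the Testing Center evaluates responses; the competitor names are hypothetical placeholders.

```python
import re

# Hypothetical competitor names; replace with your own list.
COMPETITORS = ("Acme Capital", "Globex Investments")


def mentions_competitor(response: str, competitors: tuple[str, ...] = COMPETITORS) -> list[str]:
    """Return the competitor names found in the response (case-insensitive, whole words)."""
    found = []
    for name in competitors:
        pattern = r"\b" + re.escape(name) + r"\b"
        if re.search(pattern, response, flags=re.IGNORECASE):
            found.append(name)
    return found


def competitor_eval(response: str) -> bool:
    """Pass (True) only if no competitor is mentioned."""
    return not mentions_competitor(response)


print(competitor_eval("Our Growth Fund charges a 0.5% annual fee."))  # True
print(competitor_eval("You might also consider Acme Capital."))  # False
```

Keyword matching misses paraphrases ("a certain rival firm"), so in practice a check like this would complement, not replace, an LLM-based judge.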

In the image below, the Politeness Score is determined by providing custom instructions to the LLM.
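A score such as politeness comes from an LLM as a Judge setup: custom instructions plus a few labeled examples, with the judge asked to return a score and its reasoning. The sketch below shows one way to assemble such a prompt and parse the reply. The instruction wording, the 1 to 5 scale, the examples, and the pass threshold of 4 are all assumptions for illustration; sending the prompt to a model is left to whatever LLM client you use.

```python
import re
from dataclasses import dataclass


@dataclass
class JudgeResult:
    score: int
    reasoning: str


# Hypothetical custom instructions for a politeness judge.
POLITENESS_INSTRUCTIONS = (
    "You are evaluating an AI agent's reply for politeness. "
    "Rate it from 1 (rude) to 5 (consistently courteous and professional). "
    "Answer in exactly two lines:\nScore: <1-5>\nReasoning: <one sentence>"
)

# A few labeled examples that anchor the judge's scale (illustrative only).
FEW_SHOT = [
    ("Happy to help! Your fee is 0.5% per year.", 5, "Friendly and professional."),
    ("Read the docs yourself.", 1, "Dismissive and curt."),
]


def build_judge_prompt(agent_response: str) -> str:
    """Assemble instructions, few-shot examples, and the response under test."""
    parts = [POLITENESS_INSTRUCTIONS, ""]
    for text, score, why in FEW_SHOT:
        parts += [f"Response: {text}", f"Score: {score}", f"Reasoning: {why}", ""]
    parts.append(f"Response: {agent_response}")
    return "\n".join(parts)


def parse_judge_output(raw: str) -> JudgeResult:
    """Extract the score and one-line reasoning from the judge's reply."""
    score = re.search(r"Score:\s*([1-5])", raw)
    reasoning = re.search(r"Reasoning:\s*(.+)", raw)
    if score is None:
        raise ValueError(f"No score found in judge output: {raw!r}")
    return JudgeResult(int(score.group(1)), reasoning.group(1).strip() if reasoning else "")


def judge_passes(result: JudgeResult, threshold: int = 4) -> bool:
    """A low score fails the test case and flags the agent's instructions for review."""
    return result.score >= threshold
```

Requiring a fixed two-line answer format keeps parsing trivial and makes malformed judge replies fail loudly rather than silently producing a score.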

The image below shows the setup of a Latency Evaluation: if the response duration is less than 2 seconds, the result is True.
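The latency rule just described (pass if the duration is under 2 seconds) is simple enough to express directly. This is a minimal sketch, not the Testing Center's implementation; the `agent` callable is a hypothetical stand-in for invoking your agent.

```python
import time
from typing import Callable

LATENCY_THRESHOLD_S = 2.0  # the 2-second threshold from the Latency Evaluation setup


def latency_eval(duration_s: float, threshold_s: float = LATENCY_THRESHOLD_S) -> bool:
    """True if the response duration is strictly below the threshold."""
    return duration_s < threshold_s


def timed_call(agent: Callable[[str], str], utterance: str) -> tuple[str, float]:
    """Run the (hypothetical) agent callable and measure wall-clock duration in seconds."""
    start = time.perf_counter()
    response = agent(utterance)
    return response, time.perf_counter() - start


print(latency_eval(1.2))  # True
print(latency_eval(2.0))  # False: exactly 2 seconds is not "less than 2 seconds"
```

`time.perf_counter()` is used because it is monotonic and high-resolution, which makes it better suited to measuring short durations than wall-clock timestamps.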

2. Context Variables: Simulate the output based on certain conditions

Context variables are essential for agents to understand the unfolding conversation and respond coherently across multiple turns with the user they are interacting with. They act as the agent’s internal memory of context during the conversation.

  • Scenario: An SDR (sales development representative) agent that drafts outreach emails for a particular lead.
  • Testing Center Application: You create context variables in Agent Builder, such as email scenario and lead name.

Example: After defining the email scenario and lead name, users can enter the utterance “Draft an initial outreach email but do not schedule the email. Your response should be a text-formatted email and not JSON” and then click the “Batch Test” button to test this utterance, or other utterances, against the selected context variables.
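A batch test pairs utterances with context-variable values. The sketch below, assuming you want to cover every combination of values, expands a list of utterances across all context-variable settings into individual test inputs. The variable keys, the example values (including the lead "Ana Ruiz"), and the output shape are hypothetical; in the Testing Center itself you select the context variables in the UI.

```python
from itertools import product

# Hypothetical context-variable values for the SDR example.
context_options: dict[str, list[str]] = {
    "email_scenario": ["Initial outreach", "Follow-up after webinar"],
    "lead_name": ["Ken Bell", "Ana Ruiz"],
}

utterances = [
    "Draft an initial outreach email but do not schedule the email. "
    "Your response should be a text-formatted email and not JSON",
]


def build_batch(utterances: list[str], context_options: dict[str, list[str]]) -> list[dict]:
    """Pair every utterance with every combination of context-variable values."""
    keys = list(context_options)
    batch = []
    for utterance in utterances:
        for values in product(*(context_options[k] for k in keys)):
            batch.append({"utterance": utterance, "context": dict(zip(keys, values))})
    return batch


batch = build_batch(utterances, context_options)
print(len(batch))  # 4 (1 utterance x 2 scenarios x 2 leads)
```

A full cross product grows quickly as variables are added, so for larger suites you may prefer to hand-pick representative combinations instead.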

3. Conversation History: Bring previous conversations as input

Agents need to reference and use information from earlier parts of an interaction, so we need to test each turn within the conversation to ensure the conversational flow is accurate.

  • Scenario: A sales agent that helps internal teams check lead status.
  • Testing Center Application: You create test cases that either pressure-test the agent against the same previous conversation or evaluate its responses cumulatively, turn by turn.

Example: The user asks, “What’s the phone number of Ken Bell?” The agent replies, “Ken Bell’s phone number is 425-555-4463.” The user then asks, “What’s the email address?” The agent replies, “Ken Bell’s email address is kbell@example.com.” Because the lead name “Ken Bell” does not appear in the second question, the agent can only answer correctly by drawing on the conversation history. Testing Center lets us check each turn in the context of the whole conversation to verify exactly this.
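The two testing styles for the Ken Bell conversation can be sketched as ways of turning a conversation into per-turn test inputs. In the cumulative style, each user turn is tested with all earlier turns as history; in the pressure-test style, many utterances are replayed against one fixed history. The `Turn` class, function names, and the extra follow-up question are hypothetical, and `carries_context` is only a crude substring check standing in for a real response evaluation.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    role: str  # "user" or "agent"
    text: str


conversation = [
    Turn("user", "What's the phone number of Ken Bell?"),
    Turn("agent", "Ken Bell's phone number is 425-555-4463"),
    Turn("user", "What's the email address?"),
    Turn("agent", "Ken Bell's email address is kbell@example.com"),
]


def cumulative_cases(turns: list[Turn]) -> list[dict]:
    """One case per user turn: earlier turns are the history,
    and the following agent turn is the expected reply."""
    cases = []
    for i, turn in enumerate(turns):
        if turn.role == "user" and i + 1 < len(turns) and turns[i + 1].role == "agent":
            cases.append({"history": turns[:i], "utterance": turn.text, "expected": turns[i + 1].text})
    return cases


def fixed_history_cases(history: list[Turn], utterances: list[str]) -> list[dict]:
    """Pressure-test variant: many utterances replayed against the same prior conversation."""
    return [{"history": history, "utterance": u} for u in utterances]


def carries_context(response: str, entity: str) -> bool:
    """Crude check that the reply names an entity the user did not repeat."""
    return entity.lower() in response.lower()


cases = cumulative_cases(conversation)
print(len(cases))  # 2
```

The second cumulative case is the interesting one: its utterance never names the lead, so a passing reply shows the agent resolved "the email address" from the history.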

The Cornerstone of Agent Success

For teams initiating their Agentforce deployment, the Testing Center provides a structured, reliable foundation. For experienced teams, it serves as a powerful engine for continuous improvement, efficiency gains, and risk mitigation. By identifying and resolving crucial issues offline, you ensure superior agent performance and drive better business outcomes. The Agentforce Testing Center is not merely a feature; it is an indispensable component of a mature AI deployment lifecycle.

Resources:

https://help.salesforce.com/s/articleView?id=ai.agent_testing_center.htm&type=5
