{"id":68155,"date":"2025-12-01T20:36:16","date_gmt":"2025-12-01T09:36:16","guid":{"rendered":"https:\/\/www.salesforce.com\/?p=68155"},"modified":"2025-12-01T20:36:18","modified_gmt":"2025-12-01T09:36:18","slug":"ai-agent-evaluation","status":"publish","type":"post","link":"https:\/\/www.salesforce.com\/au\/blog\/ai-agent-evaluation\/","title":{"rendered":"How Do You Know if Your AI Agent Is Doing a Good Job?"},"content":{"rendered":"\n<p>Congratulations! You deployed your first <a href=\"https:\/\/www.salesforce.com\/au\/agentforce\/ai-agents\/\" target=\"_blank\" rel=\" noopener\">AI agent<\/a> and it\u2019s out there, doing its job, streamlining your workflows and helping your employees work smarter. You\u2019re tracking the engagement metrics and escalation rate KPIs. But you still might wake up in the middle of the night, wondering, \u201cHave I got enough data to know if my agent is doing a good job?\u201d&nbsp;<br><br>The more insights you have, the more quickly you can make improvements \u2014 which is why a lot of people are asking the same question. \u201cWe\u2019re still in very early days, measuring these agents,\u201d said Jesse Luke, senior manager, data enablement, web, at Salesforce. \u201cIt\u2019s a process everyone is going through.\u201d&nbsp;<br><br>But there <em>are<\/em> ways to measure the quality and effectiveness of your AI agents\u2019 work, starting with the KPIs you put in place at deployment. There are also Salesforce tools \u2014 including some on the horizon \u2014 to help you to assess your agent\u2019s performance.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-what-does-an-effective-ai-agent-look-like-nbsp\">What does an effective AI agent look like?&nbsp;<\/h2>\n\n\n\n<p>A good agent doesn\u2019t just answer customers\u2019 or employees\u2019 questions. It solves people\u2019s problems. The best agents do this seamlessly.&nbsp;<\/p>\n\n\n\n<p>\u201cHow do you know you\u2019re working with a good AI agent vs. 
a mediocre one?\u201d Mike Murchison, CEO of Ada, <a href=\"https:\/\/www.linkedin.com\/posts\/mikemurchison_how-do-you-know-youre-working-with-a-good-activity-7330600652357718017-Z26I\/\" target=\"_blank\" rel=\" noopener\">asked<\/a> on LinkedIn. \u201cGood AI should feel like the best server at your favourite restaurant.\u201d&nbsp;<\/p>\n\n\n\n<iframe loading=\"lazy\" src=\"https:\/\/www.linkedin.com\/embed\/feed\/update\/urn:li:share:7330600651401428993?collapsed=1\" height=\"263\" width=\"504\" frameborder=\"0\" allowfullscreen=\"\" title=\"Embedded post\"><\/iframe>\n\n\n\n<p><\/p>\n\n\n\n<p>Like a great server, he said, a great agent anticipates your needs even before you do. \u201cThey remember your preferences, spot any problems before they happen and fix them without fanfare,\u201d he added.&nbsp;<\/p>\n\n\n\n<p>That\u2019s the ideal. But first, you may simply want to know whether your agent is meeting its basic KPIs. \u201cIf you have a good idea of your KPIs and can identify how the agent affects those, you\u2019re off to the races,\u201d Luke said.<\/p>\n\n\n\n<p>On the Salesforce <a href=\"https:\/\/help.salesforce.com\/s\/\" target=\"_blank\" rel=\" noopener\">Help site<\/a>, for example, the customer service agent\u2019s job is to help people quickly find the information they need and reduce the caseload of human agents. The company posts the agent\u2019s <a href=\"https:\/\/www.salesforce.com\/au\/agentforce\/use-cases\/customer-zero\/\" target=\"_blank\" rel=\" noopener\">performance metrics<\/a> on a weekly basis.&nbsp;<\/p>\n\n\n\n<p>The numbers? One week in September, <a href=\"https:\/\/www.salesforce.com\/au\/agentforce\/\" target=\"_blank\" rel=\" noopener\">Agentforce<\/a>, the Salesforce platform for building and deploying AI agents, handled over 61,000 support requests and resolved more than 39,000 of them. 
Roughly 17,000 requests were handed off to humans.&nbsp;<\/p>\n\n\n\n<p>Those are the kinds of KPIs that show your agent is doing its job.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">You can measure only what you can see<\/h2>\n\n\n\n<p>One of the biggest challenges companies have with AI agents is visibility \u2014 being able to see what their agent is doing and make sure it\u2019s acting as they want. Salesforce\u2019s Agentforce Observability offers a unified dashboard that tracks an agent\u2019s error rates, escalation rates, latency and more. It sits within Agentforce Studio, a new suite of tools to gauge an agent\u2019s performance. The dashboard can <a href=\"https:\/\/www.salesforce.com\/au\/blog\/command-center\/\" target=\"_blank\" rel=\" noopener\">answer questions<\/a> such as \u201cHow is adoption and usage trending?\u201d and \u201cAre my agents following legal and regulatory requirements?\u201d&nbsp;<\/p>\n\n\n\n<p>It can also categorise your agent\u2019s conversations into topics so you can see how customers are using the agent. For example, 40% of agent sessions might be about payment problems; another 20% could be cancellation requests.&nbsp;<\/p>\n\n\n\n<div class=\"wp-block-salesforce-blog-newsletter-signup prevent-bottombar-overlap layout layout-ai\" >\n\n\t\t\t\t<h2 class=\"wp-block-salesforce-blog-newsletter-signup__title\">\n\t\t\tGet articles selected just for you, in your inbox\t\t<\/h2>\n\t\t\t\t\t<a href=\"https:\/\/www.salesforce.com\/au\/form\/other\/blog-newsletter\/?d=7013y000000ZfAjAAK&#038;nc=7013y000000ZfAoAAK\" class=\"wp-block-salesforce-blog-newsletter-signup__cta btn btn-lg btn-primary\" target=\"_blank\">\n\t\t\tSign up now\t\t<\/a>\n\t<\/div>\n\n\n\n<h2 class=\"wp-block-heading\">How Salesforce measures performance&nbsp;<\/h2>\n\n\n\n<p>Salesforce conducts its own AI agent evaluation in several ways. The company\u2019s Digital Success team runs synthetic tests twice a month to see how agents perform in hypothetical situations. 
To do this, they use an in-house tool similar to the <a href=\"https:\/\/www.salesforce.com\/au\/news\/press-releases\/2024\/11\/20\/agentforce-testing-center-announcement\/\" target=\"_blank\" rel=\" noopener\">Agentforce Testing Centre<\/a>, which lets customers test agents in secure sandboxes before they\u2019re deployed.\u00a0\u00a0<\/p>\n\n\n\n<p>Earlier this year, the team ran a test that resulted in low answer-quality scores, with the Salesforce Help agent scoring 59% against a baseline of 60%. When the team looked more closely, they discovered the agent was hallucinating URLs. The solution? \u201cWe delivered a fix, ran another test and improved our answer quality to 67%,\u201d said Zachary Stauber, senior director, digital success, AI, at Salesforce.<\/p>\n\n\n\n<p>The answer-quality score was useful information. But Salesforce also wanted to know how Agentforce was interacting with users in the real world and at scale. And they wanted to give those conversations a score.<br><br>So, the company\u2019s Data Enablement team started looking at the session level, which is the entire conversation between a user and agent. \u201cBut we found that it wasn\u2019t logical to do it that way,\u201d said Manoj Arora, principal member of the technical staff, software engineering, at Salesforce. \u201cThere might be some questions where the agent did a good job and in the same session, a question where the agent did not do a good job.\u201d&nbsp;<\/p>\n\n\n\n<p>The Data Enablement team next looked at individual questions to see how an agent answered each one. But that didn\u2019t make sense either; when they reviewed a single question and answer, the back-and-forth lacked context. Finally, they used a data science model that classifies and clusters similar topics into groups or moments. These are what the team decided to focus on.&nbsp;&nbsp;&nbsp;<\/p>\n\n\n\n<p>The team then used Agentforce to test these agentic moments, scoring them on a scale of one to five. 
They did this using an internal tool similar to Agentforce Optimisation, which is in beta now and will be available at the end of October as part of Agentforce Observability.<\/p>\n\n\n\n<div class=\"wp-block-columns roundcorners has-background is-layout-flex wp-container-core-columns-is-layout-9d6595d7 wp-block-columns-is-layout-flex\" style=\"background:linear-gradient(135deg,rgb(237,248,255) 0%,rgb(255,255,255) 100%);box-shadow:var(--wp--preset--shadow--natural)\">\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\" style=\"flex-basis:66.66%\">\n<h2 class=\"wp-block-heading\" id=\"h-what-s-your-agentic-ai-strategy\"><strong>What&#8217;s your agentic AI strategy?<\/strong><\/h2>\n\n\n\n<p>Our playbook is your free guide to becoming an agentic enterprise. Learn about use cases, deployment, and AI skills, and download interactive worksheets for your team.<\/p>\n\n\n\n<div class=\"wp-block-buttons is-layout-flex wp-block-buttons-is-layout-flex\">\n<div class=\"wp-block-button\"><a class=\"wp-block-button__link wp-element-button\" href=\"https:\/\/www.salesforce.com\/au\/blog\/playbook\/agentic-ai\" target=\"_blank\" rel=\" noopener\">The future starts now<\/a><\/div>\n<\/div>\n\n\n\n<div style=\"height:15px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n<\/div>\n\n\n\n<div class=\"wp-block-column is-vertically-aligned-center is-layout-flow wp-block-column-is-layout-flow\" style=\"flex-basis:33.33%\">\n<figure data-wp-context=\"{&quot;imageId&quot;:&quot;69db5a0c69119&quot;}\" data-wp-interactive=\"core\/image\" data-wp-key=\"69db5a0c69119\" class=\"wp-block-image size-large wp-lightbox-container\"><img decoding=\"async\" data-wp-class--hide=\"state.isContentHidden\" data-wp-class--show=\"state.isContentVisible\" data-wp-init=\"callbacks.setButtonStyles\" data-wp-on--click=\"actions.showLightbox\" data-wp-on--load=\"callbacks.setButtonStyles\" data-wp-on-window--resize=\"callbacks.setButtonStyles\" 
src=\"https:\/\/www.salesforce.com\/blog\/wp-content\/uploads\/sites\/2\/2025\/07\/offer.png?w=458\" alt=\"\" class=\"wp-image-119811\"\/><button\n\t\t\tclass=\"lightbox-trigger\"\n\t\t\ttype=\"button\"\n\t\t\taria-haspopup=\"dialog\"\n\t\t\taria-label=\"Enlarge\"\n\t\t\tdata-wp-init=\"callbacks.initTriggerButton\"\n\t\t\tdata-wp-on--click=\"actions.showLightbox\"\n\t\t\tdata-wp-style--right=\"state.imageButtonRight\"\n\t\t\tdata-wp-style--top=\"state.imageButtonTop\"\n\t\t>\n\t\t\t<svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" width=\"12\" height=\"12\" fill=\"none\" viewBox=\"0 0 12 12\">\n\t\t\t\t<path fill=\"#fff\" d=\"M2 0a2 2 0 0 0-2 2v2h1.5V2a.5.5 0 0 1 .5-.5h2V0H2Zm2 10.5H2a.5.5 0 0 1-.5-.5V8H0v2a2 2 0 0 0 2 2h2v-1.5ZM8 12v-1.5h2a.5.5 0 0 0 .5-.5V8H12v2a2 2 0 0 1-2 2H8Zm2-12a2 2 0 0 1 2 2v2h-1.5V2a.5.5 0 0 0-.5-.5H8V0h2Z\" \/>\n\t\t\t<\/svg>\n\t\t<\/button><\/figure>\n<\/div>\n<\/div>\n\n\n\n<h2 class=\"wp-block-heading\">More tools coming soon<\/h2>\n\n\n\n<p>Agentforce Optimisation will let you look more closely at your agent\u2019s conversations. It will retrieve your agent\u2019s chat data and pass it to a prompt within Agentforce, which rates the interaction. A score of one to two means your agent needs improvement. 
A five is reason for celebration.&nbsp;<br><br>\u201cWhat you\u2019re looking for is to see if the right action is being taken by the agent in those conversations and whether those actions are in line with your business goals,\u201d said Luke.<br><br>To get a better idea of how Agentforce Optimisation will score responses, here are two samples of conversations on <a href=\"http:\/\/www.salesforce.com\/au\/\" target=\"_blank\" rel=\" noopener\">salesforce.com<\/a> that the team has already tested:&nbsp;<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>The interaction<\/strong>: A customer asked Agentforce to \u201cprovide some customer success stories that have helped IT experts improve business.\u201d The agent replied, \u201cI couldn\u2019t find any specific customer stories related to IT experts improving business. Would you like to search for something else or do you need information on a specific Salesforce product?\u201d<br><strong>The score<\/strong>: 1-2, because the agent couldn\u2019t answer the customer\u2019s question.<br><br><\/li>\n\n\n\n<li><strong>The interaction: <\/strong>A customer asked, \u201cCan I already build a semantic model in Tableau Cloud that will be used by Tableau Agent?\u201d The agent replied, \u201cYes you can build a semantic model in Tableau Cloud that will be used by Tableau Agent. 
Tableau Semantics is a powerful AI-infused semantic layer integrated into Data Cloud that translates your data into your business language.\u201d The agent then described the features of Tableau Semantics in detail and provided a link to the component&#8217;s <a href=\"https:\/\/www.salesforce.com\/au\/analytics\/tableau-semantics\/\" target=\"_blank\" rel=\" noopener\">web page<\/a>.&nbsp;&nbsp;<br><strong>The score: <\/strong>5, because the customer received the information they needed.&nbsp;<\/li>\n<\/ul>\n\n\n\n<p>Agentforce Optimisation will cluster multiple interactions like these into meaningful categories so you can evaluate an agent\u2019s performance at scale. You might, for example, want to look at how your agent is handling a specific topic, such as requests for product information. Or you might want to look at clusters by score. Where is your agent routinely getting scores of one or two? Where is the agent doing well? All that will be possible with Agentforce Optimisation.&nbsp;<\/p>\n\n\n\n<p>Companies will be able to customise the tool to suit their business needs. A large retailer, for example, might want to see how their agent handles returns; another company might want to see how the agent manages tech support.&nbsp;<\/p>\n\n\n\n<p>But Agentforce Optimisation isn\u2019t the only new tool on the horizon. Agentforce Analytics 2.0, a more advanced version of the current Agentforce Observability dashboard, is also in beta. The beefed-up dashboard will offer a higher-level view, showing how many conversations have taken place and which topics are being covered, as well as latency and escalation rates. It, too, will be available at the end of October.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Why AI agent evaluation is so important&nbsp;<\/h2>\n\n\n\n<p>Companies need to assess their agent\u2019s performance for a simple reason: to know what\u2019s working and what should be improved. 
With metrics in hand, you might see that you need to update your content, for example, or that your agent needs more detailed instructions. \u201cThe number one thing we usually find is bad data,\u201d said Stauber.<\/p>\n\n\n\n<p>Bad or mislabelled data, data from unknown sources or data that\u2019s scattered over multiple systems can <a href=\"https:\/\/www.salesforce.com\/au\/blog\/data-centric-ai\/\" target=\"_blank\" rel=\" noopener\">all be a problem<\/a>. But once you\u2019ve identified the issue, you can take action. That\u2019s what Salesforce\u2019s Digital Success team does when it finds an error like the URL hallucinations mentioned earlier. \u201cWe can do a fix, come back to the baseline programme, run a test again and see how things have changed,\u201d Stauber said.\u00a0<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Relax, your agent is hard at work&nbsp;<\/h2>\n\n\n\n<p>With all these new tools to evaluate your AI agent\u2019s performance, company leaders should be able to breathe a sigh of relief. So, the next time you startle awake wondering how your agent is doing, go back to sleep. 
Let your agent work at that hour instead.<\/p>\n\n\n\n<div class=\"layout-six wp-block-salesforce-blog-offer\">\n\t<div class=\"wp-block-offer__wrapper\">\n\n\t\t<div class=\"wp-block-offer__content\">\n\t\t\t<h2 class=\"wp-block-offer__title\">Get a first look at how businesses are using AI agents<\/h2>\n\t\t\t\t\t\t\t<p class=\"wp-block-offer__description\">Explore how agents are already helping companies across sales, service, internal operations and more.<br><\/p>\n\t\t\t\n\t\t\t\n\t\t\t\t\t\t\t<div class=\"wp-block-button\">\n\t\t\t\t\t<a class=\"wp-block-button__link\" target=\"_blank\" href=\"https:\/\/www.salesforce.com\/au\/agentforce\/agentic-enterprise-index\/\">Read the free report<\/a>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\n\t\t<div class=\"wp-block-offer__media\">\n\t\t\t<img loading=\"lazy\" decoding=\"async\" width=\"720\" height=\"720\" src=\"https:\/\/www.salesforce.com\/au\/blog\/wp-content\/uploads\/sites\/4\/2025\/09\/1_1-1440x1440-1.webp\" class=\"attachment-full size-full\" alt=\"\" srcset=\"https:\/\/www.salesforce.com\/au\/blog\/wp-content\/uploads\/sites\/4\/2025\/09\/1_1-1440x1440-1.webp 720w, https:\/\/www.salesforce.com\/au\/blog\/wp-content\/uploads\/sites\/4\/2025\/09\/1_1-1440x1440-1.webp?w=500&amp;h=500 500w, https:\/\/www.salesforce.com\/au\/blog\/wp-content\/uploads\/sites\/4\/2025\/09\/1_1-1440x1440-1.webp?w=150&amp;h=150 150w\" sizes=\"auto, (max-width: 720px) 100vw, 720px\" data-attachment-id=\"68275\" data-permalink=\"https:\/\/www.salesforce.com\/au\/blog\/ai-agent-evaluation\/1_1-1440x1440-1\/\" data-orig-file=\"https:\/\/www.salesforce.com\/au\/blog\/wp-content\/uploads\/sites\/4\/2025\/09\/1_1-1440x1440-1.webp\" data-orig-size=\"720,720\" data-comments-opened=\"1\" 
data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"1_1-1440&amp;#215;1440-1\" data-image-description=\"\" data-image-caption=\"\" data-medium-file=\"https:\/\/www.salesforce.com\/au\/blog\/wp-content\/uploads\/sites\/4\/2025\/09\/1_1-1440x1440-1.webp?w=500\" data-large-file=\"https:\/\/www.salesforce.com\/au\/blog\/wp-content\/uploads\/sites\/4\/2025\/09\/1_1-1440x1440-1.webp?w=720\" \/>\t\t<\/div>\n\t<\/div>\n\n\t\n\t<\/div>\n","protected":false},"excerpt":{"rendered":"<p>Evaluation tools for agents are still an emerging technology, but there are plenty of ways to assess your agent\u2019s performance today.<\/p>\n","protected":false},"author":146,"featured_media":68158,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"sf_justforyou_enable_alt":true,"optimizely_content_id":"683740ebc672295d125abc8cAU","post_meta_title":"","ai_synopsis":"","jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":""},"sf_topic":[3059,2834,3285],"sf_content_type":[],"coauthors":[3305],"class_list":["post-68155","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","sf_topic-ai","sf_topic-generative-ai","sf_topic-agentforce"],"yoast_head":"<!-- This site is optimized with the Yoast SEO Premium plugin v27.2 (Yoast SEO v27.2) - https:\/\/yoast.com\/product\/yoast-seo-premium-wordpress\/ -->\n<title>AI Agent Evaluation: Tools To 
Assess Your Agent\u2019s Work<\/title>\n<meta name=\"description\" content=\"Evaluation tools for AI agents are still an emerging technology. But there are plenty of ways to check up on your agent today.\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.salesforce.com\/au\/blog\/ai-agent-evaluation\/\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"How Do You Know if Your AI Agent Is Doing a Good Job?\" \/>\n<meta property=\"og:description\" content=\"Evaluation tools for agents are still an emerging technology, but there are plenty of ways to assess your agent\u2019s performance today.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.salesforce.com\/au\/blog\/ai-agent-evaluation\/\" \/>\n<meta property=\"og:site_name\" content=\"Salesforce\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/salesforce\" \/>\n<meta property=\"article:published_time\" content=\"2025-12-01T09:36:16+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2025-12-01T09:36:18+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/www.salesforce.com\/au\/blog\/wp-content\/uploads\/sites\/4\/2025\/09\/TSK-41844How_Can_You_Tell_if_Your_Agent_is_Doing_a_Good_Job.png\" \/>\n\t<meta property=\"og:image:width\" content=\"1500\" \/>\n\t<meta property=\"og:image:height\" content=\"844\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/png\" \/>\n<meta name=\"author\" content=\"Laura Hilgers\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@salesforce\" \/>\n<meta name=\"twitter:site\" content=\"@salesforce\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Laura Hilgers\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"8 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/www.salesforce.com\/au\/blog\/ai-agent-evaluation\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/www.salesforce.com\/au\/blog\/ai-agent-evaluation\/\"},\"author\":[{\"@id\":\"https:\/\/www.salesforce.com\/au\/blog\/#\/schema\/person\/image\/fd03df4e56962576961c61e3dfecaca2\"}],\"headline\":\"How Do You Know if Your AI Agent Is Doing a Good Job?\",\"datePublished\":\"2025-12-01T09:36:16+00:00\",\"dateModified\":\"2025-12-01T09:36:18+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/www.salesforce.com\/au\/blog\/ai-agent-evaluation\/\"},\"wordCount\":1592,\"commentCount\":0,\"publisher\":{\"@id\":\"https:\/\/www.salesforce.com\/au\/blog\/#organization\"},\"image\":{\"@id\":\"https:\/\/www.salesforce.com\/au\/blog\/ai-agent-evaluation\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/www.salesforce.com\/au\/blog\/wp-content\/uploads\/sites\/4\/2025\/09\/TSK-41844How_Can_You_Tell_if_Your_Agent_is_Doing_a_Good_Job.png\",\"inLanguage\":\"en-AU\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\/\/www.salesforce.com\/au\/blog\/ai-agent-evaluation\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/www.salesforce.com\/au\/blog\/ai-agent-evaluation\/\",\"url\":\"https:\/\/www.salesforce.com\/au\/blog\/ai-agent-evaluation\/\",\"name\":\"AI Agent Evaluation: Tools To Assess Your Agent\u2019s 
Work\",\"isPartOf\":{\"@id\":\"https:\/\/www.salesforce.com\/au\/blog\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/www.salesforce.com\/au\/blog\/ai-agent-evaluation\/#primaryimage\"},\"image\":{\"@id\":\"https:\/\/www.salesforce.com\/au\/blog\/ai-agent-evaluation\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/www.salesforce.com\/au\/blog\/wp-content\/uploads\/sites\/4\/2025\/09\/TSK-41844How_Can_You_Tell_if_Your_Agent_is_Doing_a_Good_Job.png\",\"datePublished\":\"2025-12-01T09:36:16+00:00\",\"dateModified\":\"2025-12-01T09:36:18+00:00\",\"description\":\"Evaluation tools for AI agents are still an emerging technology. But there are plenty of ways to check up on your agent today.\",\"inLanguage\":\"en-AU\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/www.salesforce.com\/au\/blog\/ai-agent-evaluation\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-AU\",\"@id\":\"https:\/\/www.salesforce.com\/au\/blog\/ai-agent-evaluation\/#primaryimage\",\"url\":\"https:\/\/www.salesforce.com\/au\/blog\/wp-content\/uploads\/sites\/4\/2025\/09\/TSK-41844How_Can_You_Tell_if_Your_Agent_is_Doing_a_Good_Job.png\",\"contentUrl\":\"https:\/\/www.salesforce.com\/au\/blog\/wp-content\/uploads\/sites\/4\/2025\/09\/TSK-41844How_Can_You_Tell_if_Your_Agent_is_Doing_a_Good_Job.png\",\"width\":1500,\"height\":844,\"caption\":\"A raft of Salesforce tools can help you measure your AI agent\u2019s performance. 
[Image credit: Aleona Pollauf\/Salesforce]\"},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/www.salesforce.com\/au\/blog\/#website\",\"url\":\"https:\/\/www.salesforce.com\/au\/blog\/\",\"name\":\"Salesforce\",\"description\":\"Learn how to get ahead of trends and supercharge professional relationships\",\"publisher\":{\"@id\":\"https:\/\/www.salesforce.com\/au\/blog\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/www.salesforce.com\/au\/blog\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-AU\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/www.salesforce.com\/au\/blog\/#organization\",\"name\":\"Salesforce\",\"url\":\"https:\/\/www.salesforce.com\/au\/blog\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-AU\",\"@id\":\"https:\/\/www.salesforce.com\/au\/blog\/#\/schema\/logo\/image\/\",\"url\":\"\",\"contentUrl\":\"\",\"caption\":\"Salesforce\"},\"image\":{\"@id\":\"https:\/\/www.salesforce.com\/au\/blog\/#\/schema\/logo\/image\/\"},\"sameAs\":[\"https:\/\/www.facebook.com\/salesforce\",\"https:\/\/x.com\/salesforce\",\"https:\/\/instagram.com\/salesforce\",\"http:\/\/www.linkedin.com\/company\/salesforce\",\"http:\/\/www.youtube.com\/Salesforce\"]},{\"@type\":\"Person\",\"@id\":\"https:\/\/www.salesforce.com\/au\/blog\/#\/schema\/person\/image\/fd03df4e56962576961c61e3dfecaca2\",\"name\":\"Laura 
Hilgers\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-AU\",\"@id\":\"https:\/\/www.salesforce.com\/au\/blog\/#\/schema\/person\/image\/9150474105648021867e7582260e3c98\",\"url\":\"https:\/\/www.salesforce.com\/au\/blog\/wp-content\/uploads\/sites\/4\/2025\/05\/laura-hilgers.jpeg?w=128&h=96&crop=1\",\"contentUrl\":\"https:\/\/www.salesforce.com\/au\/blog\/wp-content\/uploads\/sites\/4\/2025\/05\/laura-hilgers.jpeg?w=128&h=96&crop=1\",\"width\":128,\"height\":96,\"caption\":\"Laura Hilgers\"},\"description\":\"I\u2019m a senior writer for the 360 Blog, where I write about all things AI. I came to Salesforce from LinkedIn, and before that, I was a freelance journalist. My articles have appeared in The New York Times, Sports Illustrated, Vogue, and O, The Oprah Magazine.\",\"url\":\"https:\/\/www.salesforce.com\/au\/blog\/author\/laura-hilgers\/\"}]}<\/script>\n<!-- \/ Yoast SEO Premium plugin. -->","yoast_head_json":{"title":"AI Agent Evaluation: Tools To Assess Your Agent\u2019s Work","description":"Evaluation tools for AI agents are still an emerging technology. 
But there are plenty of ways to check up on your agent today.","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.salesforce.com\/au\/blog\/ai-agent-evaluation\/","og_type":"article","og_title":"How Do You Know if Your AI Agent Is Doing a Good Job?","og_description":"Evaluation tools for agents are still an emerging technology, but there are plenty of ways to assess your agent\u2019s performance today.","og_url":"https:\/\/www.salesforce.com\/au\/blog\/ai-agent-evaluation\/","og_site_name":"Salesforce","article_publisher":"https:\/\/www.facebook.com\/salesforce","article_published_time":"2025-12-01T09:36:16+00:00","article_modified_time":"2025-12-01T09:36:18+00:00","og_image":[{"width":1500,"height":844,"url":"https:\/\/www.salesforce.com\/au\/blog\/wp-content\/uploads\/sites\/4\/2025\/09\/TSK-41844How_Can_You_Tell_if_Your_Agent_is_Doing_a_Good_Job.png","type":"image\/png"}],"author":"Laura Hilgers","twitter_card":"summary_large_image","twitter_creator":"@salesforce","twitter_site":"@salesforce","twitter_misc":{"Written by":"Laura Hilgers","Est. 
reading time":"8 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/www.salesforce.com\/au\/blog\/ai-agent-evaluation\/#article","isPartOf":{"@id":"https:\/\/www.salesforce.com\/au\/blog\/ai-agent-evaluation\/"},"author":[{"@id":"https:\/\/www.salesforce.com\/au\/blog\/#\/schema\/person\/image\/fd03df4e56962576961c61e3dfecaca2"}],"headline":"How Do You Know if Your AI Agent Is Doing a Good Job?","datePublished":"2025-12-01T09:36:16+00:00","dateModified":"2025-12-01T09:36:18+00:00","mainEntityOfPage":{"@id":"https:\/\/www.salesforce.com\/au\/blog\/ai-agent-evaluation\/"},"wordCount":1592,"commentCount":0,"publisher":{"@id":"https:\/\/www.salesforce.com\/au\/blog\/#organization"},"image":{"@id":"https:\/\/www.salesforce.com\/au\/blog\/ai-agent-evaluation\/#primaryimage"},"thumbnailUrl":"https:\/\/www.salesforce.com\/au\/blog\/wp-content\/uploads\/sites\/4\/2025\/09\/TSK-41844How_Can_You_Tell_if_Your_Agent_is_Doing_a_Good_Job.png","inLanguage":"en-AU","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/www.salesforce.com\/au\/blog\/ai-agent-evaluation\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/www.salesforce.com\/au\/blog\/ai-agent-evaluation\/","url":"https:\/\/www.salesforce.com\/au\/blog\/ai-agent-evaluation\/","name":"AI Agent Evaluation: Tools To Assess Your Agent\u2019s Work","isPartOf":{"@id":"https:\/\/www.salesforce.com\/au\/blog\/#website"},"primaryImageOfPage":{"@id":"https:\/\/www.salesforce.com\/au\/blog\/ai-agent-evaluation\/#primaryimage"},"image":{"@id":"https:\/\/www.salesforce.com\/au\/blog\/ai-agent-evaluation\/#primaryimage"},"thumbnailUrl":"https:\/\/www.salesforce.com\/au\/blog\/wp-content\/uploads\/sites\/4\/2025\/09\/TSK-41844How_Can_You_Tell_if_Your_Agent_is_Doing_a_Good_Job.png","datePublished":"2025-12-01T09:36:16+00:00","dateModified":"2025-12-01T09:36:18+00:00","description":"Evaluation tools for AI agents are still an emerging technology. 
But there are plenty of ways to check up on your agent today.","inLanguage":"en-AU","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.salesforce.com\/au\/blog\/ai-agent-evaluation\/"]}]},{"@type":"ImageObject","inLanguage":"en-AU","@id":"https:\/\/www.salesforce.com\/au\/blog\/ai-agent-evaluation\/#primaryimage","url":"https:\/\/www.salesforce.com\/au\/blog\/wp-content\/uploads\/sites\/4\/2025\/09\/TSK-41844How_Can_You_Tell_if_Your_Agent_is_Doing_a_Good_Job.png","contentUrl":"https:\/\/www.salesforce.com\/au\/blog\/wp-content\/uploads\/sites\/4\/2025\/09\/TSK-41844How_Can_You_Tell_if_Your_Agent_is_Doing_a_Good_Job.png","width":1500,"height":844,"caption":"A raft of Salesforce tools can help you measure your AI agent\u2019s performance. [Image credit: Aleona Pollauf\/Salesforce]"},{"@type":"WebSite","@id":"https:\/\/www.salesforce.com\/au\/blog\/#website","url":"https:\/\/www.salesforce.com\/au\/blog\/","name":"Salesforce","description":"Learn how to get ahead of trends and supercharge professional 
relationships","publisher":{"@id":"https:\/\/www.salesforce.com\/au\/blog\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.salesforce.com\/au\/blog\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-AU"},{"@type":"Organization","@id":"https:\/\/www.salesforce.com\/au\/blog\/#organization","name":"Salesforce","url":"https:\/\/www.salesforce.com\/au\/blog\/","logo":{"@type":"ImageObject","inLanguage":"en-AU","@id":"https:\/\/www.salesforce.com\/au\/blog\/#\/schema\/logo\/image\/","url":"","contentUrl":"","caption":"Salesforce"},"image":{"@id":"https:\/\/www.salesforce.com\/au\/blog\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/salesforce","https:\/\/x.com\/salesforce","https:\/\/instagram.com\/salesforce","http:\/\/www.linkedin.com\/company\/salesforce","http:\/\/www.youtube.com\/Salesforce"]},{"@type":"Person","@id":"https:\/\/www.salesforce.com\/au\/blog\/#\/schema\/person\/image\/fd03df4e56962576961c61e3dfecaca2","name":"Laura Hilgers","image":{"@type":"ImageObject","inLanguage":"en-AU","@id":"https:\/\/www.salesforce.com\/au\/blog\/#\/schema\/person\/image\/9150474105648021867e7582260e3c98","url":"https:\/\/www.salesforce.com\/au\/blog\/wp-content\/uploads\/sites\/4\/2025\/05\/laura-hilgers.jpeg?w=128&h=96&crop=1","contentUrl":"https:\/\/www.salesforce.com\/au\/blog\/wp-content\/uploads\/sites\/4\/2025\/05\/laura-hilgers.jpeg?w=128&h=96&crop=1","width":128,"height":96,"caption":"Laura Hilgers"},"description":"I\u2019m a senior writer for the 360 Blog, where I write about all things AI. I came to Salesforce from LinkedIn, and before that, I was a freelance journalist. 
My articles have appeared in The New York Times, Sports Illustrated, Vogue, and O, The Oprah Magazine.","url":"https:\/\/www.salesforce.com\/au\/blog\/author\/laura-hilgers\/"}]}},"jetpack_featured_media_url":"https:\/\/www.salesforce.com\/au\/blog\/wp-content\/uploads\/sites\/4\/2025\/09\/TSK-41844How_Can_You_Tell_if_Your_Agent_is_Doing_a_Good_Job.png","jetpack_sharing_enabled":true,"distributor_meta":false,"distributor_terms":false,"distributor_media":false,"distributor_original_site_name":"Salesforce","distributor_original_site_url":"https:\/\/www.salesforce.com\/au\/blog","push-errors":false,"primary_topic":{"term_id":3059,"name":"AI","slug":"ai","term_group":0,"term_taxonomy_id":3059,"taxonomy":"sf_topic","description":"","parent":0,"count":33,"filter":"raw"},"featured_image_url":"https:\/\/www.salesforce.com\/au\/blog\/wp-content\/uploads\/sites\/4\/2025\/09\/TSK-41844How_Can_You_Tell_if_Your_Agent_is_Doing_a_Good_Job.png?w=1500","_links":{"self":[{"href":"https:\/\/www.salesforce.com\/au\/blog\/wp-json\/wp\/v2\/posts\/68155","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.salesforce.com\/au\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.salesforce.com\/au\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.salesforce.com\/au\/blog\/wp-json\/wp\/v2\/users\/146"}],"replies":[{"embeddable":true,"href":"https:\/\/www.salesforce.com\/au\/blog\/wp-json\/wp\/v2\/comments?post=68155"}],"version-history":[{"count":8,"href":"https:\/\/www.salesforce.com\/au\/blog\/wp-json\/wp\/v2\/posts\/68155\/revisions"}],"predecessor-version":[{"id":68289,"href":"https:\/\/www.salesforce.com\/au\/blog\/wp-json\/wp\/v2\/posts\/68155\/revisions\/68289"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.salesforce.com\/au\/blog\/wp-json\/wp\/v2\/media\/68158"}],"wp:attachment":[{"href":"https:\/\/www.salesforce.com\/au\/blog\/wp-json\/wp\/v2\/media?parent=68155"}],"wp:term":[{"taxonomy":"sf_topic
","embeddable":true,"href":"https:\/\/www.salesforce.com\/au\/blog\/wp-json\/wp\/v2\/sf_topic?post=68155"},{"taxonomy":"sf_content_type","embeddable":true,"href":"https:\/\/www.salesforce.com\/au\/blog\/wp-json\/wp\/v2\/sf_content_type?post=68155"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.salesforce.com\/au\/blog\/wp-json\/wp\/v2\/coauthors?post=68155"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}