{"id":68142,"date":"2025-09-26T14:25:54","date_gmt":"2025-09-26T21:25:54","guid":{"rendered":"https:\/\/www.salesforce.com\/?p=68142"},"modified":"2025-12-01T21:11:37","modified_gmt":"2025-12-01T10:11:37","slug":"measuring-unpredictable-ai","status":"publish","type":"post","link":"https:\/\/www.salesforce.com\/au\/blog\/measuring-unpredictable-ai\/","title":{"rendered":"Measuring Unpredictable AI: What Business Leaders Need to Know"},"content":{"rendered":"\n<p>Recently my daughter asked a seemingly simple question over dinner: &#8220;Dad, which is bigger, Australia or Europe?&#8221;<\/p>\n\n\n\n<p>As any parent today knows, these moments present a choice \u2014 attempt an answer from memory or consult the go-to digital authority. As a family, we decided to put ChatGPT to the test.<\/p>\n\n\n\n<figure data-wp-context=\"{&quot;imageId&quot;:&quot;6a689cfb4b990&quot;}\" data-wp-interactive=\"core\/image\" data-wp-key=\"6a689cfb4b990\" class=\"wp-block-image size-full is-resized wp-lightbox-container\"><img loading=\"lazy\" decoding=\"async\" width=\"1924\" height=\"1076\" data-attachment-id=\"68153\" data-permalink=\"https:\/\/www.salesforce.com\/au\/blog\/measuring-unpredictable-ai\/image-238\/\" data-orig-file=\"https:\/\/www.salesforce.com\/au\/blog\/wp-content\/uploads\/sites\/4\/2025\/09\/image_947a74.png\" data-orig-size=\"1924,1076\" data-comments-opened=\"1\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"image\" data-image-description=\"\" data-image-caption=\"\" 
data-large-file=\"https:\/\/www.salesforce.com\/au\/blog\/wp-content\/uploads\/sites\/4\/2025\/09\/image_947a74.png?w=1924\" data-wp-class--hide=\"state.isContentHidden\" data-wp-class--show=\"state.isContentVisible\" data-wp-init=\"callbacks.setButtonStyles\" data-wp-on--click=\"actions.showLightbox\" data-wp-on--load=\"callbacks.setButtonStyles\" data-wp-on-window--resize=\"callbacks.setButtonStyles\" src=\"https:\/\/www.salesforce.com\/blog\/wp-content\/uploads\/sites\/2\/2025\/09\/image_947a74.png?strip=all&#038;quality=95\" alt=\"\" class=\"wp-image-68153\" style=\"width:495px;height:auto\" srcset=\"https:\/\/www.salesforce.com\/au\/blog\/wp-content\/uploads\/sites\/4\/2025\/09\/image_947a74.png 1924w, https:\/\/www.salesforce.com\/au\/blog\/wp-content\/uploads\/sites\/4\/2025\/09\/image_947a74.png?w=894&amp;h=500 894w, https:\/\/www.salesforce.com\/au\/blog\/wp-content\/uploads\/sites\/4\/2025\/09\/image_947a74.png?w=768&amp;h=430 768w, https:\/\/www.salesforce.com\/au\/blog\/wp-content\/uploads\/sites\/4\/2025\/09\/image_947a74.png?w=1536&amp;h=859 1536w, https:\/\/www.salesforce.com\/au\/blog\/wp-content\/uploads\/sites\/4\/2025\/09\/image_947a74.png?w=150&amp;h=84 150w\" sizes=\"auto, (max-width: 1924px) 100vw, 1924px\" \/><button\n\t\t\tclass=\"lightbox-trigger\"\n\t\t\ttype=\"button\"\n\t\t\taria-haspopup=\"dialog\"\n\t\t\taria-label=\"Enlarge\"\n\t\t\tdata-wp-init=\"callbacks.initTriggerButton\"\n\t\t\tdata-wp-on--click=\"actions.showLightbox\"\n\t\t\tdata-wp-style--right=\"state.imageButtonRight\"\n\t\t\tdata-wp-style--top=\"state.imageButtonTop\"\n\t\t>\n\t\t\t<svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" width=\"12\" height=\"12\" fill=\"none\" viewBox=\"0 0 12 12\">\n\t\t\t\t<path fill=\"#fff\" d=\"M2 0a2 2 0 0 0-2 2v2h1.5V2a.5.5 0 0 1 .5-.5h2V0H2Zm2 10.5H2a.5.5 0 0 1-.5-.5V8H0v2a2 2 0 0 0 2 2h2v-1.5ZM8 12v-1.5h2a.5.5 0 0 0 .5-.5V8H12v2a2 2 0 0 1-2 2H8Zm2-12a2 2 0 0 1 2 2v2h-1.5V2a.5.5 0 0 0-.5-.5H8V0h2Z\" 
\/>\n\t\t\t<\/svg>\n\t\t<\/button><\/figure>\n\n\n\n<p>The response was illuminating in an unexpected way: &#8220;Australia is larger in land area,&#8221; the AI declared confidently. Then, it provided specific data showing Europe at 10.2 million square kilometres versus Australia&#8217;s 7.7 million \u2014 numbers that directly contradicted its initial claim.<\/p>\n\n\n\n<p>This wasn&#8217;t a minor computational error or a knowledge gap. This was something more fundamental \u2014 a window into what researchers call <a href=\"\/blog\/jagged-intelligence\/\" target=\"_blank\" rel=\" noopener\">&#8220;jagged intelligence,&#8221;<\/a> where AI systems demonstrate remarkable capabilities in complex reasoning while stumbling over seemingly simple tasks. More importantly, it highlighted a critical challenge facing enterprise leaders today: <strong>How do you systematically validate AI systems that don&#8217;t behave like traditional software?<\/strong><\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-deterministic-testing-in-a-probabilistic-world\"><strong>Deterministic testing in a probabilistic world<\/strong><\/h2>\n\n\n\n<p>The fundamental challenge lies in a mismatch between how we&#8217;ve learnt to test technology and how modern AI actually works. For decades, software engineering built its reliability foundation on <strong>deterministic principles<\/strong> \u2014 given identical inputs, you could expect identical outputs. This predictability enabled rigorous testing methodologies: unit tests, integration tests, acceptance criteria, all based on the deep-seated orthodoxy that we can <strong>write code, validate outputs and ship with confidence.<\/strong><\/p>\n\n\n\n<p>AI agents operate by entirely different principles. They&#8217;re <strong>probabilistic systems,<\/strong> designed to generate varied responses based on complex pattern recognition and contextual understanding. 
Give the same prompt to an AI system ten times and you might receive ten different responses: some excellent, others adequate and potentially some that miss the mark entirely. (If you&#8217;re interested, you can <a href=\"https:\/\/thinkingmachines.ai\/blog\/defeating-nondeterminism-in-llm-inference\/\" target=\"_blank\" rel=\" noopener\">learn more about this phenomenon here.<\/a>)<\/p>\n\n\n\n<p><strong>But this isn&#8217;t a flaw. It&#8217;s a feature. <\/strong>This probabilistic nature enables AI agents to navigate complex, context-dependent scenarios that no programmer could anticipate or explicitly encode, adapting their responses to customer sentiment, business urgency and situational nuance in real time. The flexibility that makes AI agents valuable for handling diverse, unpredictable customer interactions also makes them fundamentally challenging to validate using traditional testing approaches.<br><br>I was recently discussing this challenge with Walter Harley, our principal AI Research architect and my co-author on this piece. As Walter puts it: &#8220;Traditional software is over seventy years old, so we&#8217;ve accumulated decades of institutional knowledge about how to test it systematically. We understand the failure modes, the edge and corner cases, the patterns of where bugs hide.&#8221;<\/p>\n\n\n\n<p>He continues: &#8220;But LLMs are only about three years old as enterprise tools. We&#8217;re essentially trying to validate systems using testing intuitions that were built for an entirely different paradigm \u2014 and that can be a real problem when businesses are staking their operations on these technologies.&#8221;<br><br>Walter&#8217;s insight gets to the heart of why consumer AI approaches fall short in enterprise contexts. ChatGPT might be perfectly adequate for most consumer use cases\u2014providing film recommendations, drafting poems, helping with research or settling family dinner table debates. 
But when AI agents are handling customer data, processing financial transactions or representing your brand to millions of customers, the tolerance for &#8220;quirks&#8221; drops to near zero. The stakes shift from mild inconvenience to potential business catastrophe.<br><br>To understand what&#8217;s truly at risk when these systems fail, let&#8217;s consider how another industry approaches high-stakes AI validation.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-validating-when-stakes-are-high-lessons-from-autonomous-vehicles\"><strong>Validating when stakes are high: Lessons from autonomous vehicles<\/strong><\/h2>\n\n\n\n<p>Waymo&#8217;s approach to autonomous vehicle validation offers a compelling parallel for enterprise AI measurement. Self-driving cars, like AI agents, must perform reliably across countless scenarios, but their validation framework recognises that not all failures carry equal weight.<\/p>\n\n\n\n<p><a href=\"https:\/\/waymo.com\/blog\/2024\/12\/new-swiss-re-study-waymo\" target=\"_blank\" rel=\" noopener\">Waymo&#8217;s research<\/a> demonstrates impressive safety performance \u2014 88% fewer property damage claims and 92% fewer bodily injury claims compared to human drivers over 25+ million miles. But what\u2019s more relevant here is that their validation approach acknowledges that different types of failures have dramatically different consequences.<\/p>\n\n\n\n<p><strong>At the mildest level are performance variations. <\/strong>Sometimes the car takes a slightly longer route or brakes more conservatively than necessary. The passenger reaches their destination safely, but the experience isn&#8217;t optimal. In enterprise AI, this might be analogous to a customer service agent providing a correct but needlessly lengthy response or a sales agent missing an opportunity to suggest a relevant add-on service.<\/p>\n\n\n\n<p><strong>More concerning are failures that create undesirable outcomes. 
<\/strong>The car might stop at the wrong address or take a route that adds significant time. Or perhaps it behaves overly cautiously, for example accelerating from an intersection much more slowly than a typically aggressive driver would. The passenger might experience inconvenience or frustration, but no catastrophic harm occurs. For business AI, this could mean providing outdated pricing information, recommending irrelevant products or failing to escalate a customer concern appropriately. They\u2019re not fatal flaws, but over time these \u201cquirks\u201d can erode trust and customer loyalty.<\/p>\n\n\n\n<p><strong>Most critical are failures that pose genuine danger. <\/strong>When autonomous vehicles stop in the middle of traffic or cause accidents, the consequences become existential. Waymo&#8217;s methodical approach to identifying and preventing such scenarios \u2014 through <a href=\"https:\/\/waymo.com\/safety\/research\/\" target=\"_blank\" rel=\" noopener\">millions of miles of testing and continuous safety research<\/a> \u2014 demonstrates the rigorous validation framework required when lives depend on system reliability.<br><br>Enterprise AI operates under similarly high stakes, just in different domains. Here, the failures that represent existential business risks include AI agents that divulge confidential information, make commitments beyond their authority or produce harmful content that could damage customer relationships or expose companies to legal liability.<\/p>\n\n\n\n<p>This is why traditional software testing approaches fall short. 
As Walter explains: &#8220;I like to think of it as the \u2018Success Rate Trap.\u2019 Enterprises can get fixated on aggregate performance metrics like \u2018Our model achieves 97% accuracy on customer service enquiries!\u2019 while completely missing the critical question: <strong>what kinds of wrong answers are we getting in that remaining 3%?&#8221;<\/strong><\/p>\n\n\n\n<p>When that small failure rate includes potentially catastrophic business risks, we&#8217;re no longer dealing with minor performance gaps. We need systematic approaches to measuring and validating AI that distinguish between, and mitigate, different categories of failure.<\/p>\n\n\n\n<p>So how do we move beyond this Success Rate Trap to build validation frameworks worthy of business-critical AI?<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-salesforce-s-three-part-approach-to-enterprise-ai-validation\"><strong>Salesforce&#8217;s three-part approach to enterprise AI validation<\/strong><\/h2>\n\n\n\n<p>At Salesforce AI Research, we&#8217;ve developed a systematic framework for measuring AI performance that addresses the unique challenges of probabilistic systems operating in business environments. Each of its three parts operates <strong>with expert AI researchers firmly at the helm<\/strong>, not simply &#8220;in the loop&#8221;, ensuring that scientific rigour and domain expertise guide every validation decision.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"h-1-ai-powered-judges-for-evaluation-at-scale\"><strong>1. AI-powered judges for evaluation at scale<\/strong><\/h3>\n\n\n\n<p>Enterprise leaders implementing AI at scale need systematic evaluation frameworks that can process thousands of interactions daily. 
The global AI research community has developed what are commonly known as &#8220;judge models&#8221;: AI systems specifically designed to evaluate other AI systems&#8217; performance, now used by most industry-standard AI leaderboards.<br><br>Consider how a Michelin-starred chef evaluates dishes in their restaurant&#8217;s kitchen. They don&#8217;t just taste and say &#8220;good&#8221; or &#8220;bad,&#8221; but explain precisely why: &#8220;The seasoning is unbalanced,&#8221; &#8220;The composition lacks harmony,&#8221; or &#8220;The presentation feels off-brand for our restaurant.&#8221;<br><br>That&#8217;s exactly what we&#8217;ve built with <a href=\"\/blog\/sfr-judge\/\">SFR-Judge, a family of AI evaluation or \u201cjudge\u201d models<\/a> that examines thousands of AI responses while explaining its reasoning: why an output might sound off-brand, contain questionable information or be potentially harmful. Rather than delivering mysterious &#8220;black-box&#8221; judgements, it acts like a tireless quality assurance expert, providing both the verdict and the &#8220;why&#8221; behind each decision.<\/p>\n\n\n\n<p>Our team is now advancing this work further, developing judges for more complex tasks such as verifying reasoning capabilities for math, code and agentic workflows in high-value enterprise use cases.<\/p>\n\n\n\n<p>This approach comes with an important caveat: we&#8217;re now using probabilistic systems to evaluate other probabilistic systems. The validation is only as good as our judge models, which is why measuring AI will always require a human to be firmly at the helm.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"h-2-human-expert-knowledge-extraction\"><strong>2. Human expert knowledge extraction<\/strong><\/h3>\n\n\n\n<p>Even with advanced automated evaluation, certain aspects of AI validation demand human expertise that cannot be easily automated. 
Enterprise best practices require what we term &#8220;expert knowledge extraction&#8221; frameworks \u2014 systematic approaches to capture domain expertise and incorporate it directly into AI training and validation processes.&nbsp;<\/p>\n\n\n\n<p>Rather than simply having administrators configure AI agents through standard interfaces, we&#8217;re exploring how seasoned experts across various business domains or sectors \u2014 from experienced financial services professionals and healthcare administrators, to sales coaches and customer success managers \u2014 can directly influence agent behaviour through natural conversation and feedback.<br><br>Our collaboration with a healthcare provider institution demonstrates this approach in patient billing support, where expert billing specialists provide nuanced judgement that automated systems cannot replicate. What we learnt is that the human-expert layer serves as both training mechanism and validation checkpoint \u2014 AI agents seamlessly request guidance from specialists during live patient calls, while these interventions become valuable learning that improves future performance. This hybrid approach reduces patient wait times and specialist workload while maintaining the accuracy and empathy standards essential for healthcare billing.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"h-3-simulation-environments-for-comprehensive-testing\"><strong>3. Simulation environments for comprehensive testing<\/strong><\/h3>\n\n\n\n<p>In my recent exploration of <a href=\"\/blog\/synthetic-data-for-training\/\">synthetic data for enterprise AI training environments<\/a>, I discussed how AI agents require sophisticated simulation environments to achieve reliable performance \u2014 much like F1 drivers training in comprehensive simulators before racing at Monaco. 
But these training grounds serve a dual purpose: they also provide the testing environments needed for systematic validation.<\/p>\n\n\n\n<p>Our AI Research team has developed <a href=\"\/blog\/crmarena-pro\/\">CRMArena-Pro<\/a>, <strong>one of the highest fidelity simulation environments<\/strong> for customer service and sales use cases in the industry. This environment generates millions of realistic business scenarios drawn from our deep understanding of enterprise operations \u2014 while maintaining Salesforce&#8217;s strict privacy standards by using synthetic rather than actual customer data. What sets our approach apart is comprehensive support for voice modalities: simulating lossy phone connections when cell service drops, handling background noise from city buses or subways and modelling different speaker intonations, languages and accents.<\/p>\n\n\n\n<p><strong>Because of the probabilistic nature of AI and because models can&#8217;t be broken down and tested unit by unit in the way traditional software can be, <\/strong>AI validation requires orders of magnitude more scenarios. We need systems that can simulate not just standard business interactions, but also edge cases, adversarial inputs and the countless variations that occur in real-world customer conversations. Our vast libraries of synthetic but realistic interactions stress-test AI agents across countless dimensions, providing the certainty that these systems can handle whatever scenarios might arise in daily operations.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-understanding-the-reality-of-ai-imperfection\"><strong>Understanding the reality of AI imperfection<\/strong><\/h2>\n\n\n\n<p><strong>The reality is simple: AI systems aren&#8217;t perfect and their imperfections manifest differently than human failures.<\/strong><\/p>\n\n\n\n<p>Walter puts it directly: &#8220;Any teams deploying agents need to be monitoring their behaviour. 
You really need to understand not only when it breaks, but <em>how exactly<\/em> it breaks once trained with your data sources.&#8221;<\/p>\n\n\n\n<p>When a consumer LLM gets confused about continents, we can laugh it off and look up the answer ourselves. When enterprise AI gets confused about customer data or business rules, the stakes are entirely different. Imagine what would happen if an AI agent provided contradictory loan terms in a single proposal or routed sensitive customer data to unauthorised recipients.<br><br>Organisations that deploy AI based on capability demonstrations alone will struggle with these inconsistent results when impressive technology meets unpredictable business reality. But those that recognise these tools as powerful but imperfect <strong>\u2014 and build appropriate measuring, monitoring and validation frameworks around them \u2014 <\/strong>will gain decisive advantages in the AI economy.<br><br>What makes our framework work is establishing clear thresholds for human escalation: when confidence scores drop below defined levels, when agents encounter scenarios outside their training scope or when business impact exceeds predetermined risk tolerances. These thresholds ensure agents handle routine tasks independently while seamlessly engaging human expertise for high-stakes decisions. This is human-AI collaboration at its finest.<\/p>\n\n\n\n<p>We aren&#8217;t just building better AI agents; we&#8217;re developing the methodologies that will define enterprise AI excellence for years to come.<\/p>\n\n\n\n<p><em>This post is part of our series exploring the components of enterprise AI development. 
Read our previous post on <a href=\"\/blog\/synthetic-data-for-training\/\">synthetic data and AI agent training environments<\/a> and watch for upcoming deep dives into enterprise data synthesis and advanced training methodologies.<\/em><br><br><em>I would like to thank Walter Harley, Jacob Lehrbaum, Patrick Stokes, Itai Asseo and Karen Semone for their insights and contributions to this article.<\/em><strong><br><br><br><\/strong><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Recently my daughter asked a seemingly simple question over dinner: &#8220;Dad, which is bigger, Australia or Europe?&#8221; As any parent today knows, these moments present a choice \u2014 attempt an answer from memory&hellip;<\/p>\n","protected":false},"author":733,"featured_media":68152,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"sf_justforyou_enable_alt":true,"optimizely_content_id":"68d5b46be9e89939882885c7AU","post_meta_title":"","ai_synopsis":"","_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_feature_clip_id":0,"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_post_was_ever_published":false},"sf_topic":[3059,3357],"sf_content_type":[3154],"coauthors":[3344,3345],"class_list":["post-68142","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","sf_topic-ai","sf_topic-ai-research","sf_content_type-blog"],"yoast_head":"<!-- This site is optimized with the Yoast SEO Premium plugin v28.1 (Yoast SEO v28.1) - https:\/\/yoast.com\/product\/yoast-seo-premium-wordpress\/ -->\n<title>Measuring Unpredictable AI: What Business Leaders Need to Know | Salesforce<\/title>\n<meta name=\"description\" content=\"Discover how unpredictable AI affects business decisions. 
Learn what enterprise leaders need to know about validating AI systems, understanding \u2018jagged intelligence,\u2019 and ensuring reliable performance\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.salesforce.com\/au\/blog\/measuring-unpredictable-ai\/\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Measuring Unpredictable AI: What Business Leaders Need to Know\" \/>\n<meta property=\"og:description\" content=\"Recently my daughter asked a seemingly simple question over dinner: &quot;Dad, which is bigger, Australia or Europe?&quot; As any parent today knows, these moments\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.salesforce.com\/au\/blog\/measuring-unpredictable-ai\/\" \/>\n<meta property=\"og:site_name\" content=\"Salesforce\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/salesforce\" \/>\n<meta property=\"article:published_time\" content=\"2025-09-26T21:25:54+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2025-12-01T10:11:37+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/www.salesforce.com\/au\/blog\/wp-content\/uploads\/sites\/4\/2025\/09\/Measuring-Unpredictable-AI.png\" \/>\n\t<meta property=\"og:image:width\" content=\"1200\" \/>\n\t<meta property=\"og:image:height\" content=\"675\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/png\" \/>\n<meta name=\"author\" content=\"Silvio Savarese, Walter Harley\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@salesforce\" \/>\n<meta name=\"twitter:site\" content=\"@salesforce\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Silvio Savarese and Walter Harley\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"1 minute\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\\\/\\\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\\\/\\\/www.salesforce.com\\\/au\\\/blog\\\/measuring-unpredictable-ai\\\/#article\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/www.salesforce.com\\\/au\\\/blog\\\/measuring-unpredictable-ai\\\/\"},\"author\":[{\"@id\":\"https:\\\/\\\/www.salesforce.com\\\/au\\\/blog\\\/#\\\/schema\\\/person\\\/image\\\/92e4b5ee4e9d26d32c509c44db0b0a24\"},{\"@id\":\"https:\\\/\\\/www.salesforce.com\\\/au\\\/blog\\\/#\\\/schema\\\/person\\\/image\\\/6c0b6566e452d88ef75d4e6a6351d786\"}],\"headline\":\"Measuring Unpredictable AI: What Business Leaders Need to Know\",\"datePublished\":\"2025-09-26T21:25:54+00:00\",\"dateModified\":\"2025-12-01T10:11:37+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\\\/\\\/www.salesforce.com\\\/au\\\/blog\\\/measuring-unpredictable-ai\\\/\"},\"wordCount\":2075,\"commentCount\":0,\"publisher\":{\"@id\":\"https:\\\/\\\/www.salesforce.com\\\/au\\\/blog\\\/#organization\"},\"image\":{\"@id\":\"https:\\\/\\\/www.salesforce.com\\\/au\\\/blog\\\/measuring-unpredictable-ai\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/www.salesforce.com\\\/au\\\/blog\\\/wp-content\\\/uploads\\\/sites\\\/4\\\/2025\\\/09\\\/Measuring-Unpredictable-AI.png\",\"inLanguage\":\"en-AU\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\\\/\\\/www.salesforce.com\\\/au\\\/blog\\\/measuring-unpredictable-ai\\\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\\\/\\\/www.salesforce.com\\\/au\\\/blog\\\/measuring-unpredictable-ai\\\/\",\"url\":\"https:\\\/\\\/www.salesforce.com\\\/au\\\/blog\\\/measuring-unpredictable-ai\\\/\",\"name\":\"Measuring Unpredictable AI: What Business Leaders Need to Know | 
Salesforce\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/www.salesforce.com\\\/au\\\/blog\\\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\\\/\\\/www.salesforce.com\\\/au\\\/blog\\\/measuring-unpredictable-ai\\\/#primaryimage\"},\"image\":{\"@id\":\"https:\\\/\\\/www.salesforce.com\\\/au\\\/blog\\\/measuring-unpredictable-ai\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/www.salesforce.com\\\/au\\\/blog\\\/wp-content\\\/uploads\\\/sites\\\/4\\\/2025\\\/09\\\/Measuring-Unpredictable-AI.png\",\"datePublished\":\"2025-09-26T21:25:54+00:00\",\"dateModified\":\"2025-12-01T10:11:37+00:00\",\"description\":\"Discover how unpredictable AI affects business decisions. Learn what enterprise leaders need to know about validating AI systems, understanding \u2018jagged intelligence,\u2019 and ensuring reliable performance\",\"inLanguage\":\"en-AU\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\\\/\\\/www.salesforce.com\\\/au\\\/blog\\\/measuring-unpredictable-ai\\\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-AU\",\"@id\":\"https:\\\/\\\/www.salesforce.com\\\/au\\\/blog\\\/measuring-unpredictable-ai\\\/#primaryimage\",\"url\":\"https:\\\/\\\/www.salesforce.com\\\/au\\\/blog\\\/wp-content\\\/uploads\\\/sites\\\/4\\\/2025\\\/09\\\/Measuring-Unpredictable-AI.png\",\"contentUrl\":\"https:\\\/\\\/www.salesforce.com\\\/au\\\/blog\\\/wp-content\\\/uploads\\\/sites\\\/4\\\/2025\\\/09\\\/Measuring-Unpredictable-AI.png\",\"width\":1200,\"height\":675},{\"@type\":\"WebSite\",\"@id\":\"https:\\\/\\\/www.salesforce.com\\\/au\\\/blog\\\/#website\",\"url\":\"https:\\\/\\\/www.salesforce.com\\\/au\\\/blog\\\/\",\"name\":\"Salesforce\",\"description\":\"Learn how to get ahead of trends and supercharge professional 
relationships\",\"publisher\":{\"@id\":\"https:\\\/\\\/www.salesforce.com\\\/au\\\/blog\\\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\\\/\\\/www.salesforce.com\\\/au\\\/blog\\\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-AU\"},{\"@type\":\"Organization\",\"@id\":\"https:\\\/\\\/www.salesforce.com\\\/au\\\/blog\\\/#organization\",\"name\":\"Salesforce\",\"url\":\"https:\\\/\\\/www.salesforce.com\\\/au\\\/blog\\\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-AU\",\"@id\":\"https:\\\/\\\/www.salesforce.com\\\/au\\\/blog\\\/#\\\/schema\\\/logo\\\/image\\\/\",\"url\":\"\",\"contentUrl\":\"\",\"caption\":\"Salesforce\"},\"image\":{\"@id\":\"https:\\\/\\\/www.salesforce.com\\\/au\\\/blog\\\/#\\\/schema\\\/logo\\\/image\\\/\"},\"sameAs\":[\"https:\\\/\\\/www.facebook.com\\\/salesforce\",\"https:\\\/\\\/x.com\\\/salesforce\",\"https:\\\/\\\/instagram.com\\\/salesforce\",\"http:\\\/\\\/www.linkedin.com\\\/company\\\/salesforce\",\"http:\\\/\\\/www.youtube.com\\\/Salesforce\"]},{\"@type\":\"Person\",\"@id\":\"https:\\\/\\\/www.salesforce.com\\\/au\\\/blog\\\/#\\\/schema\\\/person\\\/image\\\/92e4b5ee4e9d26d32c509c44db0b0a24\",\"name\":\"Silvio Savarese\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-AU\",\"@id\":\"https:\\\/\\\/www.salesforce.com\\\/au\\\/blog\\\/#\\\/schema\\\/person\\\/image\\\/3026d9c7e239651448991fbbfb2993eb\",\"url\":\"https:\\\/\\\/www.salesforce.com\\\/au\\\/blog\\\/wp-content\\\/uploads\\\/sites\\\/4\\\/2025\\\/11\\\/Silvio-Savarese.webp?w=128&h=96&crop=1\",\"contentUrl\":\"https:\\\/\\\/www.salesforce.com\\\/au\\\/blog\\\/wp-content\\\/uploads\\\/sites\\\/4\\\/2025\\\/11\\\/Silvio-Savarese.webp?w=128&h=96&crop=1\",\"width\":128,\"height\":96,\"caption\":\"Silvio Savarese\"},\"description\":\"Silvio Savarese is the 
Executive Vice President and Chief Scientist of Salesforce AI Research, as well as an Adjunct Faculty of Computer Science at Stanford University, where he served as an Associate Professor with tenure until winter 2021. At Salesforce, he shapes the scientific direction and long-term AI strategy by aligning research and innovation efforts with Salesforce\u2019s mission and objectives. He also leads the AI Research organization, including AI for C360 and CRM, AI for Trust, AI for developer productivity, and operational efficiency.\",\"url\":\"https:\\\/\\\/www.salesforce.com\\\/au\\\/blog\\\/author\\\/silvio-savarese\\\/\"},{\"@type\":\"Person\",\"@id\":\"https:\\\/\\\/www.salesforce.com\\\/au\\\/blog\\\/#\\\/schema\\\/person\\\/image\\\/6c0b6566e452d88ef75d4e6a6351d786\",\"name\":\"Walter Harley\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-AU\",\"@id\":\"https:\\\/\\\/www.salesforce.com\\\/au\\\/blog\\\/#\\\/schema\\\/person\\\/image\\\/fbef9ada361f39b8c041382c3f6a3a2b\",\"url\":\"https:\\\/\\\/www.salesforce.com\\\/au\\\/blog\\\/wp-content\\\/uploads\\\/sites\\\/4\\\/2025\\\/11\\\/Walter-Harley.webp?w=128&h=96&crop=1\",\"contentUrl\":\"https:\\\/\\\/www.salesforce.com\\\/au\\\/blog\\\/wp-content\\\/uploads\\\/sites\\\/4\\\/2025\\\/11\\\/Walter-Harley.webp?w=128&h=96&crop=1\",\"width\":128,\"height\":96,\"caption\":\"Walter Harley\"},\"url\":\"https:\\\/\\\/www.salesforce.com\\\/au\\\/blog\\\/author\\\/walter-harley\\\/\"}]}<\/script>\n<!-- \/ Yoast SEO Premium plugin. -->","yoast_head_json":{"title":"Measuring Unpredictable AI: What Business Leaders Need to Know | Salesforce","description":"Discover how unpredictable AI affects business decisions. 
Learn what enterprise leaders need to know about validating AI systems, understanding \u2018jagged intelligence,\u2019 and ensuring reliable performance","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.salesforce.com\/au\/blog\/measuring-unpredictable-ai\/","og_type":"article","og_title":"Measuring Unpredictable AI: What Business Leaders Need to Know","og_description":"Recently my daughter asked a seemingly simple question over dinner: \"Dad, which is bigger, Australia or Europe?\" As any parent today knows, these moments","og_url":"https:\/\/www.salesforce.com\/au\/blog\/measuring-unpredictable-ai\/","og_site_name":"Salesforce","article_publisher":"https:\/\/www.facebook.com\/salesforce","article_published_time":"2025-09-26T21:25:54+00:00","article_modified_time":"2025-12-01T10:11:37+00:00","og_image":[{"width":1200,"height":675,"url":"https:\/\/www.salesforce.com\/au\/blog\/wp-content\/uploads\/sites\/4\/2025\/09\/Measuring-Unpredictable-AI.png","type":"image\/png"}],"author":"Silvio Savarese, Walter Harley","twitter_card":"summary_large_image","twitter_creator":"@salesforce","twitter_site":"@salesforce","twitter_misc":{"Written by":"Silvio Savarese and Walter Harley","Est. 
reading time":"1 minute"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/www.salesforce.com\/au\/blog\/measuring-unpredictable-ai\/#article","isPartOf":{"@id":"https:\/\/www.salesforce.com\/au\/blog\/measuring-unpredictable-ai\/"},"author":[{"@id":"https:\/\/www.salesforce.com\/au\/blog\/#\/schema\/person\/image\/92e4b5ee4e9d26d32c509c44db0b0a24"},{"@id":"https:\/\/www.salesforce.com\/au\/blog\/#\/schema\/person\/image\/6c0b6566e452d88ef75d4e6a6351d786"}],"headline":"Measuring Unpredictable AI: What Business Leaders Need to Know","datePublished":"2025-09-26T21:25:54+00:00","dateModified":"2025-12-01T10:11:37+00:00","mainEntityOfPage":{"@id":"https:\/\/www.salesforce.com\/au\/blog\/measuring-unpredictable-ai\/"},"wordCount":2075,"commentCount":0,"publisher":{"@id":"https:\/\/www.salesforce.com\/au\/blog\/#organization"},"image":{"@id":"https:\/\/www.salesforce.com\/au\/blog\/measuring-unpredictable-ai\/#primaryimage"},"thumbnailUrl":"https:\/\/www.salesforce.com\/au\/blog\/wp-content\/uploads\/sites\/4\/2025\/09\/Measuring-Unpredictable-AI.png","inLanguage":"en-AU","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/www.salesforce.com\/au\/blog\/measuring-unpredictable-ai\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/www.salesforce.com\/au\/blog\/measuring-unpredictable-ai\/","url":"https:\/\/www.salesforce.com\/au\/blog\/measuring-unpredictable-ai\/","name":"Measuring Unpredictable AI: What Business Leaders Need to Know | 
Salesforce","isPartOf":{"@id":"https:\/\/www.salesforce.com\/au\/blog\/#website"},"primaryImageOfPage":{"@id":"https:\/\/www.salesforce.com\/au\/blog\/measuring-unpredictable-ai\/#primaryimage"},"image":{"@id":"https:\/\/www.salesforce.com\/au\/blog\/measuring-unpredictable-ai\/#primaryimage"},"thumbnailUrl":"https:\/\/www.salesforce.com\/au\/blog\/wp-content\/uploads\/sites\/4\/2025\/09\/Measuring-Unpredictable-AI.png","datePublished":"2025-09-26T21:25:54+00:00","dateModified":"2025-12-01T10:11:37+00:00","description":"Discover how unpredictable AI affects business decisions. Learn what enterprise leaders need to know about validating AI systems, understanding \u2018jagged intelligence,\u2019 and ensuring reliable performance","inLanguage":"en-AU","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.salesforce.com\/au\/blog\/measuring-unpredictable-ai\/"]}]},{"@type":"ImageObject","inLanguage":"en-AU","@id":"https:\/\/www.salesforce.com\/au\/blog\/measuring-unpredictable-ai\/#primaryimage","url":"https:\/\/www.salesforce.com\/au\/blog\/wp-content\/uploads\/sites\/4\/2025\/09\/Measuring-Unpredictable-AI.png","contentUrl":"https:\/\/www.salesforce.com\/au\/blog\/wp-content\/uploads\/sites\/4\/2025\/09\/Measuring-Unpredictable-AI.png","width":1200,"height":675},{"@type":"WebSite","@id":"https:\/\/www.salesforce.com\/au\/blog\/#website","url":"https:\/\/www.salesforce.com\/au\/blog\/","name":"Salesforce","description":"Learn how to get ahead of trends and supercharge professional 
relationships","publisher":{"@id":"https:\/\/www.salesforce.com\/au\/blog\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.salesforce.com\/au\/blog\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-AU"},{"@type":"Organization","@id":"https:\/\/www.salesforce.com\/au\/blog\/#organization","name":"Salesforce","url":"https:\/\/www.salesforce.com\/au\/blog\/","logo":{"@type":"ImageObject","inLanguage":"en-AU","@id":"https:\/\/www.salesforce.com\/au\/blog\/#\/schema\/logo\/image\/","url":"","contentUrl":"","caption":"Salesforce"},"image":{"@id":"https:\/\/www.salesforce.com\/au\/blog\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/salesforce","https:\/\/x.com\/salesforce","https:\/\/instagram.com\/salesforce","http:\/\/www.linkedin.com\/company\/salesforce","http:\/\/www.youtube.com\/Salesforce"]},{"@type":"Person","@id":"https:\/\/www.salesforce.com\/au\/blog\/#\/schema\/person\/image\/92e4b5ee4e9d26d32c509c44db0b0a24","name":"Silvio Savarese","image":{"@type":"ImageObject","inLanguage":"en-AU","@id":"https:\/\/www.salesforce.com\/au\/blog\/#\/schema\/person\/image\/3026d9c7e239651448991fbbfb2993eb","url":"https:\/\/www.salesforce.com\/au\/blog\/wp-content\/uploads\/sites\/4\/2025\/11\/Silvio-Savarese.webp?w=128&h=96&crop=1","contentUrl":"https:\/\/www.salesforce.com\/au\/blog\/wp-content\/uploads\/sites\/4\/2025\/11\/Silvio-Savarese.webp?w=128&h=96&crop=1","width":128,"height":96,"caption":"Silvio Savarese"},"description":"Silvio Savarese is the Executive Vice President and Chief Scientist of Salesforce AI Research, as well as an Adjunct Faculty of Computer Science at Stanford University, where he served as an Associate Professor with tenure until winter 2021. 
At Salesforce, he shapes the scientific direction and long-term AI strategy by aligning research and innovation efforts with Salesforce\u2019s mission and objectives. He also leads the AI Research organization, including AI for C360 and CRM, AI for Trust, AI for developer productivity, and operational efficiency.","url":"https:\/\/www.salesforce.com\/au\/blog\/author\/silvio-savarese\/"},{"@type":"Person","@id":"https:\/\/www.salesforce.com\/au\/blog\/#\/schema\/person\/image\/6c0b6566e452d88ef75d4e6a6351d786","name":"Walter Harley","image":{"@type":"ImageObject","inLanguage":"en-AU","@id":"https:\/\/www.salesforce.com\/au\/blog\/#\/schema\/person\/image\/fbef9ada361f39b8c041382c3f6a3a2b","url":"https:\/\/www.salesforce.com\/au\/blog\/wp-content\/uploads\/sites\/4\/2025\/11\/Walter-Harley.webp?w=128&h=96&crop=1","contentUrl":"https:\/\/www.salesforce.com\/au\/blog\/wp-content\/uploads\/sites\/4\/2025\/11\/Walter-Harley.webp?w=128&h=96&crop=1","width":128,"height":96,"caption":"Walter Harley"},"url":"https:\/\/www.salesforce.com\/au\/blog\/author\/walter-harley\/"}]}},"jetpack_featured_media_url":"https:\/\/www.salesforce.com\/au\/blog\/wp-content\/uploads\/sites\/4\/2025\/09\/Measuring-Unpredictable-AI.png","jetpack_sharing_enabled":true,"distributor_meta":false,"distributor_terms":false,"distributor_media":false,"distributor_original_site_name":"Salesforce","distributor_original_site_url":"https:\/\/www.salesforce.com\/au\/blog","push-errors":false,"primary_topic":{"errors":{"invalid_term":["Empty 
Term."]},"error_data":[]},"featured_image_url":"https:\/\/www.salesforce.com\/au\/blog\/wp-content\/uploads\/sites\/4\/2025\/09\/Measuring-Unpredictable-AI.png?w=1200","_links":{"self":[{"href":"https:\/\/www.salesforce.com\/au\/blog\/wp-json\/wp\/v2\/posts\/68142","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.salesforce.com\/au\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.salesforce.com\/au\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.salesforce.com\/au\/blog\/wp-json\/wp\/v2\/users\/733"}],"replies":[{"embeddable":true,"href":"https:\/\/www.salesforce.com\/au\/blog\/wp-json\/wp\/v2\/comments?post=68142"}],"version-history":[{"count":3,"href":"https:\/\/www.salesforce.com\/au\/blog\/wp-json\/wp\/v2\/posts\/68142\/revisions"}],"predecessor-version":[{"id":68291,"href":"https:\/\/www.salesforce.com\/au\/blog\/wp-json\/wp\/v2\/posts\/68142\/revisions\/68291"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.salesforce.com\/au\/blog\/wp-json\/wp\/v2\/media\/68152"}],"wp:attachment":[{"href":"https:\/\/www.salesforce.com\/au\/blog\/wp-json\/wp\/v2\/media?parent=68142"}],"wp:term":[{"taxonomy":"sf_topic","embeddable":true,"href":"https:\/\/www.salesforce.com\/au\/blog\/wp-json\/wp\/v2\/sf_topic?post=68142"},{"taxonomy":"sf_content_type","embeddable":true,"href":"https:\/\/www.salesforce.com\/au\/blog\/wp-json\/wp\/v2\/sf_content_type?post=68142"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.salesforce.com\/au\/blog\/wp-json\/wp\/v2\/coauthors?post=68142"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}