Your AI Agent Works, But Do Your Users Think It’s Worth It?

Design research shows that improving your users’ perception of value is the key to adoption. Learning from AI agent pain points can help.
AI agents are quickly becoming the backbone of enterprise productivity. They promise faster resolution times, better efficiency, and happier customers. But those of us designing agents face a critical challenge: a technically “working” agent, with excellent system accuracy scores, can still be perceived by the user as not providing value, or even not worth using.
Our latest internal research on end user perspectives highlights a crucial gap. Users often don’t have the technical language to describe a specific issue they face when using an agent. Instead, they share generic complaints like:
- “It’s wrong.”
- “It doesn’t understand.”
- “It’s missing something.”
The true measure of success isn’t the model’s performance on a benchmark, but the user’s perception of its value and the trust they place in it.
Here’s what we’ll cover:
- What users mean when they say ‘it doesn’t work’
- Three tiers of agent failure
- How to triage user issues and increase trust
- Designing for trust and adoption
What users mean when they say ‘it doesn’t work’
As agents become more widely available to end users, the definition of a “successful” agent has broadened beyond mere model accuracy. For instance, an agent that is technically accurate but unhelpful in practice will ultimately be abandoned by the end user.
Let’s look at an example of an interaction:
End user question | Agent response
---|---
“What is our official company policy on expense reporting for international travel?” | “For a detailed, up-to-date answer on international expense policy, please refer to the official ‘Global Travel & Expense Policy’ located on the internal company portal.”
This output is technically “successful” because the agent isn’t connected to this data source and correctly redirects the user to where the information can be found. However, the user must now take manual steps to locate the answer, leading to a perception that the agent isn’t useful.
By evaluating what users truly mean when they report an agent isn’t performing as expected, we can identify critical failure points and value issues that technical systems and model benchmarks aren’t equipped to detect.
“It doesn’t work” | “It’s wrong” | “It doesn’t understand” | “It’s missing something” | “It’s not worth using”
---|---|---|---|---
Helpful/unhelpful error | Math error | Irrelevant output | Missing a record/field | Latency
Doesn’t return anything | Factual error | Not grounded appropriately | Missing needed functionality | Missing actionability
Output is nonsensical | Internal inconsistency | Not aligned with policy/best practices | Not comprehensive | Deflection to self-service
Exposes PII or other sensitive information | Contradicts capabilities | Didn’t understand input intent | Unsupported prompt style | |
False action/task completion | Tone/style | | | |
Responses are too noisy/lack precision | | | | |
Three tiers of agent failure
To help our customers triage issues faster, we developed a User Failure Points Framework by analyzing 2,000 multi-turn conversations between users and agents, then mapping specific root-cause technical issues back to the generic complaints users report.
The framework categorizes user issues into three types, aligned to tiers of severity that directly impact task progression and user trust.
- P0: System Failures. These are the highest severity issues. A P0 failure means the agent fails to work as expected, blocking task progression and severely damaging user trust.
- P1: User Intent Not Met. In these cases, the agent delivers an output that’s misaligned with the user’s original intent. While the system may be technically functional, a P1 failure blocks task progression and causes user frustration.
- P2: Limited Value. The agent is functional, but the output is of low perceived quality or low usefulness. These failures lead to the agent being labeled as “not worth using” because they force the user to correct, edit, or re-prompt too often.
P0: System Failures | P1: User Intent Not Met | P2: Limited Value
---|---|---
These failures block task progression | These failures block task progression | These failures create low perceptions of value
Helpful or unhelpful error | Missing needed functionality | Latency
Doesn’t return anything | Ignored prior input | Tone/style
Output is nonsensical | Internal inconsistency | Responses are too noisy or lack precision
Exposes PII or other sensitive information | Irrelevant output | Deflection to self-service
Math error | Not grounded appropriately | Missing actionability
Factual error | Not aligned with policy or best practices | |
Contradicts capabilities | Didn’t understand input intent | |
False action or task completion | Implicit context ignored | |
Missing a record or field | Not comprehensive | |
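To make the framework easier to apply to logged feedback, the taxonomy above can be encoded as a simple severity lookup. The sketch below is illustrative only; the key names and the `triage` helper are our own shorthand, not part of the framework itself.

```python
from enum import Enum


class Tier(str, Enum):
    """Severity tiers from the User Failure Points Framework."""
    P0_SYSTEM_FAILURE = "P0"   # blocks task progression, severely damages trust
    P1_INTENT_NOT_MET = "P1"   # output misaligned with the user's intent
    P2_LIMITED_VALUE = "P2"    # functional, but low perceived value


# Failure point -> tier, mirroring the table above (names are hypothetical shorthand).
FAILURE_TIERS = {
    "helpful_or_unhelpful_error": Tier.P0_SYSTEM_FAILURE,
    "no_response": Tier.P0_SYSTEM_FAILURE,
    "nonsensical_output": Tier.P0_SYSTEM_FAILURE,
    "pii_exposure": Tier.P0_SYSTEM_FAILURE,
    "math_error": Tier.P0_SYSTEM_FAILURE,
    "factual_error": Tier.P0_SYSTEM_FAILURE,
    "contradicts_capabilities": Tier.P0_SYSTEM_FAILURE,
    "false_task_completion": Tier.P0_SYSTEM_FAILURE,
    "missing_record_or_field": Tier.P0_SYSTEM_FAILURE,
    "missing_functionality": Tier.P1_INTENT_NOT_MET,
    "ignored_prior_input": Tier.P1_INTENT_NOT_MET,
    "internal_inconsistency": Tier.P1_INTENT_NOT_MET,
    "irrelevant_output": Tier.P1_INTENT_NOT_MET,
    "not_grounded": Tier.P1_INTENT_NOT_MET,
    "policy_misalignment": Tier.P1_INTENT_NOT_MET,
    "misunderstood_intent": Tier.P1_INTENT_NOT_MET,
    "implicit_context_ignored": Tier.P1_INTENT_NOT_MET,
    "not_comprehensive": Tier.P1_INTENT_NOT_MET,
    "latency": Tier.P2_LIMITED_VALUE,
    "tone_or_style": Tier.P2_LIMITED_VALUE,
    "noisy_or_imprecise": Tier.P2_LIMITED_VALUE,
    "deflection_to_self_service": Tier.P2_LIMITED_VALUE,
    "missing_actionability": Tier.P2_LIMITED_VALUE,
}


def triage(failure_points: list[str]) -> Tier | None:
    """Return the most severe tier present in a set of tagged failure points."""
    tiers = [FAILURE_TIERS[f] for f in failure_points if f in FAILURE_TIERS]
    return min(tiers, key=lambda t: t.value) if tiers else None
```

Tagging each reported issue this way makes it straightforward to count how many P0s block a release versus how many P2s are quietly eroding perceived value over time.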
How to triage user issues and increase trust
Understanding this taxonomy is the first step. The next is applying it to your agent development lifecycle to build trust and increase adoption.
1. Diagnose and triage failures
When P0 System Failures are absent but users are still reporting issues, use the Failure Points Taxonomy to speed up issue diagnosis during testing. To scale this work, an LLM-as-judge evaluation can more consistently surface the subtler P1 (User Intent) and P2 (Limited Value) failures, as sketched below.
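Here is a minimal sketch of that LLM-as-judge pass, assuming an OpenAI-style chat completions client; the rubric wording, model name, and output schema are illustrative, not a prescribed evaluation setup.

```python
import json

from openai import OpenAI  # assumes the OpenAI Python SDK; any chat-completions client works

client = OpenAI()

JUDGE_RUBRIC = """You are reviewing a conversation between a user and an AI agent.
Classify the most severe issue you observe:
- P0: system failure (error, no response, nonsensical output, PII exposure, factual or math error)
- P1: user intent not met (irrelevant, ungrounded, ignored prior input, not comprehensive)
- P2: limited value (latency complaints, tone/style, noisy output, deflection to self-service)
- OK: no issue observed
Respond with JSON: {"tier": "...", "failure_point": "...", "evidence": "..."}"""


def judge_conversation(transcript: str, model: str = "gpt-4o") -> dict:
    """Ask a judge model to label one multi-turn transcript with a failure tier."""
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": JUDGE_RUBRIC},
            {"role": "user", "content": transcript},
        ],
        response_format={"type": "json_object"},  # keep the verdict machine-readable
        temperature=0,
    )
    return json.loads(response.choices[0].message.content)
```

Run over a sample of logged conversations, this gives a rough P0/P1/P2 distribution that can be compared release over release and spot-checked by humans.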
2. Conduct sentiment analysis
Use sentiment analysis to identify negative value issues expressed by users that traditional testing isn’t picking up. Phrases like, “That’s not right” or “It’s missing X” are critical pieces of feedback. Monitoring this sentiment, especially in multi-turn conversations, is key to diagnosing P1 and P2 issues in the wild.
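Before (or alongside) a full sentiment model, a lightweight pass like the sketch below can flag the correction and complaint phrases that signal P1 and P2 issues in multi-turn transcripts; the phrase list is illustrative and should be tuned to your own users’ language.

```python
import re

# Illustrative correction/complaint phrases that often signal P1 or P2 issues.
NEGATIVE_VALUE_PATTERNS = [
    r"\bthat'?s (wrong|not right|incorrect)\b",
    r"\bit'?s missing\b",
    r"\b(doesn'?t|didn'?t) understand\b",
    r"\bnot what i (asked|meant)\b",
    r"\btry again\b",
    r"\bnot (helpful|useful)\b",
]


def flag_negative_turns(turns: list[dict]) -> list[dict]:
    """Return user turns whose text matches a known complaint pattern.

    `turns` is a list of {"role": ..., "text": ...} dicts for one conversation.
    """
    return [
        turn
        for turn in turns
        if turn["role"] == "user"
        and any(re.search(p, turn["text"], re.IGNORECASE) for p in NEGATIVE_VALUE_PATTERNS)
    ]
```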
3. Power up prompts
Vague prompts lead to P1 and P2 failures. Enable agents to clarify ambiguous prompts, a feature that not only improves output quality but also teaches the user how to write clearer, more effective prompts, ultimately reducing agent abandonment.
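How clarification is wired up depends on your agent framework; as one hedged sketch, the policy can be as simple as an instruction in the system prompt plus a check on the reply, with the prompt text and the `CLARIFY:` convention below purely illustrative.

```python
# Illustrative instruction added to the agent's system prompt.
CLARIFICATION_POLICY = """If the user's request is ambiguous or missing key details
(for example, a date range, account, or system name), do NOT guess.
Ask one concise clarifying question and prefix it with 'CLARIFY:'."""


def needs_clarification(agent_reply: str) -> bool:
    """Detect replies where the agent chose to ask a question instead of answering."""
    return agent_reply.strip().startswith("CLARIFY:")


# In the conversation loop, a CLARIFY: reply is shown to the user as a question rather
# than a final answer, and their response is appended to the context before re-prompting.
```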
4. Clearly define agent scope
Manage user expectations by clearly defining what the agent can and can’t do for them up front. For queries that fall outside its domain, program the agent to recommend alternative tools or hand-offs. This small act of transparency prevents frustration and builds enduring trust.
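One hedged sketch of such a scope gate follows; the supported domains and hand-off targets are placeholders for whatever your deployment actually covers.

```python
# Illustrative scope definition: supported domains, plus where to send everything else.
AGENT_SCOPE = {
    "supported": {"expense_reporting", "travel_booking", "it_access_requests"},
    "handoffs": {
        "payroll": "payroll help desk",
        "benefits": "HR service desk",
    },
}


def out_of_scope_reply(topic: str) -> str | None:
    """Return a hand-off message for out-of-scope topics, or None if the agent can handle it."""
    if topic in AGENT_SCOPE["supported"]:
        return None
    handoff = AGENT_SCOPE["handoffs"].get(topic)
    if handoff:
        return f"I can't help with {topic} yet. Please reach out to the {handoff}."
    return ("That's outside what I can do today. I can help with expense reporting, "
            "travel booking, and IT access requests.")
```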
Designing for trust and adoption
The future of agentic AI won’t be decided by a technical score; it will be decided by user trust and value. By shifting our focus from pure accuracy to the user’s perception of what’s worth it, we can design, build, and deploy agents that don’t just work, but become indispensable tools that users will adopt and champion.