How to Design for Trust in Agentforce Voice

Audio and visual cues, straightforward language, and the right tone help users understand and trust what’s happening in real time. [Pete | Adobe]

In voice AI, every sound and silence carries meaning. Using the right signals lets users know the system is working.

Voice AI is having a moment. But there’s a difference between a voice agent that works and one people actually want to use.

That difference comes down to trust.

When a voice agent pauses, even for a few seconds, users start to question what’s happening. Did it hear me? Is it working? Should I try again? In voice, small breakdowns quickly become trust issues.

At Salesforce, we design AI to feel less like a mystery box and more like a capable teammate. In voice, that means moving beyond simply transcribing words and focusing on how interactions feel in real time – how the system signals that it’s listening, thinking, and responding.

We’ve applied these principles to the foundation of Agentforce Voice so that when you build your own agents, you’re starting with a system designed for trust.

Here’s what we’ll cover:

Earn trust by showing what the system is doing
Move beyond logic toward conversational fluidity
Design for a diversity of users
What this means for voice design

Earn trust by showing what the system is doing

In visual interfaces, the system state is usually apparent. In voice, that status is often hidden. Silence isn’t neutral; it creates uncertainty.

If an agent pauses too long, users start to second-guess the interaction. So every moment of silence needs to be intentional, and every response needs to clearly signal what the system is doing.

To bridge this gap and build confidence in the interaction, focus on these three layers of feedback:

1. Make the system state obvious

Trust is built through predictable feedback loops. Audio cues and straightforward language help users understand what’s happening in real time. When an agent says, “Let me check that for you,” it confirms that the system is actively working on the user’s behalf, so the user can wait without anxiety. By contrast, silence or robotic phrases like “Processing request” signal to users that they’re talking to a machine rather than a capable collaborator.
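
One way to picture this feedback loop is a small lookup from system state to the cue a user hears. The sketch below is illustrative only: the state names, the phrase table, and the one-second threshold are assumptions made for this example, not Agentforce Voice settings or APIs.

```python
from enum import Enum
from typing import Optional


class AgentState(Enum):
    LISTENING = "listening"
    THINKING = "thinking"
    RESPONDING = "responding"


# Hypothetical phrase table: plain, human wording rather than "Processing request".
STATUS_PHRASES = {
    AgentState.THINKING: "Let me check that for you.",
}

# Assumed threshold: past this expected wait, silence should be explained.
MAX_SILENT_WAIT_SECONDS = 1.0


def cue_for(state: AgentState, expected_wait_seconds: float) -> Optional[str]:
    """Return a spoken cue when the wait would otherwise be unexplained silence."""
    if expected_wait_seconds <= MAX_SILENT_WAIT_SECONDS:
        return None  # Short pauses read as natural and need no marker.
    return STATUS_PHRASES.get(state)  # None if the state has no phrase.
```

Here, `cue_for(AgentState.THINKING, 3.0)` returns the "Let me check that for you." marker, while a 0.2-second pause returns nothing and stays an intentional, brief silence.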

2. Use visuals to reinforce voice

When voice technology is paired with a screen, motion design acts as the “visual heartbeat” of the interaction. A subtle pulse or animation shows that the system is active, whether it is listening, processing, or responding. Think of it like watching a progress bar while a large dashboard loads in Salesforce. Even without knowing the exact status, noticing that bar move tells you something is happening and that you don’t need to click refresh or second-guess the system. In Agentforce Voice, motion plays that same role. A visible pulse signals that the request is actively being processed, reducing uncertainty and orienting users to the current stage of the conversation.

3. Read the room

Voice shouldn’t be one-size-fits-all. An agent’s tone and pacing must match the urgency of the task, whether resolving a high-priority system outage or prepping for a routine meeting. We call this situational personality, and it ensures the system feels appropriate and reliable.

You design this personality in Agentforce Voice by adjusting a setting called stability. This dial allows you to match the agent’s expressivity to the specific context of the interaction:

  • Lower stability, high expressivity: Creates a dynamic, high-energy voice. Best for lighthearted moments, celebrations, or brands that lean into excitement, like an energy drink company.
  • Higher stability, calm authority: Creates a steady, composed voice. Best for high-stakes scenarios or industries like insurance or technical support, where a calm presence builds confidence and reduces anxiety.

The goal isn’t just to sound good, but to choose a tone that conveys trust. When building your agent, ask: Does this voice match the gravity of the problem we’re solving?
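
As a rough illustration of that question, a team might record its stability choices per scenario in one place so they can be reviewed together. The sketch below assumes stability is expressed on a 0.0 to 1.0 scale and uses made-up scenario names and values; the actual range and configuration surface in Agentforce Voice may differ.

```python
# Hypothetical presets: a higher value means a steadier, calmer delivery,
# a lower value means a more expressive one.
STABILITY_PRESETS = {
    "celebration": 0.3,           # lighthearted, high-energy moment
    "routine_meeting_prep": 0.6,  # everyday, neutral task
    "system_outage": 0.9,         # high-stakes, calm authority
}

# Assumed fallback for scenarios nobody has classified yet.
DEFAULT_STABILITY = 0.6


def stability_for(scenario: str) -> float:
    """Look up the stability value for a scenario, falling back to the default."""
    return STABILITY_PRESETS.get(scenario, DEFAULT_STABILITY)
```

Keeping the values in a single table makes the mismatch cases easy to spot, such as an outage flow that accidentally inherits an upbeat preset.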

Move beyond logic toward conversational fluidity

Designing for conversational flow means recognizing that text conveys information, but speech conveys intent. When we design for the nuances of human inflection, we move beyond simple data exchange and toward something that feels more natural and responsive. This requires a shift away from rigid, menu-driven logic and toward a more fluid rhythm that mirrors familiar human patterns.

That includes designing for interruption. People interrupt each other to clarify, correct, or move a conversation forward. Voice agents need to handle this just as naturally. If a user says, “Actually, I meant the other account,” the agent needs to stop immediately, process the new input, and pivot without restarting the entire script.
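
The interruption behavior described above can be sketched as a conversation object that stops speaking and updates only what the user corrected. This is a simplified model under stated assumptions: the `Conversation` class, its `slots` dictionary, and the method names are hypothetical, and real speech detection and audio playback are left out.

```python
from dataclasses import dataclass, field


@dataclass
class Conversation:
    # What the agent has gathered so far (hypothetical slot names).
    slots: dict = field(default_factory=dict)
    speaking: bool = False
    log: list = field(default_factory=list)

    def speak(self, text: str) -> None:
        """Start a spoken response."""
        self.speaking = True
        self.log.append(f"agent: {text}")

    def on_user_speech(self, corrections: dict) -> None:
        """Handle user input, including input that arrives mid-response."""
        if self.speaking:
            self.speaking = False  # Stop talking immediately.
            self.log.append("agent: (stops)")
        # Pivot: change only what the user corrected and keep the rest.
        self.slots.update(corrections)


convo = Conversation(slots={"intent": "check_balance", "account": "checking"})
convo.speak("Your checking balance is...")
convo.on_user_speech({"account": "savings"})  # "Actually, I meant the other account"
```

After the interruption, the intent is still `check_balance` and only the account has changed, so the agent can answer the corrected question without restarting the script.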

It also means supporting automatic language recognition from the first word. A global experience shouldn’t require users to change settings or adapt their behavior. The system should recognize the language being spoken and respond in kind, ensuring the interaction is inclusive and efficient.

Design for a diversity of users

Designing for everyone means treating accessibility as a core requirement, not an afterthought. In voice interactions, inclusion is non-negotiable.

One way to support accessibility is by providing a visual companion to the conversation. Closed captioning or live transcripts help users follow along in noisy environments and confirm that the system understood them correctly. 

People process information at different speeds, so users must be able to adjust the agent’s pacing. This control is critical for those who rely on screen readers and are accustomed to much faster speech rates. Giving users control over pacing allows the experience to adapt to them, rather than the other way around.

For people with neurodivergent traits, such as ADHD, dyslexia, or auditory processing differences, a live transcript provides a concrete visual record that confirms the agent understood them correctly, reducing cognitive load and building confidence. And for any user, transcripts make it easy to scroll back and reference a previous response, whether that’s a case number, a next step, or a key account detail, without asking the agent to repeat itself. It’s a better experience for everyone.

What this means for voice design

Voice is one of the fastest ways to build or break trust with AI. When interactions feel clear, responsive, and natural, users feel in control and stay engaged. When they don’t, even small moments of uncertainty can derail the experience.

Designing for voice means designing for those moments. It means using audio cues and motion to clarify the system state, supporting natural conversation, and building for a wide range of users from the start.

That’s how we’re approaching Agentforce Voice. Not as a feature, but as a system designed to be understood, trusted, and used.

That’s the standard we’re designing toward. And we think it’s worth talking about.
