How we build trusted AI

Trust in AI doesn’t come from a single feature or checkpoint. It’s built through coordinated practices across design, development, testing, and governance. The Office of Ethical and Humane Use brings together Salesforce’s Responsible AI and Technology, Ethical Use Policy, and Product Accessibility teams to guide the development and deployment of AI systems.

Reviews that bring clarity

Before AI systems are deployed, they go through a structured review process that translates complex trust questions into clear product decisions.

Define the use case

Intake

We understand what the AI will do, who it’s for, and what data it touches.

Triage

We identify the highest-impact use cases and set the appropriate level of review and testing.

Review

We confirm guardrails, human oversight points, and acceptable behaviors before building deeper.

Testing

We run evaluations to find failure modes and confirm proposed mitigations actually work.

Implementation

Mitigations get built into the product experience — and tracked after release.

Testing that reflects real-world use

We test beyond expected scenarios to understand how systems behave under real-world conditions. The goal is to identify risks early and improve system performance before release.

Adversarial testing

Manual and semi-automated red teaming identify vulnerabilities such as inaccuracy, jailbreaks, and unsafe outputs.

Content safety testing

Evaluate how systems respond to harmful or sensitive content, including edge cases that require careful handling or escalation.

Accessibility testing

Evaluate experiences with automated and manual accessibility testing frameworks, including assistive technologies and people with disabilities, to validate usability and compliance with accessibility standards.

Employee trust testing

Global employees across the company and various business functions simulate real-world scenarios to evaluate how systems perform across varying use cases.

Large-scale stress testing

Hackathons and bug bounty programs help uncover hidden issues and edge cases.

 Trust guardrails

Explore our platform-level guardrails across key risk areas.

Tabs

System policies

Core safety rules that remain in place even if a user tries to override them. Keeps the agent aligned with approved behaviors.

Subagent controls

Detects when a conversation moves outside the agent’s intended purpose. Prevents probing and redirects back to the allowed scope.

System prompts

The foundational instruction set that defines the agent’s role, behavioral constraints, and policy-aligned boundaries.

Prompt injection detection

Identifies and blocks malicious or hidden instructions in user input before they can influence the agent’s behavior.

Trust is built not just through guardrails, but through design — making AI systems accessible, usable, and understandable from the start.

Trust patterns for safer AI interactions

Trust patterns are reusable design approaches for common AI risks. They help make enterprise AI systems safer, more understandable, and more accountable.

Disclosure of AI-generated content for both internal users and external audiences across all Salesforce AI use cases.

Provide visibility into how a response was generated, including citations, data sources, and relevant context, so users can review and verify outputs.

Define how the system responds when it cannot complete a task, including clear error messages, alternative suggestions, or safe fallback outputs.

Design interactions that encourage review before high-impact actions. Avoid dark patterns and ensure users have a clear opportunity to validate AI-generated content.

Operational safeguards for trusted AI

Design systems that stay aligned with instructions, support responsible human oversight, and reduce misleading or risky behavior across AI experiences.

Evaluate how well agents follow topic instructions, with scoring and explanations to identify when outputs deviate or need refinement.

Introduce human checkpoints for high-impact tasks and provide clear escalation paths. Help users understand when and how to involve a human.

Design voice systems to be clear and not misleading. Ensure performance across languages and accents, and avoid human-like sounds or expressive behaviors that could confuse users.

Design AI to avoid implying emotions, intent, or identity. Systems should not attempt to form emotional bonds or present themselves as human, and should clearly communicate when users are interacting with AI.

Accessibility by design

Trusted AI is accessible AI. At Salesforce, accessibility is built in, not bolted on. Itʼs a core part of how our AI products are designed, developed, and tested. We embed accessibility throughout the software development lifecycle (SDLC) with accountability practices that continuously improve experiences. Explore the various ways in which we embed accessibility by design.

Accessibility throughout the SDLC

3D pink headset icon with a cursor pointer.
Accessible design

Designing with accessibility from the start

Accessibility is built in from the earliest stages of design, incorporating best practices and input from people with disabilities.

3D illustration of three purple cubes arranged together representing modular building blocks.
Design system

Salesforce Lightning Design System (SLDS)

Accessibility is built into SLDS through guidance and reusable components that support consistent, accessible experiences.

3D blue circle icon with a code bracket symbol.
AI & Automation

AI-assisted development and testing

Automated checks identify accessibility issues early and help teams resolve them during development.

3D shield icon with a checkmark symbolizing trust and protection.
Governance and accountability

Continuous monitoring and improvement 

Accessibility is validated through conformance reports, audits, and customer-reported issues, measured against WCAG 2.2 AA, with findings resolved and fed back into development.

Transparency we share and trust signals we measure

Model transparency and evaluation

Model cards

Public summaries for Salesforce-owned models that describe what a model is designed to do, where it has limits, and what risks teams should plan for. They also include evaluation highlights.

Building trusted AI FAQs

Trusted AI is built through coordinated practices across design, development, testing, and governance. This includes structured review before launch, real-world testing, built-in guardrails, and ongoing monitoring once systems are live.

AI systems go through multiple layers of testing, including adversarial testing, evaluation against known datasets, and, where applicable, employee-led trust testing. These processes help identify risks, validate performance, and improve system behavior before deployment.

Agents operate within defined constraints, including system policies, subagent controls, and guardrails that limit what tools they can use. These controls help prevent misuse and keep agents aligned with their intended purpose.

Human oversight is built into workflows through review steps, approval gates, and escalation paths. Users can pause, edit, or override actions at any time. See how this works in practice in A day empowered by agents.

Data protections include zero data retention, permission-based data access, and controls that limit how AI workflows use that data. Agents only retrieve data that a user is authorized to access.

Responses are grounded in trusted enterprise data and supported with citations so users can verify outputs. Evaluation systems, benchmarking, and testing frameworks help measure accuracy and identify issues.

Observability tools, audit trails, and evaluation signals provide visibility into system behavior. These systems help teams identify issues, measure performance, and continuously improve AI over time.

Accessibility is integrated across the product lifecycle, from design through testing and release. This includes accessible design systems, automated testing, and validation with people with disabilities to support real-world usability, aligned with standards such as WCAG 2.2 AA.

Explore the resource library for reports, policies, and practical guidance on building and governing trusted AI.