Agentforce and RAG: Best Practices for Better Agents
A best practices guide for powering Agentforce with unstructured data and long free-text fields
Reinier van Leuken, Senior Director of Product Management - Agentforce
This guide provides best practices for using unstructured data and long free-text fields to power Agentforce with retrieval augmented generation (RAG) on the Salesforce Platform. RAG enhances agent responses by making them more accurate, up-to-date, and relevant to a company’s enterprise knowledge, which includes files, emails, articles, documents, call notes, object descriptions, and structured table fields. Some RAG tools are simple and ready to use, while others offer detailed configuration options. This guide helps Salesforce admins and developers make informed design choices to optimize their solutions.
The content is based on real-life scenarios and implementations, covering technology that is generally available or in open beta. Roadmap items are occasionally discussed and marked as #Roadmap, with release timelines provided when available, subject to forward-looking statements in the appendix.
This document complements help.salesforce.com and Trailhead documentation, offering practical guidance to achieve high-performing agents.
This introduction provides a brief overview of RAG on the Salesforce Platform. However, it is not a complete beginner’s guide. The chapter concludes with prerequisites for understanding the rest of the document.
RAG in Data Cloud is a framework for grounding prompts for large language models (LLMs). By adding accurate, current, and pertinent information, RAG improves the relevance and value of LLM responses for users. There are different ways to bring data to a prompt, such as merge fields or a data graph. When we speak of RAG in this document, we refer to augmenting a prompt with long-form textual content retrieved based on semantic similarity to a query.
When you submit an LLM prompt, RAG in Data Cloud:
Many LLMs were trained on static and publicly available content from the internet. RAG adds information to a prompt that is accurate, up to date, and not part of the LLM’s trained data, such as a company’s private data. It supplements the LLM’s capabilities with relevant information from a knowledge store. With RAG, users provide proprietary data to the LLM without needing to retrain or fine-tune the model. The result is LLM responses that are more pertinent to the user’s context and use case.
Example use cases for RAG are:
It’s helpful to think of RAG in two main parts: offline preparation and online usage.
Each time a prompt template with a retriever runs, this sequence occurs as shown in the diagram above:
The query is a critical part of the prompt. It contains the search string that reflects the user’s intent. The RAG process uses this search string to retrieve relevant data based on semantic similarity, focusing on meaning rather than exact words. Hybrid search adds keyword similarity to semantic search, which can enhance LLM response quality. RAG differs from simply looking up data using an identifier or a single keyword. It always involves identifying longer, free-form textual data based on semantic similarity with a search string that includes more than one keyword.
The fastest way to set up your RAG solution is to add an Agentforce Data Library (ADL) in Agentforce Builder or Setup. Creating an ADL automatically sets up all the components needed for a working, RAG-powered solution. These components include data streams, objects and mapping, vector data store, search index, retriever, prompt template, and the agent action. Salesforce uses default settings for these components. Alternatively, you can use the created components as a basis for further configuration and refinement. For example, you can use the default retrievers in custom templates or create custom retrievers for the search index.
To implement RAG in Data Cloud manually, start by connecting the structured and unstructured data from which RAG retrieves relevant information for LLM prompt grounding. Data Cloud manages structured and unstructured content in search-optimized ways using search indexes. Content in supported file types can be ingested from various sources. Unstructured content used with RAG includes service replies, cases, RFP responses, knowledge articles, FAQs, emails, and meeting notes.
Offline preparation in Data Cloud involves these steps:
To learn more, see Search for AI, Automation, and Analytics.
Retrievers act as the bridge between search indexes and prompt templates. When you create a search index, Data Cloud automatically creates a default retriever for it, which is visible in Einstein Studio. To support various use cases, you can create custom retrievers in Einstein Studio. Custom retrievers refine the search criteria and retrieve the most context-relevant information to augment prompts, for example, by adding filters or including extra return fields. To learn more, see Manage Retrievers.
The final piece of the RAG implementation is to add a call to a retriever in a prompt template. For a given prompt template, the prompt designer can customize the retriever query and results settings to populate the prompt with the most relevant information. To learn more, see Ground with Retrieval Augmented Generation (RAG) in Data Cloud.
So far, we've described the offline preparation of the search index and its online usage within a prompt template. To bring RAG to Agentforce you need an agent action to call this prompt template. The figure shows the entire run-time flow for RAG-powered agents.
Agent Flow with RAG
The flow above shows how Agentforce selects the right model and retriever to then take the right actions based on unstructured data combined with structured / semi-structured data.
Review the following content to better understand the remainder of this document:
Structured content includes objects and data tables that follow a meaningful structure with values and relationships. These can be categorical (picklists, identifiers), numerical, and referential fields. Some structured content has long-form text fields, such as descriptions, conversations, or articles. Only these fields, where the text contains at least one full sentence representing a semantically meaningful factoid, can be used for RAG. Categorical and numerical fields related to the indexable text, such as sensitivity level, view count, and product type, can be used to enrich the index, as described in Section 5.1.
Examples of structured content are Salesforce objects for knowledge articles (with article body, description, etc.), cases (with case detail, resolution, wrap-up, etc.), or activities (with activity notes).
Files of long-form text, such as documents, articles, emails, and notes, are called unstructured content. This can also include audio and video files, whose textual content is transcribed for the RAG process. The audio and video files stay in their original locations (zero-copy file stores). They’re submitted to a transcription service that converts the speech into text. The transcription is then chunked and vectorized, and only the chunks and vectors are stored in Data Cloud.
Not all files are unstructured by definition. If a file contains an inherent structure with fields and values, like JSON, CSV, or XML formats, it should be loaded as structured data first. Only the long-form text fields can then be used for chunking and vectorization. The other fields can be used for index and retrieval enrichment, as described in Section 5.1.
To improve LLM prompts and responses, shape the source content to optimize information retrieval. Certain product features address issues in poorly formed content, such as enriched indexing (Section 5.4) and field prepending (Section 5.1). However, applying content curation practices to the source content can enhance retrieval and improve results.
Additional content sources are supported for search indexes created manually in Data Cloud:
*Note: The files are not copied to Data Cloud. Instead, we use a zero-copy approach that only processes and stores the necessary metadata and indexed content.
The following recommendations apply to documents and long-form text stored in long text fields, such as knowledge articles or other objects that contain long-form text. For AI-generated articles, simply provide these recommendations as instructions in the prompt.
The search index supports content in many languages, thanks to the multilingual embedding models described in detail in Section 5.5. It supports dozens of languages and preserves semantic similarity across them: a query can be in one language and retrieve semantically relevant results from content written in another. Results may vary depending on how well a language is represented in the embedding model’s training data; see Section 5.5 for more information.
Languages can also be used as prefilters (see section 5.1). This is helpful when the content shouldn’t be used across languages, and responses should be generated using content from the same language as the query.
Note that language support in Agentforce is a broader topic. Here's a breakdown of language support by feature:
Intelligent Document Processing (IDP) and data preprocessing are foundational for effective RAG in Agentforce. IDP automates the extraction of structured data from unstructured documents such as PDFs and images, making this data accessible in Data Lake Objects (DLOs). This process is crucial because RAG systems thrive on well-prepared, relevant data. Data preprocessing, more broadly, encompasses all the work needed to refine and clean content before it’s used by Agentforce agents, ensuring optimal performance.
Key use cases for IDP include automated invoice processing, customer service document handling, contract analysis, and automated onboarding. Feed extracted values into Data Cloud, and Agentforce uses them for actionable insights.
For RAG scenarios, IDP is particularly valuable in bulk document processing. It allows for the streaming and extraction of data from large volumes of unstructured files. These insights are then stored in Agentforce. This process enables downstream applications such as RAG, segmentation, and analytics.
Effective preprocessing also involves excluding certain headings from indexing or cleaning up content before it’s handed to Agentforce, ensuring only high-quality information is used.
The fastest way to set up a RAG-powered agent is to configure an Agentforce Data Library (ADL). Users can upload files to the ADL or select knowledge articles either through the Setup or Agentforce Builder interface. However, it isn't possible to add other types of content to the ADL. For instance, if you need to retrieve content from Salesforce objects other than knowledge articles, you will have to configure the RAG pipeline manually. Note that open web search for ADL is available as of May 2025, but this feature won't ingest data or build search indexes, so it is not covered in this document.
Files are manually uploaded to the ADL and stored in file storage managed by Salesforce. The file contents are used for RAG. Files can’t be downloaded from that storage, nor is there a clickable link to bring the user back to a file. Files reside there until deleted by a user.
After configuring the ADL, all assets for the RAG pipeline are automatically created using the default settings described below.
SELECT v.Hybrid_score__c AS Score, c.Chunk__c AS Chunk, c.SourceRecordId__c AS SourceRecordId,
       c.DataSource__c AS DataSource, c.DataSourceObject__c AS DataSourceObject
FROM hybrid_search(TABLE(KA_Agentforce_Default_Library_index__dlm), '{!$_SEARCH_STRING}',
     'Language__c=''{!$_LANGUAGE}'' AND KnowledgePublicationStatus__c=''Online'' AND DataSource__c in (''FAQ_Internal_Comments_c__c'',''AssignmentNote__c'')', 30) v
INNER JOIN KA_Agentforce_Default_Library_chunk__dlm c ON c.RecordId__c = v.RecordId__c
INNER JOIN ssot__KnowledgeArticleVersion__dlm kav ON c.SourceRecordId__c = kav.ssot__Id__c
ORDER BY Score DESC
LIMIT 10
Retrievers have two paths to Data Cloud objects: files and knowledge articles. For the remainder of the RAG solution stack, these paths converge in the prompt template and agent action.
ADL is recommended when an agent needs to ground its responses to questions in content from files on a Salesforce-managed file store and from Salesforce knowledge articles. For other data sources, manual setup is required.
RAG assets should be manually configured in the following scenarios:
A search index can map to only one data source. For multiple data sources, the different data sources must be mapped to a single DMO/UDMO, since a search index can be built for a single DMO/UDMO only. Note that “files” counts as one data source, regardless of the number of files and variety of file extensions.
Here’s an example of a collection of data sources for RAG grounding:
In this example, four separate search indexes are required, each with at least one retriever.
What’s the best way to organize and distribute RAG-related components in a given solution? Should you build one prompt template (and its corresponding agent action) that contains all four retrievers? Should you build a separate prompt template for each retriever? Or perhaps build two prompt templates, each with two retrievers? Or even two prompt templates, one with three retrievers and one with a single retriever? Let’s explore the rationale and tradeoffs of making such design choices.
Approach 1: Build one prompt template and corresponding agent action with all retrievers, using ensemble retrievers.
This approach uses these components:
Previously, when an agent action for the Agentforce reasoning engine was called, the prompt resolution would always invoke all retrievers and augment the prompt with all their results. This could lead to prompt bloat, exceeding the LLM’s context window and causing failures.
Now, with the availability of ensemble retrievers, this approach has been significantly optimized for handling multiple data sources. An ensemble retriever combines and reranks results from various data sources into a single, prioritized set, ensuring the most relevant information is surfaced at the top. This means that instead of invoking multiple individual retrievers, you can now use a single ensemble retriever to ground your prompt.
This approach is best used when:
Benefits of using ensemble retrievers in this approach include:
For existing solutions that used the previous “Approach 1,” you can now simply replace the invocation of multiple retrievers in the prompt template with an invocation of an ensemble retriever. The rest of your agent solution design can remain intact without further changes. At present, ensemble retrievers are available in ADL and other out-of-the-box features, with future tools planned to allow manual bundling of arbitrary retrievers.
Approach 2: Build a separate prompt template and corresponding agent action for each retriever.
This approach uses these components:
This approach requires the Agentforce reasoning engine to call multiple corresponding agent actions.
This approach is best used when:
For example, suppose a data source named “Defects” contains knowledge about product defects. The corresponding instructions or classification description for this resource could be something like: “Always use the action ‘Answer with known defects’ when the customer asks a question about defective goods.” This guides the Agentforce reasoning engine to know when to call this action at run time. Be sure to provide similarly explicit, precise descriptions and instructions for all other actions.
Watch out: Don’t use individual actions with individual prompt templates as an alternative to the first approach above. Don’t use instructions that direct the Agentforce reasoning engine to execute a chain of multiple actions, such as: “Always use four actions to answer the user question. First, call action one for files. Second, call action two for knowledge articles,” and so on. Here’s why you should avoid this approach:
If applicable to your use case, it’s feasible to use both approaches in the same solution. For example, a solution can have an individual action specifically for “Defects” (approach 2), and combine the other three grounding sources (“files,” “knowledge articles,” and “cases”) in one prompt template (approach 1).
Topics and actions in RAG-powered agents need descriptions, instructions, and scope delineations. All general best practices apply, as described on this help page for topics and this help page for actions. Write the instructions and scope descriptions for RAG topics and actions such that they are only selected and called for questions within the scope of what the search index can answer.
Once the RAG action is called, the retriever in the prompt template always fetches results from the search index, and the LLM generates a response. When no relevant content is found in the search index, proper instructions in the prompt template encourage the LLM to avoid hallucination (see Section 7). However, it is better to prevent the RAG action from being called for out-of-scope questions in the first place. This further reduces the risk of hallucinations and also reduces the cost and latency of the agent.
In the RAG solution, a search index can be set up (using the search index builder) to support hybrid search. Hybrid search combines the strengths of vector search and keyword search into one search call. Think of it as two different retrieval operations from a single data source, the results of which are merged and reranked. It’s analogous to ensemble retrieval (Section 3), except that hybrid search derives results from a single data source while ensemble retrievers derive results from multiple data sources.
Hybrid search combines and ranks the results of vector and keyword searches so the highest-ranked chunks are those that are both semantically and lexically similar.
By itself, vector search is strong in semantic similarity, but it can fail to recognize keywords when they matter. For example, vector search understands that “How to log in to my account?” and “How can I sign on?” are similar queries. But vector search can fail to understand that looking for “LaserPrinter TX 400” and “LaserPrinter TX 440” are also similar. Unlike keyword search, vector search doesn’t match numbers well, nor does it match specific domain terms (such as a laser printer) well.
However, in combination, vector and keyword searches now reinforce each other and return the best results for questions like “What should I do if my laser printer TX 400 has a paper jam?”
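As a sketch of what such a hybrid retrieval looks like under the hood, the query below follows the same hybrid_search pattern as the default retriever query shown in the Agentforce Data Library section; the index and chunk DMO names (Product_Manuals_index__dlm, Product_Manuals_chunk__dlm) are hypothetical:
SELECT v.Hybrid_score__c AS Score, c.Chunk__c AS Chunk, c.SourceRecordId__c AS SourceRecordId
FROM hybrid_search(TABLE(Product_Manuals_index__dlm), 'What should I do if my LaserPrinter TX 400 has a paper jam?', '', 30) v
INNER JOIN Product_Manuals_chunk__dlm c ON c.RecordId__c = v.RecordId__c
ORDER BY Score DESC
LIMIT 10
In this sketch, the keyword component of the hybrid score anchors the ranking on the exact model number “TX 400,” while the vector component matches semantically similar phrasings of the paper-jam issue.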
Use hybrid search to retrieve context using both semantic and keyword similarity. Hybrid search is recommended when, for example, keywords such as product names, brands, specific terminology, or jargon are key to retrieval quality. When the user question and all retrievable content are natural language with no specific terminology, semantics, or keywords, the added value of hybrid search is smaller and it is possible to rely on vector search only.
However, don’t use hybrid search as a keyword search engine for categories.
For RAG solutions, vector search can be used in isolation, but keyword search cannot. Keyword search can reinforce the vector search results, but it cannot be used standalone as a keyword-only (lexical) search. For search indexes, it’s inadvisable to select index fields that contain only categories (for example, picklists in Salesforce).
Categories result in extremely short chunks (one word, a few words). Although these micro-chunks are also matched for semantic similarity against the user query, semantic search with single-word chunks doesn’t work well because these chunks lack semantic context. As a result, the vector search part of the hybrid search becomes erratic, with inaccurate final rankings. Instead, categories are better suited as prepend fields as discussed in Section 5.1.
Hybrid search improves retrieval results with the tradeoff of increased run-time latency and Data Cloud credit consumption.
A hybrid search operation processes queries on both vector and keyword indexes and reranks the results, consuming roughly twice as many Data Cloud services credits.
During reranking, hybrid search combines the vector score and keyword score to produce a hybrid score upon which the final ranking is based.
Two additional ranking factors in the search index builder — popularity and recency — can influence the final rankings. In the search index builder, the user can select two fields on the (related) DMO that define these document characteristics. The final ranking takes these designations into account and ranks more popular and more recent content higher.
See this article on help.salesforce.com for more information and an example.
When setting up a search index, Data Cloud performs chunking on the data before vectorizing it. Chunking decomposes the information into smaller pieces. Every chunk (and vector) represents a meaningful factoid or set of factoids. It isn’t feasible to represent an entire lengthy document with a single vector, because a single vector can’t semantically represent all the content of the document.
Consider these four roles that fields can play in a RAG solution:
The example below illustrates the four roles that fields can play. In this use case, the Case object is used to answer user questions. The prompt is augmented with the resolution of closed cases whose description matches the user question. To achieve this:
Use case diagram: Reply to customers using previous case resolutions.
When building a search index, select index fields during step 2, chunking. Click the “Manage Fields” button. Upon search index creation, index fields are chunked, vectorized, and then used during the search process for evaluating semantic similarity to the query.
Only text fields can be selected as index fields. Select only text fields with longer, free-text contents; don’t select categorical fields. You can select multiple fields for indexing. For example, if “Description,” “Summary,” “Content,” and “Resolution” are selected, all corresponding vectors are stored together in the same search index. It’s possible to separate vectors on the basis of the field named DataSource__c on the DMO of the vector. DataSource__c contains the original field name. Because this field is in the index DMO, it’s possible to use it in a retriever’s prefilter. For example, the retriever could evaluate queries on semantic similarity to a specific field only (such as “Description” and not “Resolution”).
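For illustration, a retriever with such a prefilter roughly corresponds to the following Data Cloud SQL sketch, again following the default retriever query pattern; the index and chunk DMO names (Case_RAG_index__dlm, Case_RAG_chunk__dlm) and the filter value are hypothetical:
SELECT v.Hybrid_score__c AS Score, c.Chunk__c AS Chunk, c.SourceRecordId__c AS SourceRecordId, c.DataSource__c AS DataSource
FROM hybrid_search(TABLE(Case_RAG_index__dlm), '{!$_SEARCH_STRING}', 'DataSource__c=''Description__c''', 30) v
INNER JOIN Case_RAG_chunk__dlm c ON c.RecordId__c = v.RecordId__c
ORDER BY Score DESC
LIMIT 10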
Caution: Don’t select categorical columns as index fields. Categorical data are single-word or two-word descriptors that map to a picklist in Salesforce. To yield good results, semantic search requires a longer textual scope and more context. Recall that hybrid search complements semantic search with keyword search; the search index is not a keyword search engine. Instead of indexing categorical fields alone, prepend them to text columns that contain longer, free-text contents.
Caution: Avoid selecting too many similar fields. Less is more here. Be careful not to select all text fields and avoid selecting possibly redundant fields (for example, “Summary,” “Title,” and “Description”). Doing so can lead to decreased recall when the search index is used without prefilters on DataSource__c. Because these fields all likely contain the same or very similar information, for a given query, at least three chunks from the same document can appear high in the ranking (one for each field). These bring the same information to the LLM, and when the retriever is set to retrieve, for example, nine results, only three documents will be represented in the result list. This reduces variation and may lead to documents being missed.
When two or more fields represent the same content in different forms, select the field with the least-condensed form, such as “Description” in the previous example. Consider prepending that field (see Section 5.3) with a shorter, more condensed version (such as “Title” in the previous example).
In the search index builder, when selecting the fields to index (DMO case) or the file types to include in the index (UDMO case), users can configure the chunking strategy, as described in this topic on help.salesforce.com.
Use field prepending to add context to chunks and make them easier to identify. For example, suppose you have a chunk that contains a sequence of troubleshooting steps. By prepending that chunk with the text “How to fix Device 123 when it shows behaviour xyz,” you make it easier to identify that content as relevant to a user’s question.
Note: Field prepending is available in DMO-based indexes but not in UDMO-based indexes.
When designing a RAG implementation, carefully consider how field prepending can benefit from the metadata in the environment.
Set up field prepending in the chunking strategy in search index builder. After selecting a field for indexing, open its dialog with chunking settings, and turn on the toggle for “prepend fields.”
Another way to optimize chunking is to tune the chunk size for your solution in the search index setup.
During search index creation, the platform first chunks the content as finely as possible, using the semantic-based passage extraction markers described in help. It then merges the granular chunks back together until the specified chunk size is reached. The maximum configurable chunk size is currently 512 tokens, which represents about 400 to 500 words in Latin languages.
The optimal chunk size varies per solution. It depends in part on the optimization strategy that best fits the goals for a given solution.
Consider the information density and organizational structure of the content when optimizing the chunk sizes for retrieval. Remember that one chunk results in one vector. The entire chunk content is represented in this one vector. How many words are needed to adequately understand the meaning of a chunk? Are 400 to 500 words needed, or can fewer words sufficiently capture a self-contained, identifiable factoid of information (possibly enhanced with field prepending or chunk enrichment)?
Consider chunking from an augmentation perspective. What does the LLM need to generate a sufficiently usable response? Is a small, individual factoid good enough, or is more context required?
Enriched indexing (coming soon) describes the process where additional (or enriched) chunks are generated during indexing to improve search recall and precision.
When enriched indexing is enabled, three types of chunks are generated for each original chunk: PLAIN, QUESTION, and METADATA chunks.
Chunk Type | Description |
---|---|
PLAIN | Contain the original chunk text; raw content chunks directly from the original document. |
QUESTION | Contain questions that the chunk can answer. Contain a set of LLM-generated questions. The associated plain chunk provides the answers to these questions. All generated questions are concatenated into a single chunk before vectorization. This minimizes the possible semantic mismatch between the user intent from the conversation (phrased as a question) and the context stored in the plain chunks (phrased as answers). Question chunks improve retrieval recall and precision, especially in Q&A-related agent scenarios. Although the vectors belonging to the question chunks are retrieved, the prompt augmentation automatically occurs using the corresponding plain chunks. Therefore, the questions themselves are never augmented to the prompt. |
METADATA | Contain a set of LLM-generated metadata based on the plain chunk. These are the metadata generated during the indexing process: - Keywords (up to 10) - Entities (key entities that occur in the chunk content) - Topics (up to five main topics) - Sentiment (positive/negative/neutral, as specified in the chunk) - Title (concise and informative title) - Summary (brief summary, typically between 100–250 words) |
Enriched indexing greatly improves retrieval accuracy, especially in cases where field prepending isn’t possible (UDMO path) and for Q&A agent actions. Chunk enrichment provides an alternative to intensive content curation because the LLM-generated content helps improve the identification of the right chunks. The tradeoff is that chunk enrichment increases cost and latency because the retrieval operation includes a greater number of chunks.
The platform supports three embedding models.
Use this embedding model if the content is in a language other than English. This model even preserves semantic similarity across languages. For example, a query in French can retrieve relevant articles written in German. This embedding model supports 100 languages. The following table shows all of its supported languages, and the number of tokens per language the model was trained with. We recommend being cautious about languages trained with less than 500 million tokens, because these languages require a thorough evaluation of the quality of the results.
Source: Unsupervised Cross-Lingual Representation Learning at Scale
This embedding model is used by default when enriched chunking is turned on; it’s not possible to combine enriched chunking with the E5 models. The model can also be used when enriched chunking isn’t enabled. Ada 002 is also multilingual. However, as of this writing, OpenAI hasn’t released a definitive list of supported languages, so additional testing and monitoring is recommended for uncommon languages.
RAG often needs to be performed within the context of a given record. Examples are to search within the tasks of a particular case, or to search within the contracts of a particular account. This is possible by uploading the documents to the Salesforce record as related files. Taking the account example, the following steps are needed to set up such a (no-code) solution:
The retriever is the bridge between the search index and the prompt that it augments with context. When configuring ADL, retrievers are created automatically. To have more control over the retrieval and augmentation process, retrievers can also be created and customized manually in Einstein Studio, including for search indexes that were created using ADL.
The retriever can return additional fields to the chunk it retrieved. These can come from the chunk DMO or the original DMO. If the search index is created against a UDMO (unstructured data such as files), then there typically isn’t much related metadata available. This can be resolved by uploading the unstructured files to a Salesforce record as related files. Using the ContentDocument connectors, these can be brought into the search index as attachments. The search index will then contain chunks that originate from these attachments and from the selected index fields. A retriever can be configured for this search index that returns any field from the source DMO.
For custom retrievers, pre-retrieval filters (or just prefilters) can be configured to enforce specific conditions on all retrieved results, such as being written in a certain language, or belonging to a certain category. Prefiltering guarantees that the requested number of results is returned and that all results adhere to the filter condition. Filters are defined in the setup experience for retrievers and are based on fields defined in the search index builder at the time of search index creation. These fields become part of the schema of the index DMO (containing the vectors).
Note: It’s currently not possible to add prefilter fields to an existing search index.
Prefilters limit the size of the result set and they help focus the results on relevance by excluding extraneous content relative to the query. When the retriever is configured to return 10 results, it returns up to 10 results found in the search index. Results contain content only for which the filter conditions are evaluated to “True.”
In contrast, post-retrieval filtering first retrieves the 10 results and then applies the filters. This likely reduces the size of the result set, possibly even to 0 (if the filter conditions evaluate to “True” for none of the results). Post-filters aren’t currently supported by retrievers. However, they can be formulated in pro-code solutions using Apex (see Section 9). An advantage of post-retrieval filters is that they can use any accessible, related field, whereas prefilters require fields that have been added to the search index for filtering purposes.
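To illustrate the difference, here is a minimal Data Cloud SQL sketch, assuming a hypothetical knowledge index (KB_index__dlm with chunk DMO KB_chunk__dlm) and an illustrative ValidationStatus__c field on the source DMO; it follows the same hybrid_search pattern as the default retriever query shown earlier. The prefilter is passed as the filter argument of hybrid_search, while the post-filter is an ordinary WHERE clause applied after the join:
SELECT v.Hybrid_score__c AS Score, c.Chunk__c AS Chunk, c.SourceRecordId__c AS SourceRecordId
FROM hybrid_search(TABLE(KB_index__dlm), '{!$_SEARCH_STRING}', 'Language__c=''en_US''', 30) v
INNER JOIN KB_chunk__dlm c ON c.RecordId__c = v.RecordId__c
INNER JOIN ssot__KnowledgeArticleVersion__dlm kav ON c.SourceRecordId__c = kav.ssot__Id__c
WHERE kav.ValidationStatus__c = 'Validated'
ORDER BY Score DESC
LIMIT 10
Note that the prefilter is evaluated while candidates are being selected, whereas the WHERE clause can shrink the final result set below the requested number.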
In dynamic prefilters, the values of filter conditions are provided at run time. The filter condition is specified for the retriever at design time using a placeholder syntax for the value to be set upon prompt resolution. For example, a filter can be Account = $placeholder. It’s then up to the prompt engineer in Prompt Builder to map $placeholder to the right value from a prompt template input. For example, in a field completion template for an account field, or in a flex template that has account as input, the prompt engineer can map that placeholder to the account name or ID, or whatever field has been added as identification prefilter to the search index. That way, the retriever returns only results that are tagged with that specific account.
#Roadmap
Advanced retrieval mode is a retriever feature that combines iterative retrieval with query rewriting. Using advanced retrieval mode optimizes retrieval quality, especially when user queries aren’t well-formed or when a user isn’t sure about what to ask for or what the search index can answer. Specifically, it consists of the following steps:
RAG executes in the usual way: Augment the prompt with the results of step 4 and submit the resolved prompt to the LLM of choice to generate the final response.
Instructions in prompt templates are key to successful LLM generation results.
Example of a Basic Prompt Template
please answer this question:
{!$Input:question}
using this information:
{!$EinsteinSearch:ArticleRetriever_1Cx_Q8Qa1857028.results}
The example above has two merge fields:
Overly simplistic instructions risk LLM hallucinations for various reasons.
In short, it’s like giving a 12-year-old a geography book and saying, “Please study for the exam.” Some students will succeed, but many need more guidance on how to study and what to do with the book.
The standard, out-of-the-box prompt template called “Answer Question with Knowledge” provides more detailed instructions that follow common prompt design principles. In addition to what’s specified in the basic prompt template described previously, this template provides:
Here are the instructions in the out-of-the-box prompt template, which belongs to standard actions. It uses a dynamic retriever:
###
INSTRUCTIONS
1. Analyze the query: Carefully read and understand the user’s question or issue from the QUESTION section.
2. Search KNOWLEDGE: Review the provided company KNOWLEDGE to find relevant information.
3. Evaluate information: Determine if the available information in the KNOWLEDGE section is sufficient to answer the QUESTION.
4. Formulate response: To generate a reply <generated_response> to the user, you must follow these rules:
a. Find the article-chunk(s) most relevant to answer the user query and VERBATIM extract the ID of the article to set the <source_id> field in the response JSON. If you are unable to find the relevant article, set <source_id> to NONE.
b. Use the relevant article-chunk to generate the response that exactly answers the user’s question and set the <generated_response> field.
c. If the user request cannot be answered by the knowledge provided, set <source_id> to NONE and <generated_response> to “Sorry, I can't find an answer based on the available articles.”
5. Refine and deliver: Ensure your response is polite, professional, concise and in {language} only.
6. Review response: Make sure that you have followed all of the above instructions, respond in the desired output format, and strictly stick to the provided KNOWLEDGE only to formulate your answer.
###
KNOWLEDGE:
{!$EinsteinSearch:sfdc_ai__DynamicRetriever.results}
###
QUESTION:
{!$Input:Query}
Users have reported good accuracy with this template. For a given scenario, adding further instructions can improve the response quality.
For example, the following prompt template has a different structure:
Note the instructions that encourage the LLM to think deeply about the offered context and to look at the question from multiple perspectives.
Clearly answer the user’s Query directly and logically, based only on well-reasoned deductions drawn from the Context below.
Then respond to the user’s Query logically, methodologically, thoughtfully, and thoroughly from multiple perspectives, emphasizing different viewpoints based on Context with details and careful reasoning.
Provide details with organized structure in your response. Consider alternative perspectives or approaches that could challenge your current line of reasoning.
If you don’t know how to answer the query, or if there is not sufficient context, please respond with ‘Sorry, I couldn't find sufficient information to answer your question.’
Evaluate the evidence or data supporting your reasoning, and identify any gaps or inconsistencies.
Finally, ask questions to clarify the user’s intent while encouraging critical thinking and self-discovery about the user's Query.
Clearly articulate with details what are facts versus what are opinions or beliefs.
If you don't know the answer, ask questions to clarify the user’s intent.
Pay attention to the entities mentioned in the user’s Query and make sure the context contains information about those entities.
Context:
{!$EinsteinSearch:ArticleRetriever_1Cx_Q8Qa1857028.results}
Query:
{!$Input:question}
Format instructions:
Format your response with Markdown structures as follows:
Start with an overview of the topic.
List the key points in a list and emphasize any critical terms using bold.
For subsequent sections, create headings and subheadings that incorporate the subqueries implicitly.
If there are any steps or sequential data, present them in an ordered list.
End with a conclusion.
Coming soon: Increase Trust in AI Responses with Citations.
Retrievers connect prompt templates with search indexes. They allow users to configure a reusable, versionable, no-code query template that specifies what to retrieve from the search index based on a given search string (the user query or question). With retrievers, users can specify:
No-code retrievers support:
For some use cases, queries require more complex expressions to the search index. Examples include:
Retrievers offer a fast and easy (no-code) approach for RAG implementations. Retrievers provide additional capabilities on top of search index querying, such as ensemble retrievers and (#Roadmap) advanced retrieval mode. As with every no-code artifact on the Salesforce Platform, some use cases are solved best using pro-code options.
At run time, a retriever transforms the user configuration into a Data Cloud SQL query that is used to call the vector_search or hybrid_search function. These functions can also be called from within an Apex class using the Data Cloud Connect API. Apex users have the flexibility and ability to write the query expression directly.
Refer to this help topic on hybrid search to see examples of query expressions, including examples of pre-filter expressions that the no-code retrievers don’t support. Post-filters (although not represented on that page) are supported by where-clauses in SQL expressions.
Users can ground prompts using Apex classes, providing a pro-code alternative for use cases in which no-code retrievers aren’t currently an option.
Note: When the Apex class or flow returns content that would exceed the context window, that content is automatically summarized. In that case, the prompt doesn’t receive the underlying record/chunk data, but a summarized version.
To understand how to work with Apex classes and prompt templates, refer to Adding Apex Merge Fields to a Flex Prompt Template in Salesforce help. The example method on that page doesn’t contain the call-out to the Data Cloud Connect API, but it does show the required structure of the Apex class.
The following example creates a connection to the connect API and the query expression. The code applies a procedural filter (outside of the query expression) for user access to the retrieved content, which is currently unavailable with the no-code retriever.
public static List<Response> searchSimilarCases(List<Request> requests) {
    List<Response> responses = new List<Response>();
    Response response = new Response();
    String caseDescription = requests[0].RelatedEntity.Description;

    // Build the Data Cloud SQL query with a vector search on the case chunk index
    ConnectApi.CdpQueryInput input = new ConnectApi.CdpQueryInput();
    input.sql = 'SELECT DISTINCT v.score__c Score__c, c.ssot__Id__c Id__c, c.ssot__Subject__c Subject__c ' +
        'FROM vector_search(\'case_chunk_vector__dlm\', \'' + caseDescription + '\', \'\', 200) v ' +
        'JOIN Case_Chunks__dlm cc ON v.chunk_id__c = cc.chunkid__c ' +
        'JOIN ssot__Case__dlm c ON cc.parentid__c = c.ssot__Id__c ' +
        'WHERE cc.column__c != \'ssot__Subject__c\' AND c.ssot__DataSourceId__c = \'CRM\' ' +
        'LIMIT 10';
    ConnectApi.CdpQueryOutput output = ConnectApi.CdpQuery.queryANSISql(input);
    List<Object> data = output.data;

    String scs = '';
    for (Object searchRecord : data) {
        Map<String, Object> myMap = (Map<String, Object>) JSON.deserializeUntyped(JSON.serialize(searchRecord));
        // Check that the current user has access to the case record (post-retrieval filter)
        if (SimilarCasesSearch.getUserRecordAccess((String) myMap.get('Id__c'))) {
            Map<String, String> sc = new Map<String, String>();
            sc.put('Id', (String) myMap.get('Id__c'));
            sc.put('Similar_Case__c', (String) myMap.get('Id__c'));
            sc.put('Name', (String) myMap.get('Subject__c'));
            sc.put('Score__c', String.valueOf(myMap.get('Score__c')));
            scs = scs + JSON.serialize(sc);
        }
    }
    response.Prompt = scs;
    responses.add(response);
    return responses;
}
Depending on your use case, certain factors favor the use of some LLMs over others.
It’s important to select an LLM with a sufficient context window size to accommodate the size of resolved RAG prompts. Consider that one token is approximately ¾ of a word; for example, 100 tokens correspond to roughly 75 words.
The stronger the model, the better it can reason over the provided context. For a given use case, carefully evaluate how challenging the reasoning task will be: is the key information already present in the provided context? Commonly, the greatest complexity in RAG solutions lies in retrieval and augmentation rather than in the final LLM generation. In such cases, the generating LLM has a relatively easy task, and smaller models (like GPT 3.5) can be used as long as the context window is sufficiently large for the use case.
For more complex use cases in which the generating LLM still needs to reason deeper over the content (for example, combining from multiple results, transforming the input, and drawing conclusions), stronger models, such as GPT 4 (Turbo), are recommended.
Retrievers are most commonly used in prompt templates to ground prompts. Retrievers are also used in flows for RAG. Moreover, in flows their output can be used in other ways, such as automation scenarios in which the output is used to check for existing similar content, or classification tagging (as described in Section 13.2).
RAG solutions can be implemented inside a flow. In this approach, the flow calls a retriever to fetch the grounding results, which it then passes on to the prompt template that it calls subsequently. This approach gives users more control over the entire RAG process. They can set up entire pipelines of chained prompt templates, retrievers, and transformations. Instead of using a prompt template to drive the process (including calling a flow), the flow serves as the orchestration layer and the entry point into the process. This flow-driven approach offers more advanced post-filtering of the retrieved results, such as checking for specific user access rights or other pro-code filters.
To call a retriever in a Flow, add an action element to the Flow and search for the retriever by name in the list of available actions. Flow variables (such as the search string) are available as inputs to the retriever. At run time, values can be fetched from a Salesforce record, from a screen element in a screen Flow, or from any other Flow variable.
Calling retrievers in a Flow also supports dynamic prefilters. In the formula for a dynamic filter, map the right-hand side of the equation to a Flow variable. At run time, use the dynamic filter to filter the retriever results using the context of the Flow (for example, country, language, category, and so on).
The following standard Flow actions can be used to set up sophisticated RAG pipelines in flow.
Flow Action | Description |
---|---|
Detect Language | Detects the language of a query, which can be passed as a filter value to a retriever node for dynamic filtering (by language). |
Transform Query for {Case/Email/Conversation} | Each of these three nodes invokes an LLM transformation that changes a case, email, or conversation into a query that is optimized for retrieval. It improves the query that the retriever passes on to the search index. For example, the conversation-to-query action avoids querying the search index with non-relevant messages such as “How can I help you?” or “How are you today?” Similarly, the case-to-query and email-to-query actions extract relevant information from the text and remove greetings and other text that shouldn’t be used for search. |
Retriever output is formatted as a JSON array, which is not a supported type in Flow. Therefore, to use the results subsequently in the Flow, a processing action needs to transform the retriever output into a flow-supported type, such as a flattened String. The processor Flow node can be implemented using an Apex class, as shown in the example below.
global with sharing class RetrieverProcessor {
    @InvocableMethod
    public static List<String> GetWebProduct(List<Requests> queryResults) {
        List<String> resultsList = new List<String>();
        for (Requests queryResult : queryResults) {
            List<String> segments = new List<String>();
            for (ConnectApi.MlRetrieverQueryResultDocumentRepresentation document : queryResult.queryResult.searchResults) {
                for (ConnectApi.MlRetrieverQueryResultDocumentContentRepresentation content : document.result) {
                    // Collect only the returned field named 'Chunk'
                    if (content.fieldName.equals('Chunk')) {
                        segments.add(content.value.toString());
                    }
                }
            }
            if (segments.size() == 0) {
                resultsList.add('No results');
            } else {
                resultsList.add(String.join(segments, ','));
            }
        }
        return resultsList;
    }

    global class Requests {
        @InvocableVariable
        global ConnectApi.MlRetrieverQueryResultRepresentation queryResult;
    }
}
In this example, the GetWebProduct method loops through the elements of the retriever output and appends the contents of a returned field named “chunk” to a list of strings. The flow can then iterate through this list downstream, or pass it on to a prompt template node as input for grounding.
When an agent uses RAG to respond to a question and the response is unsatisfactory, there are various RAG-related factors to consider. To a user, the agent simply answered incorrectly, insufficiently, or perhaps not at all. Troubleshooting a poorly performing, RAG-enhanced agent involves investigating and ruling out these various points of failure one by one.
These points of failure can result from:
This section provides guidance for diagnosing and ruling out each of these points of failure in turn.
For RAG solutions with ADL, refer to this troubleshooting guide.
Determine whether the right action within the right topic is being executed by the Agentforce reasoning engine. Use Agentforce Builder or the Testing Center to investigate and diagnose.
If the right topic isn’t selected, or if the right topic is selected but the right action isn’t executed, then the problem most likely occurs in the agent configuration of the instructions and classification descriptions. This is an agent problem, not a RAG issue, and therefore outside the scope of this white paper. Refer to the following instructions instead:
Agentforce Builder shows the reasoning path of the reasoning engine and its intermediate results. When using ADL and the standard action, follow the reasoning path and check whether the right retriever/grounding source is passed to the prompt template. If not, fix the agent configuration to pass the correct retriever.
Note: This approach does not apply to custom agent actions that use custom prompt templates. This is because the retriever call happens entirely within the prompt template and is not passed in by the reasoning engine.
There are multiple ways to determine whether the search index is correctly populated with content:
SELECT 'INDEX' AS Location, COUNT(DISTINCT rc.SourceRecordId__c) AS ArticleCount, now() AS Timestamp
FROM <chunk DMO of the Search Index> rc
UNION
SELECT 'DMO' AS Location, COUNT(DISTINCT kav.Id__c) AS ArticleCount, now() AS Timestamp
FROM <DMO that was indexed, e.g. Knowledge Article Version> kav
ORDER BY Location;
In Prompt Builder, determine whether:
After verifying that all the assets of the RAG pipeline are connected properly, determine whether qualitative problems are causing answers to be incorrect, incomplete, hallucinatory, or a combination of these symptoms. Qualitative concerns can be more difficult to troubleshoot due to a myriad of possible root causes. Quality problems can arise in the retrieval, in the embedding, in the augmentation, in the response generation, and even in the original knowledge source. (Does the relevant content actually exist in the search index?) This figure shows where quality problems can occur in the RAG pipeline, and to what they can be related.
RAG evaluation quality metrics can help determine where to improve the RAG pipeline. There are three evaluation metrics calculated and shown in a dashboard. This dashboard allows drilling down to retriever level. The metrics are described below, before we dive into what they tell us when inspected jointly.
Metric | Question it answers | How it’s calculated | What it helps with |
---|---|---|---|
Context Relevance | How relevant is the retrieved content to the query? | LLM-based evaluation | Isolate retrieval problems |
Faithfulness | How grounded is the response in the retrieved content? | LLM-based evaluation | Isolate LLM generation problems |
Answer Relevance | How relevant is the answer to the query? | LLM-based evaluation | Overall response metric of the answer. Especially useful in combination with context relevance and faithfulness. |
Common Patterns in Quality Metrics
The answer is grounded in the retrieved context, but that context isn’t relevant to the query. As a result, the answer relevance is also likely low. This symptom likely indicates a problem in the retrieval.
Possible remediation:
The answer is not grounded in the context, even though that context is relevant to the query. Answer relevance is also likely low. This symptom likely indicates a problem in the LLM generation, possibly due to a prompt engineering shortcoming, such as the prompt not giving the LLM sufficiently strong instructions to follow the provided context.
Possible remediation:
The answer is grounded in the context and that context is actually relevant to the query, but the answer relevance is still low. This symptom likely indicates that there wasn’t enough context retrieved to fully answer the query. The problem is likely in the retrieval, particularly in the recall of the retrieval.
Possible remediation:
When setting up a RAG pipeline, admins and developers often want to resolve prompts without generating an answer by the LLM. Doing so supports analysis and optimization of the indexing/retrieval pipeline. The goal is just to observe the content retrieved by the retrievers. Generating the LLM response isn’t necessary.
In Prompt Builder, add &c__debug=1 to the URL of the prompt template. This displays a toggle that lets the admin change between “resolution only,” “response only,” (which provides more screen space for the response) or the standard “resolution and response.”
Retrievers are used outside of RAG use cases. A response is not always needed. Some requirements are met by retrieving semantically similar content from the search index. Consider, for example, a case being created in a service context. Merely showing similar cases to a Service Agent to support a case investigation can provide tremendous value without executing the entire RAG pipeline.
To set up such an automation, the recommended solution is a flow that calls a retriever when it runs. Because the retriever produces a set of results similar to the query, the sources of these results, such as cases or articles, can be presented to the user.
Use a search index to conduct text classification, such as intent detection, topic annotation, or case classification. Classification use cases are often solved using a training data set (inputs and their class labels). Instead of training a text classifier with this data set, the inputs can be vectorized. Store them in Data Cloud as records of a DMO so that they can be embedded in a search index. A search operation is then based on the semantic similarity between the query and the embedded inputs. However, instead of returning the chunks of the “training” inputs, the search returns the original class labels. When the number of results is sufficiently large (say 50 or 100), it’s possible to conduct a “majority” vote and see which class labels occur most frequently within that set of results. Ordering the class labels by their frequency in the result set provides classification suggestions. Either select the most frequent class label, or present, say, the top three class labels to the user.
This scenario requires supplemental Apex code because the retriever doesn’t support the SQL query used (based on COUNT). The code example below counts the frequency of each class label in the top 50 results, orders by that count, and selects the top class label as the classification result.
ConnectApi.CdpQueryInput input = new ConnectApi.CdpQueryInput();
// 'topic' is an Apex String variable holding the text to classify
input.sql = 'SELECT r.Label_c__c Label, COUNT(r.Label_c__c) AS counter ' +
    'FROM vector_search(table(Intent_Training_index__dlm), \'' + topic + '\', \'\', 50) v ' +
    'JOIN Intent_Training_chunk__dlm c ON v.RecordId__c = c.RecordId__c ' +
    'JOIN Intent_Training__dlm r ON r.Id__c = c.SourceRecordId__c ' +
    'GROUP BY r.Label_c__c ORDER BY counter DESC LIMIT 1';
ConnectApi.CdpQueryOutput output = ConnectApi.CdpQuery.queryANSISql(input);
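To present the top three class labels to the user rather than a single one, as suggested above, the same query only needs a higher limit. Here is a sketch reusing the DMO names from the example, with a placeholder for the text to classify:
SELECT r.Label_c__c Label, COUNT(r.Label_c__c) AS counter
FROM vector_search(table(Intent_Training_index__dlm), '<text to classify>', '', 50) v
JOIN Intent_Training_chunk__dlm c ON v.RecordId__c = c.RecordId__c
JOIN Intent_Training__dlm r ON r.Id__c = c.SourceRecordId__c
GROUP BY r.Label_c__c
ORDER BY counter DESC
LIMIT 3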
Reinier van Leuken thanks these proofreaders for their invaluable help in shaping the content of this white paper: Eric Ivory-Chambers, Robin de Bondt, Jan van den Broeck, Alejandro Raigon, Vahe Ayvazyan, Giuseppe Cardace, Praveen Gonugunta, Kathryn Baker Parks, Debbie Symanovich.