 
					Semih Yavuz
Research Director
Semih Yavuz is a Research Director at Salesforce AI Research, leading a team focused on improving the factuality, groundedness, and reasoning capabilities of large language models in knowledge-intensive applications. His work involves developing state-of-the-art embedding and re-ranker models for knowledge retrieval across diverse domains, including code, multi-modal, and multilingual contexts, while refining retrieval-augmented generation (RAG) by enhancing how LLMs consume and integrate knowledge in complex reasoning. His team pushes the boundaries of research to develop accurate, scalable, and reliable AI systems and to drive product impact with them in the CRM domain.
 
			 
				
			
			AI is rapidly transforming industries, helping businesses enhance customer experiences, improve efficiency, and make smarter decisions. But an essential question arises: How can we ensure that AI is creating accurate and grounded answers?…
 
				
			
			Developers face unique challenges when retrieving code snippets, such as understanding syntax, control flow, and variable dependencies. Enter SFR-Embedding-Code, a groundbreaking family of code embedding models that aims to address these challenges and revolutionize how we retrieve and generate code.
 
				
			
			SFR-Embedding-Mistral marks a significant advancement in text-embedding models, building upon the solid foundations of E5-mistral-7b-instruct and Mistral-7B-v0.1.
 
				
			
			World’s #1 CRM introduces its first sales LLM. Sales reps are constantly on the move, transitioning from one customer site to another, with meetings scheduled back-to-back. The demands of managing a complex pipeline…
 
				
			
			TL;DR: We trained a series of 7B LLMs named XGen-7B with standard dense attention on up to 8K sequence length for up to 1.5T tokens. We also fine-tune the models on public-domain…
Lead Author: Xi Ye. TL;DR: We propose RnG-KBQA, a Rank-and-Generate approach for question answering over knowledge bases, which enables answering natural language questions over large-scale knowledge bases. Our approach is capable of answering…
TL;DR: We propose controllable counterfactuals (CoCo) to evaluate dialogue state tracking (DST) models on novel scenarios, which results in a significant performance drop of up to 30.8% for state-of-the-art DST models. Using CoCo for…






 
	 
	 
		