As the development and deployment of large language models (LLMs) accelerates, evaluating model outputs has become increasingly important. The established method of evaluating responses typically involves recruiting and training human evaluators, having them…
Co-authored by Hannah Cha, Orlando Lugo, and Sarah Tan.
At Salesforce, our Responsible AI & Technology team employs red teaming practices to improve the safety of our AI products by testing for malicious…