Simply put, AI Assistants are built to be personalized, while AI Agents are built to be shared (and scaled); both approaches promise extraordinary opportunities across the enterprise.
LLM benchmarks evaluate how accurately a generative AI model performs, but most benchmarks overlook the kinds of real-world tasks an LLM would perform in an enterprise setting.
Time series forecasting is becoming increasingly important across various domains, so having high-quality, diverse benchmarks is crucial for fair evaluation across model families.
As the development and deployment of large language models (LLMs) accelerates, evaluating model outputs has become increasingly important. The established method of evaluating responses typically involves recruiting and training human evaluators, having them…
Co-authored by Hannah Cha, Orlando Lugo, and Sarah Tan. At Salesforce, our Responsible AI & Technology team employs red teaming practices to improve the safety of our AI products by testing for malicious…
Retrieval Augmented Generation (RAG) has not only gained steam as one of the most heavily invested areas of research in generative AI but has also attracted considerable popularity and commercial opportunity. RAG is typically applied…
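For readers new to the pattern, here is a minimal sketch of the retrieve-then-generate loop that RAG refers to. The toy corpus, bag-of-words retriever, and placeholder generate() function are illustrative assumptions, not the pipeline described in the article itself; a production system would use dense embeddings, a vector store, and a real LLM call.

```python
# Minimal RAG sketch: retrieve relevant context, then condition generation on it.
from collections import Counter
import math

# Toy document store standing in for a real vector database (assumption).
CORPUS = [
    "RAG augments a language model with documents retrieved at query time.",
    "Time series forecasting benchmarks compare models across domains.",
    "Red teaming probes AI products for unsafe or malicious behavior.",
]

def _vectorize(text: str) -> Counter:
    # Bag-of-words term counts; a real system would use dense embeddings.
    return Counter(text.lower().split())

def _cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse term-count vectors.
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query: str, k: int = 1) -> list[str]:
    # Rank documents by similarity to the query and keep the top k.
    q = _vectorize(query)
    ranked = sorted(CORPUS, key=lambda d: _cosine(q, _vectorize(d)), reverse=True)
    return ranked[:k]

def generate(prompt: str) -> str:
    # Placeholder for an LLM call (e.g., any chat-completion API).
    return f"[LLM answer conditioned on]: {prompt}"

query = "How does retrieval augmented generation work?"
context = "\n".join(retrieve(query))
print(generate(f"Context:\n{context}\n\nQuestion: {query}"))
```

The key design point the sketch illustrates is that the generator never sees the whole corpus, only the top-k retrieved passages stitched into its prompt, which is what lets RAG ground answers in fresh or proprietary data without retraining the model.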