In the world of AI agents that click, scroll, execute and automate, we’re moving fast from “just understand text” to “actually use software for you.” The new benchmark SCUBA tackles exactly that:…
Imagine an AI assistant that forgets your project requirements between Monday and Wednesday, or one that takes 30 seconds to recall a simple preference you mentioned yesterday. This is the reality of AI…
What Is Deep Research? Deep Research ≠ Deep Search. You may have come across “Deep Search” features in tools like ChatGPT or Claude — designed to enhance retrieval and concise answers. While Deep…
Large language model (LLM)-based software engineering (SWE-) agents have recently demonstrated remarkable progress on realistic software engineering tasks such as code review, bug fixing, and repository-level reasoning. Most SWE-agents start from a fresh…
Salesforce AI Research announces a framework to optimize agent capability and consistency through synthetic data, realistic testing, and reinforcement learning. Even as AI models grow more sophisticated, a curious challenge persists: systems that solve…
Recently my daughter asked a seemingly simple question over dinner: “Dad, which is bigger, Australia or Europe?” As any parent today knows, these moments present a choice — attempt an answer from memory…
Background: LLM agents are seeing more and more applications in real life, from being personal assistants to helping software engineers write code and even working side by side with scientists on…
The recent launch of Agentforce marks a pivotal moment in orienting Salesforce and our customers’ businesses toward an AI-empowered future. In this emerging landscape, augmented by a network of AI agents, the role…