Imagine an AI assistant that forgets your project requirements between Monday and Wednesday, or one that takes 30 seconds to recall a simple preference you mentioned yesterday. This is the reality of AI memory systems today, and it’s holding back the promise of truly intelligent enterprise agents equipped with long-term semantic memory.
At Salesforce AI Research, we’ve been tackling a fundamental challenge that every organization faces as they deploy AI agents: how do you give these systems reliable, practical memory without breaking the bank or frustrating users with glacial response times? Our recent research reveals both a surprising paradox and a promising solution that could transform how enterprise AI systems learn and adapt.
Why memory is the missing piece of enterprise AI
For AI agents to evolve from sophisticated tools into genuine partners, they need memory they can trust. Not just any memory — but the kind that allows them to absorb the unique nuances of your business, learn specific workflows, personalize assistance for individual team members, and learn from corrections so they don’t repeat mistakes.
Without robust memory, an AI agent is like a brilliant consultant with amnesia. Every interaction starts from scratch. Every correction needs repeating. Every preference must be restated. The agent’s capacity to provide increasing value over time hits a hard ceiling.
This limitation becomes especially acute in enterprise settings, and it stands in the way of what we call Enterprise General Intelligence (EGI): AI systems that don’t just answer questions but truly understand and adapt to your organization’s unique context.
The memory trilemma: Pick two, sacrifice one
Here’s where things get interesting — and frustrating. Through extensive benchmarking of 75,000-plus test cases, we’ve identified what we call the “Memory Trilemma.” Like the famous project management triangle (fast, good, cheap — pick two), AI memory systems force you to balance three competing factors:

Accuracy: How well can the AI recall the correct, relevant information? High accuracy means remembering that specific API endpoint you mentioned three weeks ago. Low accuracy means generic responses because the system lacks context.
Cost: The computational and financial resources required. With large language models charging by the token, feeding extensive conversation history gets expensive fast. At 300 past conversations, costs can reach 8 cents per response—seemingly small until you multiply by thousands of daily interactions.
Latency: The time between question and answer. Users expect near-instant responses, but processing extensive memory can take 30+ seconds, making the interaction feel more like waiting for a database query than having a conversation.
The surprising power of simplicity (at first)
Our research uncovered something unexpected: For the first 30-150 conversations, the “dumbest” approach works best. Simply feeding all previous conversations into the model’s context window achieves 70-82% accuracy on memory-dependent questions. Compare that to sophisticated retrieval systems like Mem0 or Zep, which achieve only 30-45% accuracy despite their complex indexing and graph structures.
Why? Conversational memory has a characteristic that sets it apart from other AI retrieval challenges. Unlike web search or document retrieval, which start with billions of tokens, memory begins at zero. Even an hour of daily conversation over four weeks generates only 100,000 tokens, well within modern context windows.
This means that for most users’ initial interactions with an AI agent, the sophisticated retrieval mechanisms that power web search are actually overkill. It’s like using a satellite navigation system to find your way around your own living room.
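To make that concrete, below is a minimal sketch of the long-context baseline in Python. Everything in it is illustrative: `call_llm` is a hypothetical stand-in for whatever chat-completion client you use, and the prompt format is arbitrary. The essential point is that the entire history rides along in every request.

```python
# Minimal sketch of the "dumbest" long-context approach: no indexing,
# no retrieval -- just replay the full history inside the prompt.

def call_llm(prompt: str) -> str:
    """Hypothetical stand-in for your LLM provider's completion call."""
    raise NotImplementedError

def answer_with_long_context(history: list[dict], question: str) -> str:
    """Feed every past turn into the context window, then ask the question."""
    transcript = "\n".join(f"{t['role']}: {t['content']}" for t in history)
    prompt = (
        "Conversation history with this user:\n"
        f"{transcript}\n\n"
        f"Using that history, answer the user's new question:\n{question}"
    )
    return call_llm(prompt)
```

At small history sizes this wins on accuracy precisely because nothing is filtered out before the model sees it.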
When simple stops scaling
But here’s where the trilemma bites. As conversation history grows:
- At 30 conversations: Long context costs about $0.01 per response with 10-second latency
- At 150 conversations: Costs jump to $0.04 with 20-second waits
- At 300 conversations: You’re paying $0.08 and waiting 30+ seconds
For an enterprise with thousands of employees, each generating multiple interactions daily, these numbers quickly become untenable. A single employee having 10 interactions per day would cost $24/month just in memory processing at the 300-conversation mark—before considering the actual work the AI performs.
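The arithmetic behind that figure is simple enough to sanity-check. The per-response prices in the sketch below are the measured numbers quoted above; the rest is straightforward multiplication.

```python
# Back-of-the-envelope memory-cost model using the per-response prices above.

COST_PER_RESPONSE = {30: 0.01, 150: 0.04, 300: 0.08}  # USD, long context

def monthly_memory_cost(history_size: int, interactions_per_day: int,
                        days_per_month: int = 30) -> float:
    """Memory-processing cost for one employee over a month, in USD."""
    return COST_PER_RESPONSE[history_size] * interactions_per_day * days_per_month

print(monthly_memory_cost(300, 10))  # -> 24.0, the $24/month cited above
```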
Meanwhile, switching to efficient retrieval systems crashes your accuracy from 70% down to 30%. For enterprise applications where a single mistake could mean missed deadlines or incorrect analyses, this accuracy penalty is often unacceptable.
The hybrid solution: Best of both worlds
This is where our proposed hybrid approach comes in. Instead of choosing between expensive accuracy and cheap mediocrity, Salesforce AI Research has developed a block-based extraction method that maintains the accuracy of long context while dramatically reducing costs.
The approach works in two phases:
- Parallel extraction: Break conversation history into manageable chunks and extract relevant memories from each in parallel
- Smart aggregation: Combine these extracted memories into a concise context for the final response
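A simplified sketch of the idea is below. It reuses the hypothetical `call_llm` stub from the earlier example; the block size, prompt wording, and the "NONE" sentinel are illustrative choices, not the tuned values from our experiments.

```python
# Sketch of the two-phase block-based extraction method.
from concurrent.futures import ThreadPoolExecutor

def call_llm(prompt: str) -> str:
    """Hypothetical stand-in for your LLM provider's completion call."""
    raise NotImplementedError

BLOCK_SIZE = 10  # conversations per block (illustrative)

def extract_block_memories(block: list[str], question: str) -> str:
    """Phase 1: pull only question-relevant facts out of one block."""
    prompt = (
        "From the conversations below, extract only the facts relevant to "
        f"this question: {question}\nReply NONE if nothing is relevant.\n\n"
        + "\n---\n".join(block)
    )
    return call_llm(prompt)

def answer_with_hybrid_memory(conversations: list[str], question: str) -> str:
    blocks = [conversations[i:i + BLOCK_SIZE]
              for i in range(0, len(conversations), BLOCK_SIZE)]
    # Phase 1: blocks are processed concurrently, so latency stays close to
    # that of a single extraction call rather than growing with history.
    with ThreadPoolExecutor() as pool:
        memories = list(pool.map(
            lambda b: extract_block_memories(b, question), blocks))
    # Phase 2: aggregate the hits into one compact context for the answer.
    relevant = [m for m in memories if m.strip() != "NONE"]
    final_prompt = ("Relevant memories:\n" + "\n".join(relevant)
                    + f"\n\nAnswer the question: {question}")
    return call_llm(final_prompt)
```

Because the final call sees only the aggregated extractions rather than the raw transcript, its token count stays roughly flat as history grows.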
The results are compelling:
- Token usage: Reduced from 27,000 tokens to 2,000 tokens at 300 conversations—a 13x improvement
- Accuracy: Maintains 70-75% accuracy, nearly matching pure long context
- Latency: Parallel processing eliminates the sequential bottleneck
- Cost: Approaches the efficiency of pure retrieval systems
Practical implementation strategies
Based on our findings, here’s how organizations should think about implementing memory for their AI agents:
Start Simple (0-30 conversations): Use long context for new users and initial interactions. The performance is unbeatable and costs remain reasonable.
Transition Thoughtfully (30-150 conversations): Begin incorporating block-based extraction for frequent users. Monitor cost-accuracy tradeoffs based on your specific use case value.
Scale Smartly (150+ conversations): Deploy full hybrid architecture. Consider pure retrieval only for low-stakes applications where occasional errors are acceptable.
Choose Models Wisely: Our research shows that medium-tier models (like GPT-4o or Claude Sonnet) provide equivalent memory performance to premium models at 8x lower cost. Save the expensive models for tasks that actually need them.
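In practice, this staging can start as a simple per-user routing rule keyed to conversation count. The thresholds below come from the guidance above; the strategy labels themselves are illustrative.

```python
def choose_memory_strategy(conversation_count: int,
                           low_stakes: bool = False) -> str:
    """Pick a memory architecture for a user based on history size."""
    if conversation_count <= 30:
        return "long_context"      # start simple: unbeatable accuracy
    if conversation_count <= 150:
        return "block_extraction"  # transition thoughtfully
    if low_stakes:
        return "pure_retrieval"    # cheap; fine when occasional errors are OK
    return "hybrid"                # scale smartly
```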
The path forward for enterprise AI
The memory trilemma isn’t just an academic curiosity — it’s the barrier between current AI tools and the promise of true Enterprise General Intelligence. By understanding these tradeoffs and implementing hybrid approaches, organizations can build AI agents that genuinely learn and adapt over time.
The key insight is that memory isn’t a one-size-fits-all problem. The assistant helping a new employee needs different memory architecture than one supporting a power user with months of interaction history. By matching the solution to the scale, we can provide every user with an AI partner that remembers what matters, responds quickly, and doesn’t break the budget.
As we continue developing these systems at Salesforce, we’re seeing that solving the memory trilemma isn’t just about technical optimization—it’s about enabling AI agents to become true partners in enterprise work. When an AI system can remember your preferences, learn from corrections, and build on past conversations, it transforms from a tool you use into a colleague you collaborate with.
The future of enterprise AI isn’t just about making models bigger or faster. It’s about making them remember — practically, affordably, and reliably. With hybrid memory architectures, we’re finally breaking free from the trilemma’s constraints and moving toward AI agents that truly understand and grow with your business.