Imagine an AI assistant that forgets your project requirements between Monday and Wednesday, or one that takes 30 seconds to recall a simple preference you mentioned yesterday. This is the reality of AI memory systems today, and it’s holding back the promise of truly intelligent enterprise agents equipped with long-term semantic memory.
At Salesforce AI Research, we’ve been tackling a fundamental challenge that every organization faces as they deploy AI agents: how do you give these systems reliable, practical memory without breaking the bank or frustrating users with glacial response times? Our recent research reveals both a surprising paradox and a promising solution that could transform how enterprise AI systems learn and adapt.
Why memory is the missing piece of enterprise AI
For AI agents to evolve from sophisticated tools into genuine partners, they need memory they can trust. Not just any memory — but the kind that allows them to absorb the unique nuances of your business, learn specific workflows, personalize assistance for individual team members, and learn from corrections so they don’t repeat mistakes.
Without robust memory, an AI agent is like a brilliant consultant with amnesia. Every interaction starts from scratch. Every correction needs repeating. Every preference must be restated. The agent’s capacity to provide increasing value over time hits a hard ceiling.
This limitation becomes especially acute in enterprise settings, and it stands in the way of what we call Enterprise General Intelligence (EGI): AI systems that don’t just answer questions but truly understand and adapt to your organization’s unique context.
The memory trilemma: Pick two, sacrifice one
Here’s where things get interesting — and frustrating. Through extensive benchmarking of 75,000-plus test cases, we’ve identified what we call the “Memory Trilemma.” Like the famous project management triangle (fast, good, cheap — pick two), AI memory systems force you to balance three competing factors:

Accuracy: How well can the AI recall the correct, relevant information? High accuracy means remembering that specific API endpoint you mentioned three weeks ago. Low accuracy means generic responses because the system lacks context.
Cost: The computational and financial resources required. With large language models charging by the token, feeding extensive conversation history gets expensive fast. At 300 past conversations, costs can reach 8 cents per response—seemingly small until you multiply by thousands of daily interactions.
Latency: The time between question and answer. Users expect near-instant responses, but processing extensive memory can take 30+ seconds, making the interaction feel more like waiting for a database query than having a conversation.
The surprising power of simplicity (at first)
Our research uncovered something unexpected: For the first 30-150 conversations, the “dumbest” approach works best. Simply feeding all previous conversations into the model’s context window achieves 70-82% accuracy on memory-dependent questions. Compare that to sophisticated retrieval systems like Mem0 or Zep, which achieve only 30-45% accuracy despite their complex indexing and graph structures.
Why? Conversational memory has a characteristic that sets it apart from other AI retrieval challenges. Unlike web search or document retrieval, which start with billions of tokens, memory begins at zero. Even an hour of daily conversation over four weeks generates only 100,000 tokens, well within modern context windows.
This means that for most users’ initial interactions with an AI agent, the sophisticated retrieval mechanisms that power web search are actually overkill. It’s like using a satellite navigation system to find your way around your own living room.
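To make that concrete, below is a minimal sketch of the long-context baseline in Python. Everything in it is illustrative: `call_llm` is a hypothetical stand-in for whatever chat-completion client you use, and the prompt format is arbitrary. The essential point is that the entire history rides along in every request.

```python
# Minimal sketch of the "dumbest" long-context approach: no indexing,
# no retrieval -- just replay the full history inside the prompt.

def call_llm(prompt: str) -> str:
    """Hypothetical stand-in for your LLM provider's completion call."""
    raise NotImplementedError

def answer_with_long_context(history: list[dict], question: str) -> str:
    """Feed every past turn into the context window, then ask the question."""
    transcript = "\n".join(f"{t['role']}: {t['content']}" for t in history)
    prompt = (
        "Conversation history with this user:\n"
        f"{transcript}\n\n"
        f"Using that history, answer the user's new question:\n{question}"
    )
    return call_llm(prompt)
```

At small history sizes this wins on accuracy precisely because nothing is filtered out before the model sees it.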
When simple stops scaling
But here’s where the trilemma bites. As conversation history grows:
- At 30 conversations: Long context costs about $0.01 per response with 10-second latency
- At 150 conversations: Costs jump to $0.04 with 20-second waits
- At 300 conversations: You’re paying $0.08 and waiting 30+ seconds
For an enterprise with thousands of employees, each generating multiple interactions daily, these numbers quickly become untenable. A single employee having 10 interactions per day would cost $24/month just in memory processing at the 300-conversation mark—before considering the actual work the AI performs.
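The arithmetic behind that figure is simple enough to sanity-check. The per-response prices in the sketch below are the measured numbers quoted above; the rest is straightforward multiplication.

```python
# Back-of-the-envelope memory-cost model using the per-response prices above.

COST_PER_RESPONSE = {30: 0.01, 150: 0.04, 300: 0.08}  # USD, long context

def monthly_memory_cost(history_size: int, interactions_per_day: int,
                        days_per_month: int = 30) -> float:
    """Memory-processing cost for one employee over a month, in USD."""
    return COST_PER_RESPONSE[history_size] * interactions_per_day * days_per_month

print(monthly_memory_cost(300, 10))  # -> 24.0, the $24/month cited above
```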
Meanwhile, switching to efficient retrieval systems crashes your accuracy from 70% down to 30%. For enterprise applications where a single mistake could mean missed deadlines or incorrect analyses, this accuracy penalty is often unacceptable.
The hybrid solution: Best of both worlds
This is where our proposed hybrid approach comes in. Instead of choosing between expensive accuracy and cheap mediocrity, Salesforce AI Research has developed a block-based extraction method that maintains the accuracy of long context while dramatically reducing costs.
The approach works in two phases:
- Parallel extraction: Break conversation history into manageable chunks and extract relevant memories from each in parallel
- Smart aggregation: Combine these extracted memories into a concise context for the final response
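A simplified sketch of the idea is below. It reuses the hypothetical `call_llm` stub from the earlier example; the block size, prompt wording, and the "NONE" sentinel are illustrative choices, not the tuned values from our experiments.

```python
# Sketch of the two-phase block-based extraction method.
from concurrent.futures import ThreadPoolExecutor

def call_llm(prompt: str) -> str:
    """Hypothetical stand-in for your LLM provider's completion call."""
    raise NotImplementedError

BLOCK_SIZE = 10  # conversations per block (illustrative)

def extract_block_memories(block: list[str], question: str) -> str:
    """Phase 1: pull only question-relevant facts out of one block."""
    prompt = (
        "From the conversations below, extract only the facts relevant to "
        f"this question: {question}\nReply NONE if nothing is relevant.\n\n"
        + "\n---\n".join(block)
    )
    return call_llm(prompt)

def answer_with_hybrid_memory(conversations: list[str], question: str) -> str:
    blocks = [conversations[i:i + BLOCK_SIZE]
              for i in range(0, len(conversations), BLOCK_SIZE)]
    # Phase 1: blocks are processed concurrently, so latency stays close to
    # that of a single extraction call rather than growing with history.
    with ThreadPoolExecutor() as pool:
        memories = list(pool.map(
            lambda b: extract_block_memories(b, question), blocks))
    # Phase 2: aggregate the hits into one compact context for the answer.
    relevant = [m for m in memories if m.strip() != "NONE"]
    final_prompt = ("Relevant memories:\n" + "\n".join(relevant)
                    + f"\n\nAnswer the question: {question}")
    return call_llm(final_prompt)
```

Because the final call sees only the aggregated extractions rather than the raw transcript, its token count stays roughly flat as history grows.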
The results are compelling:
- Token usage: Reduced from 27,000 tokens to 2,000 tokens at 300 conversations—a 13x improvement
- Accuracy: Maintains 70-75% accuracy, nearly matching pure long context
- Latency: Parallel processing eliminates the sequential bottleneck
- Cost: Approaches the efficiency of pure retrieval systems
Practical implementation strategies
Based on our findings, here’s how organizations should think about implementing memory for their AI agents:
Start Simple (0-30 conversations): Use long context for new users and initial interactions. The performance is unbeatable and costs remain reasonable.
Transition Thoughtfully (30-150 conversations): Begin incorporating block-based extraction for frequent users. Monitor cost-accuracy tradeoffs based on your specific use case value.
Scale Smartly (150+ conversations): Deploy full hybrid architecture. Consider pure retrieval only for low-stakes applications where occasional errors are acceptable.
Choose Models Wisely: Our research shows that medium-tier models (like GPT-4o or Claude Sonnet) provide equivalent memory performance to premium models at 8x lower cost. Save the expensive models for tasks that actually need them.
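In practice, this staging can start as a simple per-user routing rule keyed to conversation count. The thresholds below come from the guidance above; the strategy labels themselves are illustrative.

```python
def choose_memory_strategy(conversation_count: int,
                           low_stakes: bool = False) -> str:
    """Pick a memory architecture for a user based on history size."""
    if conversation_count <= 30:
        return "long_context"      # start simple: unbeatable accuracy
    if conversation_count <= 150:
        return "block_extraction"  # transition thoughtfully
    if low_stakes:
        return "pure_retrieval"    # cheap; fine when occasional errors are OK
    return "hybrid"                # scale smartly
```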
The path forward for enterprise AI
The memory trilemma isn’t just an academic curiosity — it’s the barrier between current AI tools and the promise of true Enterprise General Intelligence. By understanding these tradeoffs and implementing hybrid approaches, organizations can build AI agents that genuinely learn and adapt over time.
The key insight is that memory isn’t a one-size-fits-all problem. The assistant helping a new employee needs different memory architecture than one supporting a power user with months of interaction history. By matching the solution to the scale, we can provide every user with an AI partner that remembers what matters, responds quickly, and doesn’t break the budget.
As we continue developing these systems at Salesforce, we’re seeing that solving the memory trilemma isn’t just about technical optimization—it’s about enabling AI agents to become true partners in enterprise work. When an AI system can remember your preferences, learn from corrections, and build on past conversations, it transforms from a tool you use into a colleague you collaborate with.
The future of enterprise AI isn’t just about making models bigger or faster. It’s about making them remember — practically, affordably, and reliably. With hybrid memory architectures, we’re finally breaking free from the trilemma’s constraints and moving toward AI agents that truly understand and grow with your business.