
Not All Agentic Harnesses Are Created Equal


 It pays to know which tasks best fit which kinds of harnesses — and why it matters when they’re used at enterprise scale

A new term has entered the AI lexicon: the agentic harness. It’s the scaffolding around a model that gives the AI access to the tools, data, and other elements that render it useful. 

Ethan Mollick, the Wharton business professor whose hands-on research on AI adoption has made him a leading voice in the field, describes the harness as what enables an AI agent to “take actions and complete multi-step tasks on its own.”

If the AI foundation model is the engine, the harness is everything else: the chassis, the wheels, the drive shaft, the brakes. Increasingly, it's the harness and not the model that determines what actually gets done.

That shift is already visible in a new class of products. Early examples of harnesses in practice include Anthropic’s Claude Code, which can generate, run, and refine code. Another is OpenClaw, an always-on agent that operates across applications with memory and persistence. Such products are defining a competitive environment that’s moving beyond offering the most capable model to building the most effective harness around it.

Not all harnesses are created equal, though, and the gap among them is larger than the current conversation acknowledges. What’s more, those products, for all their promise, were built for use by individuals. 

Acting on behalf of an entire organization is a far more complex challenge. An enterprise agent needs to grasp more than what one employee wants. It must understand what the organization has collectively decided: the shared data and work history that give actions meaning and the policies and trade-offs that determine whether the agent has standing to act at all.

Deploy agents across an enterprise without that foundation, and the problems quickly compound. Individual agents optimize for their own domain, and none coordinate across the whole organization. Chaos ensues.

Where the real work lives

Organizations don’t act through a single mind. They operate with competing priorities, fragmented data, and decisions that require human authority, institutional memory, and hard-won consensus. 

Most of the harnesses we see today are built for environments where the work is self-contained and the finish line for tasks is clear. Coding is one obvious example. Others include booking travel, submitting and processing an expense report, and fielding customer inquiries across languages and systems. In these examples, the agent can handle a task from start to finish. The output is verifiable.

But most work inside an enterprise is nowhere near that tidy. Consider what it takes to complete a complex procurement decision, handle a major contract negotiation, or run payroll. Each of these tasks touches multiple departments, relies on shared data and institutional history, and requires negotiating the competing goals inevitable in any high-stakes decision.

Without clearly defined priorities and encoded lines of authority, an agent can cross departments, trigger handoffs, and touch a dozen systems yet create more work than it completes. An agent without a robust-enough harness might optimize locally but create chaos collectively. 

At Salesforce Futures, we’ve been asking what separates the harnesses that deliver from those that disappoint. The answer starts with agentic-loop reliability. Can an agentic system reliably complete a task from start to finish without a human overseeing each step? 

For a single user working on a discrete task, the loop is hard enough. Layer on the realities of an organization — conflicting agendas, distributed authority, data spread across dozens of systems — and the same loop becomes an order of magnitude harder to close reliably. 

The risk for more complex tasks is that a loop can break down at any of five stages (a minimal code sketch follows the list):

  • Specification: Did the agent correctly understand the goal?
  • Planning: Can the agent reason and plan to achieve the goal?
  • Execution: Can it actually take the required action?
  • Verification: Can it determine whether it succeeded?
  • Termination: Can it know when to stop?
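
To make the loop concrete, here is a minimal sketch in Python. It is purely illustrative: the agent object and its specify, plan, execute, and verify methods are hypothetical stand-ins, not the API of Claude Code, OpenClaw, or any other product.

```python
# Illustrative sketch of the five-stage agentic loop described above.
# The `agent` object and all of its methods are hypothetical stand-ins;
# no real product API is implied.

class LoopBroken(Exception):
    """Raised when the loop fails at a named stage."""
    def __init__(self, stage):
        super().__init__(f"loop broke down at: {stage}")

def run_agentic_loop(agent, request, max_steps=10):
    goal = agent.specify(request)            # Specification: restate the goal
    if goal is None:
        raise LoopBroken("specification")    # the agent misunderstood the ask

    for _ in range(max_steps):
        step = agent.plan(goal)              # Planning: choose the next action
        if step is None:
            raise LoopBroken("planning")

        result = agent.execute(step)         # Execution: take the action
        if result is None:
            raise LoopBroken("execution")

        if agent.verify(goal, result):       # Verification: did it succeed?
            return result                    # Termination: done means done

    raise LoopBroken("termination")          # never knew when to stop
```

Each raise marks one of the five failure points. In a harness built for a single user, a human typically catches these breaks; at enterprise scale, the harness itself must.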

The reliability of a loop depends on a series of factors, including the quality of data the harness can access, the permissions it has been granted, the tools available to it, and the clarity of the policies governing its actions. A gap in any one of them can derail the whole sequence. 
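
One way to picture those dependencies is as a pre-flight check the harness runs before entering the loop. The sketch below is hypothetical, assuming simple Harness and task records of our own invention; none of the field names come from any real product.

```python
from dataclasses import dataclass, field

# Hypothetical illustration of the four reliability factors named above.
# A gap in any one of them blocks the loop before it starts.

@dataclass
class Harness:
    data_sources: list = field(default_factory=list)  # data it can access
    permissions: set = field(default_factory=set)     # what it has been granted
    tools: dict = field(default_factory=dict)         # tools available to it
    policies: dict = field(default_factory=dict)      # rules governing its actions

def preflight(harness, task):
    """Return the factors that would derail the loop for this task, if any."""
    gaps = []
    if not all(src in harness.data_sources for src in task["needs_data"]):
        gaps.append("data")
    if not task["needs_permissions"] <= harness.permissions:
        gaps.append("permissions")
    if not all(tool in harness.tools for tool in task["needs_tools"]):
        gaps.append("tools")
    if not all(p in harness.policies for p in task["governed_by"]):
        gaps.append("policies")
    return gaps  # an empty list means the loop has what it needs

# Example: a task that needs CRM data, two grants, one tool, and one policy.
task = {
    "needs_data": ["crm"],
    "needs_permissions": {"read:crm", "send:email"},
    "needs_tools": ["send_email"],
    "governed_by": ["outbound_comms"],
}
harness = Harness(data_sources=["crm"], permissions={"read:crm"})
print(preflight(harness, task))  # ['permissions', 'tools', 'policies']
```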

And even a loop that closes perfectly may not be sufficient. Some work finishes when the output is produced. But other work — closing a sale, resolving a dispute, winning an approval — requires what might be called “social closure.” These tasks require humans in the loop. They need people to decide when they are completed, through persuasion, trust, and human judgment. No agent can substitute for that.

One mind or many?

What drives that complexity above all else is agency. Is a harness acting primarily on behalf of an individual, or does it need to enlist a group of people to finish the job? 

A harness like Claude Code or OpenClaw operates in service of a single user with that person’s own set of goals. There’s one chain of authority, which is precisely why those tools feel fluid and fast. The work is bounded and success measurable. For an organization, however, the outcome is almost never that clean. 

To act coherently on behalf of an organization, a harness needs two things that are harder to engineer. The first is shared context: the data, records, and work history required to take any meaningful action inside that particular organization. 

The second is collective intent: a clear picture of the organization’s priorities and hierarchies so that the agent knows not only what to do but also whether it has authority to do it. Without shared context, the agent acts on incomplete information. Without collective intent, it has no way to choose when legitimate goals conflict.
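
As a thought experiment, those two ingredients can be written down as the state an enterprise harness would have to carry before acting. The structure below is entirely hypothetical and assumes nothing about any vendor's implementation; its safe default is that authority rests with a human.

```python
from dataclasses import dataclass

# Entirely hypothetical sketch of the two ingredients named above.

@dataclass
class SharedContext:
    records: dict       # the data and records that give an action meaning
    work_history: list  # what the organization has already done and decided

@dataclass
class CollectiveIntent:
    priorities: list    # ranked organizational goals, for resolving conflicts
    authority: dict     # which actions the agent has standing to take

def may_act(intent: CollectiveIntent, action: str) -> bool:
    """Grant standing only where collective intent explicitly allows it;
    anything unlisted defaults to a human decision."""
    return intent.authority.get(action, "human") == "agent"
```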

The harder problem

For now, the harnesses generating the most buzz are focused on personal augmentation — one person expressing intent, accessing tools, getting work done. Those products feel like magic, because for discrete tasks, they are. The loop closes, the output is verifiable, and done actually means done. These harnesses are also palpably empowering to the people using them.

The harder, less glamorous work is building harnesses for the complex, cross-functional processes that define how organizations actually operate. This work can seem invisible to the employees it’s meant to serve. And because organizational work requires layers of human judgment to complete it, these harnesses not only are harder to build but also seem considerably less magical. 

The risk is a self-reinforcing dynamic whereby enterprises grow frustrated with the complexity of organizational AI and instead settle for the tools that feel good in individual hands. The hope is that such tools add up to something greater than the sum of their parts. At the margins, they might. 

But a harness built for one isn’t the same as one built to act coherently on behalf of thousands of people with competing goals and constraints. That difference runs deeper than features or price. It goes all the way down to the architecture.

The AI foundation models being built today can do extraordinary things. Yet for most businesses, the results remain underwhelming. That's not primarily a model problem. It's a harness problem.
