The “Intern” in the Machine: Why LLMs Need a Script to Scale

The "Intern" in the Machine: Why LLMs Need a Script to Scale

Large language models are amazing, but they can’t do everything by themselves. This truth became clear as we prepared for our internal Company Kickoff. Salesforce had built an agent to help our thousands of sellers learn new product lines — a task that, on paper, was a perfect assignment for an LLM. We could feed it all of our documentation and knowledge, and it could generate a bank of questions and score sellers’ answers against nuanced rubrics.
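
For the curious, here is roughly what that pattern looks like in code: a minimal Python sketch, assuming a generic llm client with a complete method. The function names and prompts are illustrative, not the agent’s actual implementation.

```python
# A minimal sketch of the quiz pattern described above: generate questions
# from product documentation, then score a seller's answer against a rubric.
# The `llm` object and its `complete` method are illustrative assumptions.

def build_quiz(docs: list[str], llm, num_questions: int = 10) -> list[str]:
    corpus = "\n\n".join(docs)
    prompt = (
        f"From the documentation below, write {num_questions} quiz questions "
        "that test a seller's grasp of the core value propositions.\n\n"
        f"{corpus}"
    )
    # One question per line keeps the bank easy to store and review.
    return [q for q in llm.complete(prompt).splitlines() if q.strip()]

def score_answer(question: str, answer: str, rubric: str, llm) -> int:
    prompt = (
        "Score the seller's answer from 0 to 5 against the rubric. "
        "Reply with the number only.\n"
        f"Rubric: {rubric}\nQuestion: {question}\nAnswer: {answer}"
    )
    return int(llm.complete(prompt).strip())
```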

But as the LLM interacted with our team, it began to drift — the phenomenon where an agent’s behavior shifts away from its original purpose as it encounters new data or human inputs. One day, the quiz would be a perfect assessment of core value propositions; the next, after encountering a new prompt or piece of documentation, it would wander into a cul-de-sac. It might fixate on a minor technical detail buried in a 50-page white paper — interesting to a developer, but practically useless for a seller who needs to solve a customer’s immediate business problem.

This is a familiar challenge for companies that are unleashing agents across their business. After all, human sellers can tell when information is irrelevant — but autonomous AI needs to take meaningful action without human oversight. This kind of drift in production could result in bad service, lost customers, and much worse. For AI to truly serve the enterprise, we had to solve a fundamental tension: how to harness the “what-if” of the LLM without losing the “must-do” of business operations.

We’ve met this challenge with a hybrid engine — combining the creative horsepower of probabilistic LLMs with the precision steering of deterministic workflows. The result is flexible innovation with enterprise-grade reliability. Through capabilities like Agent Graph — which provides a structural map to keep agents on track — and Agent Script — which adds programmatic controls to agent behavior — we are giving agents the freedom to think within controlled boundaries. We’ve realized that in the enterprise, the most powerful form of intelligence is not one that can do anything, but one that knows exactly what it must do.

This may sound like a step backward — a return to hard-coding and a retreat from LLMs’ promise. But determinism makes agents even more powerful by ensuring that every spark of AI reasoning is tethered to the bedrock of absolute business truth. For instance, a service agent can be instructed to determine a customer’s membership status and order history and adjust its actions accordingly, while still having the freedom to interpret requests and craft responses.
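
In code, that division of labor might look something like this. The crm and llm objects and their method names below are hypothetical placeholders, not an actual Salesforce API; this is a minimal sketch of the pattern, nothing more.

```python
# Deterministic facts and rules, probabilistic phrasing. The `crm` and `llm`
# objects are hypothetical stand-ins for a system of record and a model client.

def handle_service_request(customer_id: str, message: str, crm, llm) -> str:
    # Deterministic: facts come from the system of record, never from the model.
    membership = crm.get_membership_status(customer_id)  # e.g. "gold", "basic"
    orders = crm.get_order_history(customer_id, limit=5)

    # Deterministic: a business rule gates what the agent is allowed to offer.
    may_offer_refund = membership == "gold"

    # Probabilistic: the LLM interprets the request and crafts the reply,
    # constrained to the facts and permissions established above.
    prompt = (
        "You are a service agent. Answer the customer's message using ONLY "
        "the facts below. Do not invent order details.\n"
        f"Membership: {membership}\n"
        f"Recent orders: {orders}\n"
        f"Refunds permitted: {may_offer_refund}\n"
        f"Customer message: {message}"
    )
    return llm.complete(prompt)
```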

Navigating the “Intern” Phase

The biggest mistake businesses make is thinking of agents as traditional software. Even if they understand that agents are inherently probabilistic, they still use the same development process they use for deterministic code. They spend months building and refining agents, testing them with canned data until they seem to work flawlessly, and then deploy them with the expectation that their work is done.

But launching an agent isn’t the end of the story — it’s just the beginning. Unlike traditional software, which works the same way every time, agents are probabilistic. By definition, that means they can occasionally come up with the wrong answer, or, as we saw at our Company Kickoff, they can drift. And their end users — people — are also non-deterministic. There is no way to predict every possible interaction they might have with an agent. So keeping agents on track requires an ongoing process of observation, monitoring, fine-tuning, and updating. It also requires knowing when to hard-code explicit instructions and when to let the agent’s intelligence take over. 

As with humans, that mix will change over time. If you hire a college intern, you don’t just point them toward a library and say, “Help our customers.” You give them a script, a rubric, and a set of guardrails. You expect them to follow the rules (the deterministic part) while using their human judgment to handle the nuances of a conversation (the probabilistic part). As they gain experience and prove their reliability, you loosen the script — ratcheting back the determinism, and dialing up the autonomy.

Enterprise AI is currently in its “intern” phase. Adding deterministic controls to agentic workflows ensures agents have the basic instructions to behave appropriately. By using Agent Script, we aren’t stifling the AI; we are giving it the clear guidance necessary to deliver repeatable, high-quality outcomes. Eventually, as these systems “graduate” and we gain more data on their performance, the scripts will become more flexible. But today, the script is what allows the agent to be hired in the first place.
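
To make the “loosening” concrete, here is a hypothetical illustration in Python. This is not Agent Script syntax; it simply shows the dial the intern analogy describes, with policy fields invented for the example.

```python
# A hypothetical illustration of "loosening the script" over time. The policy
# fields and values below are invented for the example, not Agent Script.

from dataclasses import dataclass

@dataclass
class GuardrailPolicy:
    scripted_openers: bool   # must the agent use the approved greeting?
    require_approval: bool   # does a human review outbound replies?
    allowed_topics: set[str] # hard boundary on what it may discuss

# Day one: the "intern" follows a tight script.
intern_policy = GuardrailPolicy(
    scripted_openers=True,
    require_approval=True,
    allowed_topics={"billing", "shipping"},
)

# After it has proven reliable, the same agent runs with more autonomy.
trusted_policy = GuardrailPolicy(
    scripted_openers=False,
    require_approval=False,
    allowed_topics={"billing", "shipping", "returns", "product_advice"},
)
```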

The ROI of Determinism

This hybrid approach to developing agents lets businesses move faster than either a purely probabilistic or a purely deterministic approach. We call this the agentic life cycle: a shift from traditional software development to a “test and tune” model. Instead of waiting for agents to achieve a zero-percent error rate, we run multivariate experiments in production, comparing different scripts and data sets to see what drives the best outcomes. (Of course, deterministic guardrails ensure that any imperfections fall within an acceptable range.) This approach is 6 to 10 times faster than traditional coding because you are coaching a workforce rather than just debugging a program.
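
Here is a minimal sketch of what “test and tune” can look like in practice, assuming a generic agent object with respond, passes_rubric, and fallback methods. The variant names and the five-percent threshold are illustrative assumptions.

```python
import random

# Two competing "scripts" running side by side on live traffic. The variant
# prompts, the rubric check, and the 5% threshold are illustrative assumptions.
VARIANTS = {
    "concise_script": {"prompt": "Answer in two sentences or fewer.", "errors": 0, "runs": 0},
    "detailed_script": {"prompt": "Walk the customer through each step.", "errors": 0, "runs": 0},
}
MAX_ERROR_RATE = 0.05  # the "acceptable range" the deterministic guardrail enforces

def run_experiment(request: str, agent) -> str:
    name = random.choice(list(VARIANTS))  # route live traffic across variants
    variant = VARIANTS[name]
    variant["runs"] += 1

    reply = agent.respond(variant["prompt"], request)
    if not agent.passes_rubric(reply):    # automated quality check
        variant["errors"] += 1
        reply = agent.fallback(request)   # deterministic safe answer instead

    # Retire a variant whose observed error rate drifts out of bounds,
    # always keeping at least one script in play.
    if (
        len(VARIANTS) > 1
        and variant["runs"] >= 100
        and variant["errors"] / variant["runs"] > MAX_ERROR_RATE
    ):
        VARIANTS.pop(name)
    return reply
```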

Our experience with our internal service agent perfectly illustrated the necessity of a “test and tune” approach in a real-world environment. Initially, we deployed the agent with a broad mandate to be helpful, only to find it was so unconstrained that it actually started recommending competitor software to our own customers. In an attempt to fix this, we pivoted to a rigid, deterministic rule: “Never mention another company’s product.” However, this created a new form of friction; the agent became useless to customers who simply wanted to know how to integrate Salesforce with the existing tools they already owned. 

Instead of retreating to a lab for months of recoding, we tuned the agent’s reasoning in real time, moving away from binary “if-then” rules to a more sophisticated instruction: “Act in the best interest of the customer while maintaining your identity as a Salesforce employee.” This breakthrough only happened because we were observing these “people drifts” in production. By testing this nuance in the wild, we quickly found the “Goldilocks zone” where the agent could provide helpful integration advice without acting as a lead generator for the competition.
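
Captured as code, those three iterations amount to little more than a changing policy string, which is exactly the point: tuning the script, not recoding the agent. The structure below is an illustrative sketch; only the rule wording comes from our experience.

```python
# The three policy iterations described above, expressed as system-prompt
# rules. The surrounding structure is an illustrative sketch, not how the
# production agent is configured.

POLICY_V1 = "Be helpful."
# Too loose: the agent recommended competitor software to our own customers.

POLICY_V2 = "Never mention another company's product."
# Too rigid: it blocked legitimate questions about integrating Salesforce
# with tools the customer already owns.

POLICY_V3 = (
    "Act in the best interest of the customer while maintaining your "
    "identity as a Salesforce employee."
)
# The "Goldilocks zone": helpful integration advice without acting as a
# lead generator for the competition.

def system_prompt(policy: str) -> str:
    # Swapping the policy string is the whole "tune" step; no recoding needed.
    return f"You are a Salesforce service agent. {policy}"
```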

This validated our core philosophy: you don’t find the perfect balance through theory. You find it by accelerating into production and coaching the agent through the nuances of the last mile.

The Future Is Hybrid

This move toward hybrid reasoning, pairing the “horsepower” of the LLM with the “steering” of deterministic workflows, is how we bring AI out of the lab and into the boardroom. We are moving toward a future where agents don’t just “chat,” they work. They will graduate from following scripted controls to managing complex, multi-agent orchestrations across the enterprise.

But for now, the script is your best friend. It’s how you move from experimentation to production, and from “cool technology” to “measurable ROI.”

That’s also how we fixed our quiz agent. Once we added the right controls, it worked beautifully. By launching and tweaking, we were able to figure out the right blend of probabilistic intelligence and deterministic guardrails — and get thousands of sellers up to speed faster than ever before. 

For companies on the road to becoming Agentic Enterprises, my advice is to start small, document your workflows, and over time learn how to give your AI the “script” it needs to succeed. You’ll find that the more control you exert today, the more autonomy you’ll be able to grant tomorrow.
