AI Balancing Act: How Companies Can Scale LLMs to Improve Performance, Cut Costs, and Help the Environment

A media feeding frenzy over the exploding scale and increasing costs of artificial intelligence (AI) has created a divide between the reality of the technology and its perception among decision makers, explained Silvio Savarese, Chief Scientist of Salesforce AI Research, in a recent interview.

Whether these concerns come from worries about the bottom line, the impact of AI on the environment, or even basic questions of fairness and access, Savarese believes that changing these misconceptions requires an understanding of when scale is necessary to deliver high-quality AI outputs — and when it isn’t.

In the interview, Savarese also shared why the environmental and financial price tag of AI doesn’t have to be as head-spinning as the headlines might suggest, and how understanding the different scales and performance of AI models can help any business responsibly harness this transformative technology to boost productivity, build deeper relationships with customers, and enhance daily workflows and processes.

Q. Today’s well-known Large Language Models (LLMs) are getting a lot of negative attention for the compute power they require to run, in terms of both operating cost and environmental impact. Are models this large necessary for businesses to tap into the power of generative AI?

Rather than asking if the scale of today’s LLMs is necessary, let’s ask what it’s necessary for.

The scale and size of an AI deployment isn’t inherently advantageous. Rather, when implementing AI, there’s a range of possibilities and trade-offs that should be explored. Remember, the ChatGPTs of the world are designed to do more or less everything, and that makes them very different from most enterprise applications. They can help with homework, suggest holiday recipes, and even reimagine La bohème’s libretto as Socratic dialogues. It’s a great party trick, albeit an expensive one: training OpenAI’s GPT-4 cost more than $100 million. But that isn’t what enterprises are using AI for.

There’s also the environmental impact of these large LLMs. The hypothetical long-term benefits of AI in combating climate change, in areas such as monitoring emissions and optimizing the transportation of goods, are significant, with the potential to reduce global emissions by 5 to 10% by 2030. However, running LLMs, however groundbreaking their capabilities, requires enormous computing resources, exacerbating pressing concerns such as the release of greenhouse gases, the depletion of water resources, and the extraction of raw materials along the supply chain. Given the urgency of the climate crisis and the imperative to combat planet-warming emissions, it’s paramount that the development and implementation of AI technologies don’t surpass the capacity of our planet’s resources.

In contrast to LLMs like ChatGPT or Anthropic’s Claude, an AI model like our own CodeGen 2.5 has a limited set of tasks: helping developers write, understand, and debug code faster. Despite its deliberately small scale, its performance is on a par with models twice its size, so it stays remarkably efficient without compromising on utility. Even as it helps developers work faster, it also reduces costs, latency, and, crucially, environmental impact compared to larger-scale LLMs.

Businesses should not be asking whether they need scale, but how they want to apply that scale. Depending on the task, the answer may vary wildly, and bigger is most certainly not always better.

Q. Okay, but large models still outperform smaller ones, right?

Believe it or not, even this doesn’t have a clear-cut answer. Large models do generally outperform their smaller counterparts when it comes to flexibility. But therein lies the nuance that is so often left out of conversations around LLMs: as tasks become narrower, better defined, and more specific to a particular organization or domain (exactly what enterprise AI is all about), it’s possible to do more with less.

In other words, most models aren’t meant to be everything to everyone, which frees up enterprises to focus on their needs while saving huge amounts of resources in the process.

Q. Are you saying small models can not only keep up with larger ones, but actually outperform them?

Not all the time, no. But under the right circumstances, small models really can offer the best of all worlds: reduced cost, lower environmental impact, and improved performance. Small models are often neck-and-neck with large ones when it comes to tasks like knowledge retrieval, technical support, and answering customer questions. 

In fact, with the right strategy, they can even perform better. That holds for models from the open-source world too, such as Salesforce’s own XGen 7B. Our model is trained on suitably long sequences of data, which helps it with tasks like the summarization of large volumes of text and even writing code, and it consistently exceeds the performance of larger models by leveraging better grounding strategies and better embeddings. Additional small-scale models from our AI research org are slated for release soon and will power generative AI capabilities for critical customer use cases.

Q. Lowering costs is great, but transparency is just as vital. Scale doesn’t matter if I can’t trust the output, right?

Scaling down models isn’t just about saving money. It’s one of the best ways to ensure AI outputs are reliable. Large models are exciting, but they often don’t provide much information about the data they use. This leaves companies with no choice but to monitor deployments closely to catch harmful or inaccurate outputs. Needless to say, this falls far short of the standard most businesses expect from their technology.

Instead, consider a simple, intuitive fact: smaller models are trained on smaller data sets, which are inherently easier to document and understand. That is an increasingly important trust and transparency measure as the role of LLMs grows to include mission-critical applications that require not just reliability, but accountability as well.

Additional steps for verifying that generative AI produces trusted results are, of course, critical: the Einstein Trust Layer is Salesforce’s guaranteed accountability model, which helps businesses efficiently manage data privacy, security, and transparency. The Einstein Trust Layer serves as a secure middleman for user interactions with LLMs. Its functions include obscuring personally identifiable information (PII), monitoring output for harmful content, guaranteeing data privacy, prohibiting the storage or use of user data for future training, and reconciling differences among various model providers.
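
To make the idea of obscuring PII before a prompt reaches a model more concrete, here is a minimal sketch in Python. It is not the Einstein Trust Layer's implementation, which the interview does not describe. The function name `mask_pii`, the placeholder tokens, and the two simple patterns (email addresses and US-style phone numbers) are all assumptions chosen for illustration, and real PII detection covers far more cases.

```python
import re

# Illustrative patterns only (assumptions, not a complete PII detector).
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PHONE = re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b")


def mask_pii(text: str) -> str:
    """Replace email addresses and US-style phone numbers with placeholders."""
    text = EMAIL.sub("<EMAIL>", text)
    return PHONE.sub("<PHONE>", text)


prompt = "Reach Ana at ana.diaz@example.com or 415-555-0134."
masked_prompt = mask_pii(prompt)  # this masked text is what would be sent to the LLM
```

In a setup like this, the masking step sits between the user and the model, so the raw identifiers never leave the caller's side.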

Q. What if companies really do need more scale?

There are, of course, times when increasing scale is simply unavoidable, and the power of small models doesn’t negate the potential of bigger ones. But again, let’s ask the right questions: rather than simply asking whether you need scale, ask what you need it for. The answer will inform your strategy from the very first steps, because there are, ultimately, two very different ways to scale. One is increasing the parameter count of a single model. The other is orchestration: connecting multiple models into a single, larger deployment, analogous to multiple human workers coming together as a team.

Orchestration has the potential to offer the power of scale while still keeping its pitfalls in check. After all, even small models can do amazing things when combined with one another, especially when each is geared toward a specific strength that the others might lack: one model to focus on information retrieval, one to focus on user interactions, another to focus on the generation of content and reports, and so on. In fact, smaller models are arguably a more natural choice in such cases, as their specialized focus makes their role in the larger whole easier to define and validate. 

In other words, small models can be combined to solve ever-bigger problems, all while retaining the virtues of their small size: each can still be cleanly trained, tuned, and understood with an ease large models can’t touch. And it’s yet another example of why a simple parameter count can often be misleading.
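
As a rough illustration of the orchestration idea described above, the Python sketch below routes each request to one of several small, specialized models. Everything in it is hypothetical: the three "models" are plain stand-in functions, the `Orchestrator` class is invented for this example, and the keyword-based `classify` step is a placeholder for whatever routing logic a real deployment would use.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# A "model" here is a hypothetical stand-in: any function from prompt to text.
Model = Callable[[str], str]


def retrieval_model(prompt: str) -> str:
    return f"[retrieval] documents matching: {prompt}"


def support_model(prompt: str) -> str:
    return f"[support] reply to customer: {prompt}"


def report_model(prompt: str) -> str:
    return f"[report] draft report on: {prompt}"


@dataclass
class Orchestrator:
    specialists: Dict[str, Model]  # role name -> specialized model
    default: str                   # role used when no rule matches

    def classify(self, prompt: str) -> str:
        """Pick a role with simple keyword rules (placeholder routing logic)."""
        text = prompt.lower()
        if any(word in text for word in ("find", "search", "look up")):
            return "retrieval"
        if any(word in text for word in ("report", "summary", "summarize")):
            return "report"
        return self.default

    def run(self, prompt: str) -> str:
        """Send the prompt to the specialist chosen by classify()."""
        return self.specialists[self.classify(prompt)](prompt)


orchestrator = Orchestrator(
    specialists={
        "retrieval": retrieval_model,
        "support": support_model,
        "report": report_model,
    },
    default="support",
)
```

Because each specialist has one narrow job, it can be tested and validated on its own, which is the property the interview attributes to small models in an orchestrated deployment.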

Q. How can businesses best incorporate LLMs?

LLMs are a hugely complex topic, and there’s room for any number of voices in the conversation. But we’re overdue for a more balanced, strategic perspective on the question of how much we need to get what we want: how much time, how much compute, and, ultimately, how much cost. The answer isn’t anywhere near as simple as the impression one might get from the headlines, and I believe amazing things can be done on just about any budget. It’s just a matter of knowing what’s possible.
