
MoiraiAgent: An Agentic Framework for Context-Aware Time-Series Forecasting

Introducing MoiraiAgent

Accurate forecasting is the backbone of strategic decision-making in everything from global finance to climate science. In an enterprise setting, it powers the early warning systems for telemetry, optimizes inventory through sales projections, and ensures operational efficiency via workforce capacity planning.

Traditional time-series models typically rely on historical numerical data, identifying patterns like trends and seasonality to predict the future. While effective in stable environments, this “numbers-only” approach often falters in the real world. Modern outcomes are rarely driven by past data alone; they are shaped by external shocks, policy shifts, equipment failures, and qualitative insights from news or reports. To stay accurate, forecasting must look beyond the spreadsheet and account for the unpredictable forces that truly drive change.

Figure 1: The daily electricity demand of a city is influenced by a variety of factors. Besides the patterns in historical data, reasonable predictions need to take into account factors like weather, holidays, regulations, etc.

To bridge this gap, we present MoiraiAgent, an automated framework for time-series forecasting. Unlike traditional models that rely solely on numerical inputs, MoiraiAgent integrates a broad range of information sources—including historical values and rich contextual signals—to produce more reliable predictions. It continuously reasons over new information as it emerges, updating its forecasts in real time and enabling more adaptive, robust performance in dynamic environments.

At its core, MoiraiAgent leverages powerful large language models (LLMs) and sophisticated data-analytics tools to understand, integrate, and reason over heterogeneous data. This makes it particularly valuable for scenario-planning use cases, where context and change matter as much as historical trends. By moving beyond conventional forecasting boundaries, MoiraiAgent delivers predictions that are not only data-driven but also context-aware and responsive to real-world conditions.

Figure 2: MoiraiAgent analyzes the input numerical and contextual information, then makes a plan (e.g., calling the history-trimming and forecasting tools) and automatically decides which tools to use in the pre-processing, forecasting, and post-processing stages.

Features of MoiraiAgent

Superior time-series forecasting

The landscape of pretrained time-series foundation models has diversified rapidly. Models like Chronos-2 and Moirai-2 consistently top the GIFT-Eval benchmark, but they aren’t universal winners. Their strengths are domain-specific: Chronos-2 excels in Web/CloudOps and Energy, but struggles with Nature data. This “heterogeneity” stems from variations in pretraining data, making it nearly impossible for a single model to dominate every scenario.

To address this challenge, MoiraiAgent 1.0 introduces an intelligent expert-selection mechanism. Instead of betting on a single forecasting model, MoiraiAgent uses a lightweight 3B-parameter large language model (LLM) to automatically choose the best expert for each forecasting task.

The selection model takes as input a comprehensive set of features: historical numerical values, temporal characteristics (timestamps and recording frequency), predictions from candidate models, and cross-validation errors computed on a short lookback window. By reasoning over these multi-faceted signals, the selection model outputs the most suitable expert for each specific forecasting task.
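The selector itself is a trained LLM, but one of its inputs, the cross-validation error on a short lookback window, is easy to illustrate in isolation. The sketch below is a deliberately simple, rule-based stand-in and not MoiraiAgent's selector: it backtests each candidate on the last `holdout` points of the history and returns the one with the lowest mean absolute error. The `Forecaster` signature, the `holdout` length, and the naive and seasonal-naive candidates are hypothetical placeholders for real experts such as Chronos-2 or TimesFM-2.5.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical expert signature: (history, horizon) -> list of point predictions.
Forecaster = Callable[[Sequence[float], int], List[float]]


def last_value(history: Sequence[float], horizon: int) -> List[float]:
    """Naive candidate: repeat the last observed value."""
    return [history[-1]] * horizon


def seasonal_naive(period: int) -> Forecaster:
    """Seasonal-naive candidate: repeat the last full season (needs len(history) >= period)."""
    def forecast(history: Sequence[float], horizon: int) -> List[float]:
        season = list(history[-period:])
        return [season[i % period] for i in range(horizon)]
    return forecast


def cv_error(model: Forecaster, history: Sequence[float], holdout: int) -> float:
    """Mean absolute error when forecasting the last `holdout` points from the rest."""
    train, test = history[:-holdout], history[-holdout:]
    preds = model(train, holdout)
    return sum(abs(p - y) for p, y in zip(preds, test)) / holdout


def select_expert(experts: Dict[str, Forecaster], history: Sequence[float],
                  holdout: int) -> str:
    """Rule-based stand-in for the LLM selector: lowest backtest error wins."""
    errors = {name: cv_error(model, history, holdout) for name, model in experts.items()}
    return min(errors, key=lambda name: errors[name])


# Toy usage: a strongly seasonal series with period 4.
series = [1.0, 5.0, 2.0, 8.0] * 6
experts = {"naive": last_value, "seasonal_naive": seasonal_naive(4)}
chosen = select_expert(experts, series, holdout=4)
```

A pure argmin like this can be misled by a short, noisy backtest window. Feeding the same errors to an LLM together with the raw history, timestamps, frequency, and candidate predictions lets the selector weigh them against other evidence, which is presumably why MoiraiAgent learns the choice rather than hard-coding it.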

Our expert pool comprises three state-of-the-art models that rank at the top of the GIFT-Eval benchmark: Chronos-2 (Amazon), TimesFM-2.5 (Google), and TiRex (NXAI). To train the expert selector, we built a dataset of two million training examples, allowing MoiraiAgent to learn robust selection patterns across a wide variety of real-world time series. The result? MoiraiAgent doesn’t just match the best individual model; it outperforms all of them on the overall benchmark. On GIFT-Eval, MoiraiAgent achieves a MASE of 0.679, beating Chronos-2 (0.698), TimesFM-2.5 (0.705), TiRex (0.716), and Moirai-2 (0.728). These results establish MoiraiAgent as the state-of-the-art open-source time-series forecasting model, and a powerful step toward more adaptive, context-aware forecasting in practice.
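MASE (mean absolute scaled error) divides a forecast's mean absolute error over the horizon by the in-sample mean absolute error of a seasonal-naive forecast. This makes scores from series with very different scales comparable, and a value below 1 means the forecast errs less than that naive baseline does in-sample. The sketch below computes it for a single series using the standard definition; the toy data and the `season=1` default are illustrative, and GIFT-Eval's aggregation across its many datasets is not reproduced here.

```python
from typing import Sequence


def mase(history: Sequence[float], actual: Sequence[float],
         forecast: Sequence[float], season: int = 1) -> float:
    """Mean absolute scaled error for one series.

    Numerator: MAE of the forecast over the horizon.
    Denominator: in-sample MAE of the seasonal-naive rule y[t] = y[t - season].
    """
    if len(actual) != len(forecast):
        raise ValueError("actual and forecast must have the same length")
    if len(history) <= season:
        raise ValueError("history must be longer than the seasonal period")
    mae = sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)
    naive_errors = [abs(history[t] - history[t - season])
                    for t in range(season, len(history))]
    scale = sum(naive_errors) / len(naive_errors)
    if scale == 0:
        raise ValueError("seasonal-naive error is zero; MASE is undefined")
    return mae / scale


history = [10.0, 12.0, 11.0, 13.0, 12.0, 14.0]
actual = [13.0, 15.0]
forecast = [13.5, 14.0]
score = mase(history, actual, forecast)
```

On this toy data the in-sample naive error is 1.6 and the forecast MAE is 0.75, giving a MASE of about 0.47.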

Figure 3: Forecasting error on the GIFT-Eval benchmark (lower is better).

Context-aware time-series forecasting

Traditional forecasting models excel at extracting patterns from numerical time series, but real-world systems rarely operate in isolation. External context, such as policy changes, market events, or operational shifts, can fundamentally alter future trajectories in ways historical data alone cannot capture. MoiraiAgent addresses this gap by integrating natural language context with numerical time-series data, enabling forecasts that are both more accurate and more adaptive.

We identify three primary ways context affects forecasting:

  1. Selecting an appropriate lookback window: Context can signal a transition to a new regime, prompting MoiraiAgent to trim the historical window to focus on the most relevant recent patterns.

Figure 4: Context indicates a phase transition in the history, so MoiraiAgent selects an appropriate lookback window for forecasting.

  2. Refining anomaly detection: Contextual information can help MoiraiAgent identify and remove occasional, non-persistent patterns from the historical data.

Figure 5: Context reveals a non-persistent additive trend, so MoiraiAgent uses a Python sandbox to remove its impact from the history.

  3. Anticipating future effects: Context may describe scheduled future events whose impact cannot be inferred from historical patterns alone. MoiraiAgent combines baseline predictions with these contextual insights for adaptive forecasting.

Figure 6: Context implies future effects, so MoiraiAgent adjusts the raw prediction accordingly.
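Each of the three operations above boils down to a small, deterministic transform that the coordinating LLM could emit as code for its Python sandbox once it has read the context. The sketch below is hypothetical: the function names, the index-based windows, and the constant additive and multiplicative adjustments are assumptions for illustration, not MoiraiAgent's actual tools. In practice the LLM would infer the indices and magnitudes from the text, and a decaying trend would subtract a ramp rather than a constant.

```python
from typing import List, Sequence


def trim_lookback(history: Sequence[float], regime_start: int) -> List[float]:
    """Case 1: keep only the observations from the regime change onward."""
    if not 0 <= regime_start < len(history):
        raise ValueError("regime_start must index into history")
    return list(history[regime_start:])


def remove_transient_offset(history: Sequence[float], start: int, end: int,
                            offset: float) -> List[float]:
    """Case 2: subtract a known, non-persistent additive effect on history[start:end]."""
    cleaned = list(history)
    for t in range(max(start, 0), min(end, len(cleaned))):
        cleaned[t] -= offset
    return cleaned


def apply_future_effect(forecast: Sequence[float], start: int, end: int,
                        factor: float) -> List[float]:
    """Case 3: scale the baseline forecast over an announced future window."""
    adjusted = list(forecast)
    for t in range(max(start, 0), min(end, len(adjusted))):
        adjusted[t] *= factor
    return adjusted


# Toy scenario (hypothetical): a level shift at index 4, a one-off +50 spike at
# index 6, and a planned event expected to lift forecast steps 1-2 by 20%.
history = [10, 11, 10, 11, 30, 31, 80, 31]
recent = trim_lookback(history, regime_start=4)                        # [30, 31, 80, 31]
cleaned = remove_transient_offset(recent, start=2, end=3, offset=50)   # [30, 31, 30, 31]
baseline = [30.5, 30.5, 30.5]  # stand-in for an expert model's forecast
final = apply_future_effect(baseline, start=1, end=3, factor=1.2)
```

Keeping each operation a small pure function makes the agent's decisions easy to audit: the plan records which indices and magnitudes were inferred from the context, while the transforms themselves stay deterministic.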

To support these diverse scenarios, MoiraiAgent employs a flexible, tool-orchestrated pipeline in which a powerful LLM acts as the central coordinator. The LLM dynamically determines which operations to perform and which tools to invoke—ranging from advanced forecasting models to a Python code sandbox—based on the demands of each task.
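The coordinator pattern described above can be sketched as a registry of named tools plus an executor that runs whatever plan the LLM returns. In this hypothetical sketch the LLM call is replaced by a hard-coded plan (a list of tool names with arguments); the tool names, the shared `state` dictionary, and the plan format are assumptions for illustration, not MoiraiAgent's interface, and the mean forecaster merely stands in for a real expert model.

```python
from typing import Any, Callable, Dict, List, Tuple

State = Dict[str, Any]
Tool = Callable[..., State]
TOOLS: Dict[str, Tool] = {}


def tool(name: str) -> Callable[[Tool], Tool]:
    """Register a function under `name` so that a plan can refer to it."""
    def register(fn: Tool) -> Tool:
        TOOLS[name] = fn
        return fn
    return register


@tool("trim_history")
def trim_history(state: State, start: int) -> State:
    return {**state, "history": state["history"][start:]}


@tool("forecast_mean")
def forecast_mean(state: State, horizon: int) -> State:
    # Placeholder expert: predicts the mean of the (possibly trimmed) history.
    hist = state["history"]
    if not hist:
        raise ValueError("cannot forecast from an empty history")
    level = sum(hist) / len(hist)
    return {**state, "forecast": [level] * horizon}


@tool("scale_forecast")
def scale_forecast(state: State, factor: float) -> State:
    return {**state, "forecast": [v * factor for v in state["forecast"]]}


def run_plan(plan: List[Tuple[str, Dict[str, Any]]], state: State) -> State:
    """Execute each (tool_name, kwargs) step in order, threading the state through."""
    for name, kwargs in plan:
        if name not in TOOLS:
            raise KeyError(f"plan references unknown tool: {name}")
        state = TOOLS[name](state, **kwargs)
    return state


# Hard-coded stand-in for the LLM's plan after reading a context such as
# "demand shifted at step 3; a 10% increase is expected next period".
plan = [("trim_history", {"start": 3}),
        ("forecast_mean", {"horizon": 2}),
        ("scale_forecast", {"factor": 1.1})]
result = run_plan(plan, {"history": [5, 5, 5, 20, 20, 20]})
```

Validating tool names before execution means a malformed plan fails loudly instead of silently skipping a step, which matters when the plan comes from a generative model.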

Some forecasting problems, however, require deeper integration of contextual factors directly into the numerical modeling process rather than simple pre- or post-processing. For example, long-term GDP forecasting depends on understanding how demographic shifts, such as changes in birth rates, reshape economic fundamentals like labor supply and consumption capacity. Addressing such challenges will require end-to-end models that embed contextual signals within the forecasting model itself—an important and promising direction for future research.

GIFT-CTX: A Benchmark for Contextual Forecasting

Despite growing interest in contextual time-series forecasting, systematic evaluation remains limited. To address this gap, we introduce GIFT-CTX, a new benchmark designed to evaluate forecasting systems that jointly reason over numerical time series and natural language context. Each sample in GIFT-CTX is carefully constructed so that accurate forecasts cannot be produced using historical values alone or context alone—both are essential.

We started with the Context-is-Key (CIK) benchmark and found that many of its samples are either easy to predict from the context alone or require knowledge beyond the provided context. CIK samples also lack diversity in seasonality and sequence length. We therefore curated GIFT-CTX by selecting 120 well-defined samples from CIK and supplementing them with 125 manually created synthetic samples. The resulting benchmark comprises 245 samples covering diverse seasonalities, sequence lengths, and contextual scenarios. This design explicitly tests a model’s ability to integrate heterogeneous inputs and reason jointly over numerical patterns and natural language context, reflecting the complexity of real-world forecasting challenges.

Performance on Contextual Forecasting

We evaluated MoiraiAgent on GIFT-CTX against leading competitors, including both specialized time-series foundation models and frontier generalist LLMs. As shown in Figure 7, MoiraiAgent achieves the best overall performance, outperforming all baselines. With a weighted NMAE of 0.124 and a weighted NRMSE of 0.184, MoiraiAgent substantially surpasses time-series foundation models like Moirai-2 (0.226/0.377) and Chronos-2 (0.192/0.327), as well as generalist baselines like GPT-5.2 (0.247/0.376), Claude Opus 4.5 (0.171/0.262), and Gemini 3.0 Pro (0.146/0.223). All LLM-based methods were run with medium reasoning effort.

Figure 7: Weighted normalized mean absolute errors (NMAE) and weighted normalized root mean square errors (NRMSE) of different methods on the GIFT-CTX benchmark.
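The post does not spell out how NMAE and NRMSE are normalized or how samples are weighted. A common convention, assumed here, divides each sample's MAE or RMSE by the mean absolute value of its ground truth and then averages across samples with caller-supplied weights. The sketch below follows that assumption; the equal weights and toy samples are hypothetical and are not GIFT-CTX data.

```python
import math
from typing import List, Sequence, Tuple


def _scale(actual: Sequence[float]) -> float:
    """Mean absolute value of the ground truth (assumed normalizer)."""
    scale = sum(abs(a) for a in actual) / len(actual)
    if scale == 0:
        raise ValueError("cannot normalize: ground truth is all zeros")
    return scale


def nmae(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Normalized mean absolute error for one sample."""
    mae = sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)
    return mae / _scale(actual)


def nrmse(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Normalized root mean square error for one sample."""
    mse = sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual)
    return math.sqrt(mse) / _scale(actual)


def weighted_mean(scores: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted average of per-sample scores."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(s * w for s, w in zip(scores, weights)) / total


samples: List[Tuple[List[float], List[float]]] = [
    ([10.0, 10.0], [9.0, 11.0]),                      # small, symmetric errors
    ([100.0, 100.0, 100.0], [100.0, 100.0, 130.0]),   # one large miss
]
weights = [1.0, 1.0]  # hypothetical: equal weight per sample
w_nmae = weighted_mean([nmae(a, f) for a, f in samples], weights)
w_nrmse = weighted_mean([nrmse(a, f) for a, f in samples], weights)
```

Both toy samples have the same NMAE of 0.1, but the single 30-unit miss in the second one raises its NRMSE to about 0.17, which illustrates why the two metrics are reported together: NRMSE penalizes occasional large errors more heavily.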

These results highlight the limitations of existing approaches to time-series forecasting with natural language context, especially when the numerical input is a long sequence. MoiraiAgent’s stronger accuracy in these scenarios points to a promising solution to this challenging problem and shows the value of intelligent tool orchestration and multi-modal reasoning in time-series forecasting.

Conclusion

MoiraiAgent represents a shift from static numerical models to an intelligent, agentic framework. By combining a state-of-the-art expert-selection mechanism with the ability to reason over real-world context, we’ve bridged the gap between historical data and future uncertainty.

The results from the GIFT-CTX benchmark show that the most accurate forecasts aren’t just data-driven; they are context-aware. MoiraiAgent provides the tools to move beyond numerical silos, delivering predictions that are as responsive and complex as the world they describe.


We would like to thank Hanshu YAN, Hong-Quang Pham, Ibrahim Taha Aksu, Jun Hao Liew, Chenghao Liu, Doyen Sahoo, Junnan Li, Caiming Xiong, and Silvio Savarese for their insights and contributions to this article.
