
Junnan Li
Director, AI Research Singapore

Junnan Li is a Research Director at Salesforce. He joined Salesforce in 2019 as the founding researcher of the Singapore AI research team. In 2024, he co-founded Rhymes.ai as its Chief Scientist; the company was soft-acquired by Salesforce AI Research in 2025. Junnan is an expert in multimodal AI, LLMs, and agentic research. His papers are widely cited, and his research is broadly adopted in both industry and academia. In particular, his BLIP series of papers is among the most-cited in AI, with over 15,000 citations combined.


The landscape of AI agent development has evolved rapidly, with developers needing robust frameworks to build, test, and benchmark intelligent systems. MCP-Universe emerges as a comprehensive solution, providing a modular framework designed around…

Time series forecasting plays a central role in data-driven decision making. Yet, adapting forecasting models across different domains and temporal resolutions often requires custom engineering. This increases both development and maintenance costs —…

TL;DR: CodeT5+ is a new family of open code large language models (LLMs) with improved model architectures and training techniques. CodeT5+ achieves state-of-the-art performance among open-source LLMs on many challenging code…

BLIP-2: Scalable Pre-training of Multimodal Foundation Models for the World's First Open-source Multimodal Chatbot

TL;DR: LAVIS (short for LAnguage-VISion) is an open-source deep learning library for language-vision research and applications, offering comprehensive support for a wide range of tasks, datasets, and state-of-the-art models. Featuring a unified interface…
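As a minimal sketch of the unified interface mentioned above: the snippet below loads a BLIP captioning model and its matching preprocessors through LAVIS's load_model_and_preprocess entry point and generates a caption for a local image. The specific model name and type ("blip_caption", "base_coco") and the image path are illustrative choices, not prescriptions from this post.

```python
import torch
from PIL import Image
from lavis.models import load_model_and_preprocess

# Use a GPU if one is available; CPU also works, just more slowly.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# "example.jpg" is a placeholder path; substitute any RGB image.
raw_image = Image.open("example.jpg").convert("RGB")

# Load a BLIP captioning model together with its image preprocessors
# via LAVIS's unified model-loading interface.
model, vis_processors, _ = load_model_and_preprocess(
    name="blip_caption", model_type="base_coco", is_eval=True, device=device
)

# Preprocess the image and generate a caption.
image = vis_processors["eval"](raw_image).unsqueeze(0).to(device)
print(model.generate({"image": image}))
```

The same load_model_and_preprocess call can be pointed at other task/model combinations supported by the library, which is the point of the unified interface.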

TL;DR: We propose ALPRO, a new video-and-language representation learning framework which achieves state-of-the-art performance on video-text retrieval and video question answering by learning fine-grained alignment between video regions and textual entities via entity…

TL;DR: BLIP is a new pre-training framework for unified vision-language understanding and generation, which achieves state-of-the-art results on a wide range of vision-language tasks. Background: For a review of some terms and definitions…
TL;DR: We propose a new vision-language representation learning framework which achieves state-of-the-art performance by first aligning the unimodal representations before fusing them. Vision and language are two of the most fundamental channels…
TL;DR: We propose a new semi-supervised learning method which achieves state-of-the-art performance by learning jointly-evolved class probabilities and image representations. What are the existing semi-supervised learning methods? Semi-supervised learning aims to leverage…