
Junnan Li
Director, AI Research Singapore

Junnan Li is a Research Director at Salesforce. He joined Salesforce in 2019 as the founding researcher of the Singapore AI research team. In 2024, he co-founded Rhymes.ai as its Chief Scientist; the company was soft-acquired by Salesforce AI Research in 2025. Junnan is an expert in multimodal AI, LLMs, and agentic research. His papers are widely cited, and his research is broadly adopted in both industry and academia. In particular, his BLIP series of papers is among the most-cited in AI, with over 15,000 citations combined.


The landscape of AI agent development has evolved rapidly, with developers needing robust frameworks to build, test, and benchmark intelligent systems. MCP-Universe emerges as a comprehensive solution, providing a modular framework designed around…

Time series forecasting plays a central role in data-driven decision making. Yet, adapting forecasting models across different domains and temporal resolutions often requires custom engineering. This increases both development and maintenance costs —…

TL;DR: CodeT5+ is a new family of open code large language models (LLMs) with improved model architectures and training techniques. CodeT5+ achieves state-of-the-art performance among open-source LLMs on many challenging code…

BLIP-2: Scalable Pre-training of Multimodal Foundation Models for the World's First Open-source Multimodal Chatbot

TL;DR: LAVIS (short for LAnguage-VISion) is an open-source deep learning library for language-vision research and applications, offering comprehensive support for a wide range of tasks, datasets, and state-of-the-art models. Featuring a unified interface…
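As a minimal sketch of the unified interface mentioned above: the snippet below loads a BLIP captioning model and its matching preprocessors through LAVIS's load_model_and_preprocess entry point and generates a caption for a local image. The specific model name and type ("blip_caption", "base_coco") and the image path are illustrative choices, not prescriptions from this post.

```python
import torch
from PIL import Image
from lavis.models import load_model_and_preprocess

# Use a GPU if one is available; CPU also works, just more slowly.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# "example.jpg" is a placeholder path; substitute any RGB image.
raw_image = Image.open("example.jpg").convert("RGB")

# Load a BLIP captioning model together with its image preprocessors
# via LAVIS's unified model-loading interface.
model, vis_processors, _ = load_model_and_preprocess(
    name="blip_caption", model_type="base_coco", is_eval=True, device=device
)

# Preprocess the image and generate a caption.
image = vis_processors["eval"](raw_image).unsqueeze(0).to(device)
print(model.generate({"image": image}))
```

The same load_model_and_preprocess call can be pointed at other task/model combinations supported by the library, which is the point of the unified interface.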

TL;DR: We propose ALPRO, a new video-and-language representation learning framework which achieves state-of-the-art performance on video-text retrieval and video question answering by learning fine-grained alignment between video regions and textual entities via entity…

TL;DR: BLIP is a new pre-training framework for unified vision-language understanding and generation, which achieves state-of-the-art results on a wide range of vision-language tasks. Background: For a review of some terms and definitions…
TL;DR: We propose a new vision-language representation learning framework which achieves state-of-the-art performance by first aligning the unimodal representations before fusing them. Vision and language are two of the most fundamental channels…
TL;DR: We propose a new semi-supervised learning method which achieves state-of-the-art performance by learning jointly-evolved class probabilities and image representations. What are the existing semi-supervised learning methods? Semi-supervised learning aims to leverage…