Ran Xu
Director, AI Research
Ran Xu received his Ph.D. in computer science from the University at Buffalo in 2015. He currently leads a group of exceptional computer vision and multimodal AI researchers at Salesforce, pushing the boundary of research and productive AI for CRM.
Ran Xu's interests are multimodal content understanding and generation, and agentic systems.
In the world of AI agents that click, scroll, execute and automate — we’re moving fast from “just understand text” to “actually use software for you.” The new benchmark SCUBA tackles exactly that:…
The Challenge with Flows Today: Salesforce flows sit at the heart of modern CRM automation, yet authoring them still requires a unique mix of declarative drag‑and‑drop and Apex know‑how. To ease this process,…
Architecture, Training and Dataset
GitHub code: https://github.com/JiuhaiChen/BLIP3o
Models: https://huggingface.co/BLIP3o/BLIP3o-Model
Demo: https://huggingface.co/spaces/BLIP3o/blip-3o
Motivation: OpenAI’s GPT-4o has demonstrated state-of-the-art performance in image understanding, generation and editing tasks. Emerging hypotheses of its architecture suggest a hybrid…
Our team at Salesforce Research introduces Text2Data, an innovative framework specifically designed to generate high-quality, controllable data from limited textual input.
To address the challenges in generating multimodal instruction data, we developed ProVision, a scalable, programmatic framework that employs scene graphs and human-written programs to systematically synthesize vision-centric instruction data.
We are excited to open-source 🍃MINT-1T, the first trillion token multimodal interleaved dataset and a valuable resource for the community to study and build large multimodal models.
HIVE is accepted to CVPR 2024. Other authors include: Chia-Chih Chen, Ning Yu, Zeyuan Chen, Huan Wang, Silvio Savarese, Stefano Ermon, Caiming Xiong We have seen the success of ChatGPT, which incorporates human…
UniControl is accepted to NeurIPS’23. Other authors include Yingbo Zhou, Huan Wang, Juan Carlos Niebles, Caiming Xiong, Silvio Savarese, Stefano Ermon, and Yun Fu. Is it possible for a single model to master…
Background: Graphic layout designs serve as the foundation of communication between media designers and their target audience. They play a pivotal role in organizing various visual elements, including rendered text, logos, product images,…









