I am currently a fourth-year (final-year) Ph.D. student in the School of Computing, National University of Singapore, advised by Prof. Yang You. Before that, I obtained my bachelor's and master's degrees from Northwestern Polytechnical University, China, in 2019 and 2022, respectively. During my master's studies, I was fortunate to collaborate with Dr. Nian Liu under the supervision of Prof. Junwei Han.
My research interests include efficient deep learning, dynamic neural networks, and multi-modal models. I have published more than 10 papers at top international AI conferences and journals.
Please feel free to email me (wangbo.zhao96@gmail.com) if you are interested in collaborating on projects related to efficient deep learning or other promising research directions.
Apart from research, I am an amateur track and field athlete, specializing in the 400 meters (PB 53.40) and 400-meter hurdles (PB 1:01.78).
News
- 2025.09: Recipient of the Google PhD Fellowship 2025 in Machine Learning and ML Foundations.
- 2025.07: One paper accepted to ICCV 2025.
- 2025.06: I began my internship at Meta in Zurich.
- 2025.05: One paper accepted to ICML 2025.
- 2025.02: One paper accepted to CVPR 2025.
- 2025.01: One paper accepted to ICLR 2025.
- 2024.09: One paper accepted to NeurIPS 2024.
- 2024.07: One paper accepted to ECCV 2024.
Publications

EA-ViT: Efficient Adaptation for Elastic Vision Transformer
Chen Zhu, Wangbo Zhao†, Huiwen Zhang, Samir Khaki, Yuhao Zhou, Weidong Tang, Shuo Wang, Zhihang Yuan, Yuzhang Shang, Xiaojiang Peng, Kai Wang, Dawei Yang†
- EA-ViT is an efficient adaptation framework for Vision Transformers, enabling a single process to generate flexible models of varying sizes for diverse resource constraints, using a nested elastic architecture and a lightweight router optimized with Pareto-optimal configurations.

Unsupervised Learning for Class Distribution Mismatch
Pan Du, Wangbo Zhao†, Xinai Lu, Nian Liu, Zhikai Li, Chaoyu Gong, Suyun Zhao†, Hong Chen, Cuiping Li, Kai Wang, Yang You
- UCDM addresses Class Distribution Mismatch (CDM) by leveraging unlabeled data to train classifiers through positive-negative pairs, synthesized using a diffusion model, and a confidence-based pseudo-labeling mechanism, achieving superior performance over semi-supervised methods without relying on labeled data.

A Stitch in Time Saves Nine: Small VLM is a Precise Guidance for Accelerating Large VLMs
Wangbo Zhao, Yizeng Han, Jiasheng Tang, Zhikai Li, Yibing Song, Kai Wang, Zhangyang Wang, Yang You
- We use the attention map aggregated from a small VLM to guide visual token pruning in a large VLM. We also develop an early-exiting mechanism that makes full use of the small VLM's predictions and invokes the large VLM only when necessary, yielding a superior trade-off between accuracy and computation.

Wangbo Zhao, Yizeng Han, Jiasheng Tang, Kai Wang, Yibing Song, Gao Huang, Fan Wang, Yang You
- We propose to dynamically adjust the computation of DiT across timesteps and across spatial locations of an image. This reduces the computation of DiT-XL by 50% without sacrificing generation quality.

Dynamic Tuning Towards Parameter and Inference Efficiency for ViT Adaptation
Wangbo Zhao, Jiasheng Tang, Yizeng Han, Yibing Song, Kai Wang, Gao Huang, Fan Wang, Yang You
- We propose to adapt a static ViT into a dynamic ViT through parameter-efficient fine-tuning, avoiding full-parameter tuning.

MMBench: Is your multi-modal model an all-around player?
Yuan Liu, Haodong Duan, Yuanhan Zhang, Bo Li, Songyang Zhang, Wangbo Zhao, Yike Yuan, Jiaqi Wang, Conghui He, Ziwei Liu, Kai Chen, Dahua Lin
- We propose MMBench, a bilingual benchmark for assessing the multi-modal capabilities of VLMs.

VSCode: General Visual Salient and Camouflaged Object Detection with 2D Prompt Learning
Ziyang Luo, Nian Liu, Wangbo Zhao, Xuguang Yang, Dingwen Zhang, Deng-Ping Fan, Fahad Khan, Junwei Han
- We introduce VSCode, a generalist model with novel 2D prompt learning that jointly addresses four SOD tasks and three COD tasks.

Multi-grained temporal prototype learning for few-shot video object segmentation
Nian Liu, Kepan Nan, Wangbo Zhao, Yuanwei Liu, Xiwen Yao, Salman Khan, Hisham Cholakkal, Rao Muhammad Anwer, Junwei Han, Fahad Shahbaz Khan
- We propose to leverage multi-grained temporal guidance information to handle the temporal correlation in video data for few-shot video object segmentation.

Modeling motion with multi-modal features for text-based video segmentation
Wangbo Zhao, Kai Wang, Xiangxiang Chu, Fuzhao Xue, Xinchao Wang, Yang You
- We design a method to fuse and align appearance, motion, and linguistic features to achieve accurate text-based video segmentation.

Light field saliency detection with dual local graph learning and reciprocative guidance
Nian Liu, Wangbo Zhao, Dingwen Zhang, Junwei Han, Ling Shao
- We introduce a reciprocative guidance scheme for light field saliency detection.

Weakly supervised video salient object detection
Wangbo Zhao, Jing Zhang, Long Li, Nick Barnes, Nian Liu, Junwei Han
- We present the first weakly supervised video salient object detection model, based on relabeled, fixation-guided scribble annotations.
Education
- 2022.08 - 2026.06, Ph.D., School of Computing, National University of Singapore, Singapore.
- 2019.09 - 2022.04, Master, School of Automation, Northwestern Polytechnical University, China.
- 2017.07 - 2019.01, Undergraduate, Université de technologie de Troyes, France.
- 2015.09 - 2019.06, Undergraduate, Honors College, Northwestern Polytechnical University, China.