曹朴

Pu Cao (曹朴)

I received my bachelor’s degree from the University of Science and Technology Beijing (USTB), Beijing, China, in 2022. Since 2022, I have been a Ph.D. candidate at the School of Intelligent Engineering and Automation, Beijing University of Posts and Telecommunications (BUPT), and I expect to graduate in summer 2027.

My research interests include multimodal understanding and generation, with a focus on multimodal large language models and diffusion models.

Google Scholar · CV · GitHub · Twitter

Experiences

2022.09–2027.06

Beijing University of Posts and Telecommunications Ph.D.

School of Intelligent Engineering & Automation · Control Science and Engineering

2018.09–2022.06

University of Science and Technology Beijing B.S.

School of Mathematics and Physics · Information and Computing Science

News

2025.12

"Controllable Generation with Text-to-Image Diffusion Models: A Survey" was accepted by IEEE TPAMI 2025. Link

2025.11

"Exploring Position Encoding in Diffusion U-Net for Training-free High-resolution Image Generation" was accepted by AAAI 2025 (Oral). Link

2025.02

"Image is All You Need to Empower Large-scale Diffusion Models for In-Domain Generation" was accepted by CVPR 2025. Link

2025.02

"E4C: Enhance Editability for Text-Based Image Editing by Harnessing Efficient CLIP Guidance" was accepted by IEEE TCSVT 2025. Link

2024.11

Received the ECCV 2024 Outstanding Reviewer Award. Link

2024.01

"What Decreases Editing Capability? Domain-Specific Hybrid Refinement for Improved GAN Inversion" was accepted by WACV 2024. Link

Publications

2025.12

IEEE TPAMI 2025

Controllable Generation with Text-to-Image Diffusion Models: A Survey

Pu Cao, Feng Zhou, Qing Song, Lu Yang✉

PDF Code

2025.11

AAAI 2025 (Oral)

Exploring Position Encoding in Diffusion U-Net for Training-free High-resolution Image Generation

Feng Zhou*, Pu Cao*, Yiyang Ma, Lu Yang, Jianqin Yin✉

PDF

2025.07

IEEE TCSVT 2025

OMEGAS: Object Mesh Extraction from Large Scenes Guided by Gaussian Segmentation

Lizhi Wang*, Feng Zhou*, Bo Yu, Pu Cao, Jianqin Yin✉

PDF Code

2025.05

Pattern Recognition 2026

Quality Transformer for Human Parsing

Yao Guo, Lu Yang, Pu Cao, Shan Li, Yilin Zhou, Qing Song✉

PDF

2025.05

arXiv:2505.05501

Preliminary Explorations with GPT-4o(mni) Native Image Generation

Pu Cao†, Feng Zhou*, Junyi Ji*, Qingye Kong*, Zhixiang Lv*, Mingjian Zhang*, Xuekun Zhao*, Siqi Wu, Yinghui Lin, Qing Song, Lu Yang†✉

PDF

2025.02

CVPR 2025

Image is All You Need to Empower Large-scale Diffusion Models for In-Domain Generation

Pu Cao*, Feng Zhou*, Lu Yang✉, Tianrui Huang, Qing Song

PDF Code

2025.02

IEEE TCSVT 2025

E4C: Enhance Editability for Text-Based Image Editing by Harnessing Efficient CLIP Guidance

Tianrui Huang*, Pu Cao*, Lu Yang, Chun Liu, Mengjie Hu, Zhiwei Liu, Qing Song✉

PDF

2024.05

IEEE TMM 2024

Frequency-Based Matcher for Long-Tailed Semantic Segmentation

Shan Li, Lu Yang, Pu Cao, Liulei Li, Huadong Ma

PDF

2023.11

WACV 2024

What Decreases Editing Capability? Domain-Specific Hybrid Refinement for Improved GAN Inversion

Pu Cao, Lu Yang, Dongxv Liu, Xiaoya Yang, Tianrui Huang, Qing Song✉

PDF Code

Projects

Project

Awesome Controllable T2I Diffusion Models

A curated list of controllable text-to-image diffusion resources, emphasizing novel conditions and a linked survey.

Project

UniDiffusion

Diffusion training toolbox built on diffusers, covering DreamBooth, Textual Inversion, LoRA, Custom Diffusion, XTI, and more.

Project

GANInverter

PyTorch-based GAN inversion toolbox with a unified pipeline and comprehensive benchmarks.

Project

Notification Skill

Completion notifications for agent tasks, with Bark push and email support.

Project

CodeArXiv

A locally deployable ArXiv browser for filtering and discovering papers in a card-style interface.

Service

Reviewer Service

Conferences: ICLR 2026, CVPR 2025–2026, ICCV 2025, ECCV 2024 (Outstanding Reviewer Award), WACV 2024–2026

Journals: TPAMI, TIP, TCSVT, TMM, TNNLS

Contact

Send me a quick email

caopu@bupt.edu.cn