曹朴
Pu Cao (曹朴)
Email: caopu@bupt.edu.cn
I received my bachelor’s degree from the University of Science and Technology Beijing (USTB), Beijing, China, in 2022. Since 2022, I have been a Ph.D. candidate at the School of Intelligent Engineering and Automation, Beijing University of Posts and Telecommunications (BUPT), and I expect to graduate in summer 2027.
My research interests include multimodal understanding and generation, with a particular focus on multimodal large language models and diffusion models.
News
"Controllable Generation with Text-to-Image Diffusion Models: A Survey" was accepted by IEEE TPAMI 2025.
"Exploring Position Encoding in Diffusion U-Net for Training-free High-resolution Image Generation" was accepted by AAAI 2025 (Oral).
"Image is All You Need to Empower Large-scale Diffusion Models for In-Domain Generation" was accepted by CVPR 2025.
"E4C: Enhance Editability for Text-Based Image Editing by Harnessing Efficient CLIP Guidance" was accepted by IEEE TCSVT 2025.
Received the ECCV 2024 Outstanding Reviewer Award.
"What Decreases Editing Capability? Domain-Specific Hybrid Refinement for Improved GAN Inversion" was accepted by WACV 2024.
Publications
Controllable Generation with Text-to-Image Diffusion Models: A Survey
Exploring Position Encoding in Diffusion U-Net for Training-free High-resolution Image Generation
OMEGAS: Object Mesh Extraction from Large Scenes Guided by Gaussian Segmentation
Quality Transformer for Human Parsing
Preliminary Explorations with GPT-4o(mni) Native Image Generation
Image is All You Need to Empower Large-scale Diffusion Models for In-Domain Generation
E4C: Enhance Editability for Text-Based Image Editing by Harnessing Efficient CLIP Guidance
Frequency-Based Matcher for Long-Tailed Semantic Segmentation
What Decreases Editing Capability? Domain-Specific Hybrid Refinement for Improved GAN Inversion
Projects
Awesome Controllable T2I Diffusion Models
A curated list of resources on controllable text-to-image diffusion models, organized around novel conditions and accompanied by a survey paper.
UniDiffusion
A diffusion-model training toolbox built on diffusers, covering DreamBooth, Textual Inversion, LoRA, Custom Diffusion, XTI, and more.
GANInverter
A PyTorch-based GAN inversion toolbox with a unified pipeline and comprehensive benchmarks.
Notification Skill
Task-completion notifications for agent workflows, with support for Bark push and email.
CodeArXiv
A locally deployable ArXiv browser for filtering and discovering papers, with a card-style interface for tracking recent research.
Service
Conferences: ICLR 2026, CVPR 2025–2026, ICCV 2025, ECCV 2024 (Outstanding Reviewer Award), WACV 2024–2026
Journals: TPAMI, TIP, TCSVT, TMM, TNNLS
Contact
Send me a quick email
caopu@bupt.edu.cn