Pu Cao


Ph.D. student in Artificial Intelligence

Beijing University of Posts and Telecommunications

Biography

Pu Cao is a second-year Ph.D. student at Beijing University of Posts and Telecommunications (BUPT), supervised by Prof. Qing Song and Dr. Lu Yang. His research interests lie in Computer Vision, and he is currently working on Image Synthesis.

Interests
  • Image Synthesis
  • Multimodal Large Language Models
  • Visual Representation
  • Image Detection/Segmentation
  • Computer Vision
Education
  • PhD in Artificial Intelligence, 2022

    Beijing University of Posts and Telecommunications

  • BSc in Information and Computational Science, 2018

    University of Science and Technology Beijing

Publications

Controllable Generation with Text-to-Image Diffusion Models: A Survey
arXiv 2024.
A survey on controllable generation with text-to-image diffusion models.
E4C: Enhance Editability for Text-Based Image Editing by Harnessing Efficient CLIP Guidance
arXiv 2024.
Improving editability in text-guided image editing.
Concept-centric Personalization with Large-scale Diffusion Priors
arXiv 2023.
Customizing diffusion models for concept-centric generation with high controllability, fidelity, and diversity.
What Decreases Editing Capability? Domain-Specific Hybrid Refinement for Improved GAN Inversion
WACV 2024.
Editing capability inevitably decreases in previous refinement methods (e.g., PTI, HFGI, and SAM). In this work, we explore a "divide and conquer" approach to address this problem. We combine two mainstream refinement mechanisms (i.e., weight and feature modulation) and achieve extraordinary inversion and editing results.
LSAP: Rethinking Inversion Fidelity, Perception and Editability in GAN Latent Space
arXiv 2022.
We analyse the sources of "Fidelity, Perception, and Editability" in the inversion task and point out that the key issue is the misalignment between inverted latent codes and the synthetic distribution. We then propose a simple, efficient solution that applies uniformly to both optimization-based and encoder-based methods.

Projects

UniDiffusion
A diffusion training toolbox based on diffusers and existing SOTA methods, including DreamBooth, Textual Inversion, LoRA, Custom Diffusion, XTI, …
Awesome Controllable T2I Diffusion Models
A collection of resources on controllable generation with text-to-image diffusion models.
GAN Inverter
A GAN inversion toolbox based on the PyTorch library. We design a unified pipeline for inversion methods and conduct a comprehensive benchmark.

Service

  • TPAMI
  • TMM
  • TNNLS
  • TCSVT
  • WACV (2024, 2025)
  • ECCV (2024)

Contact