Abstract
We show that diffusion models can achieve image sample quality superior to
the current state-of-the-art generative models. We achieve this on
unconditional image synthesis by finding a better architecture through a series
of ablations. For conditional image synthesis, we further improve sample
quality with classifier guidance: a simple, compute-efficient method for
trading off diversity for fidelity using gradients from a classifier. We
achieve an FID of 2.97 on ImageNet 128$\times$128, 4.59 on ImageNet
256$\times$256, and 7.72 on ImageNet 512$\times$512, and we match BigGAN-deep
even with as few as 25 forward passes per sample, all while maintaining better
coverage of the distribution. Finally, we find that classifier guidance
combines well with upsampling diffusion models, further improving FID to 3.94
on ImageNet 256$\times$256 and 3.85 on ImageNet 512$\times$512. We release our
code at https://github.com/openai/guided-diffusion.
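
For intuition, the following is a minimal PyTorch sketch of the classifier-guided sampling step summarized above: the diffusion model's predicted mean is shifted by the scaled gradient of a noise-aware classifier's log-probability for the target class, $\mu + s \, \Sigma \, \nabla_x \log p(y \mid x_t)$. The function name, the `classifier(x_t, t)` signature, and the `guidance_scale` parameter are illustrative assumptions, not the exact API of the released code.

```python
import torch

def classifier_guided_mean(classifier, x, t, y, mean, variance, guidance_scale=1.0):
    """Shift the diffusion model's predicted mean using classifier gradients.

    Sketch of the guided sampling step: mean + s * Sigma * grad_x log p(y | x_t).
    Assumes `classifier` is a noise-aware classifier mapping (x_t, t) to logits;
    names and signature here are hypothetical, not the released API.
    """
    with torch.enable_grad():
        x_in = x.detach().requires_grad_(True)
        logits = classifier(x_in, t)
        log_probs = torch.log_softmax(logits, dim=-1)
        # Log-probability of the target class for each sample in the batch.
        selected = log_probs[range(len(y)), y].sum()
        grad = torch.autograd.grad(selected, x_in)[0]
    # Shift the mean toward higher classifier confidence; a larger
    # guidance_scale trades sample diversity for fidelity.
    return mean + guidance_scale * variance * grad
```

The scale factor is the knob behind the diversity-fidelity trade-off: at scale 0 sampling reduces to the unconditional model, while larger scales concentrate samples on images the classifier rates as confidently belonging to class $y$.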