Large-scale GAN for text-to-image synthesis

About GigaGAN

GigaGAN is a new GAN architecture that far exceeds the previous scale limits of GANs and can produce ultra-HD images.

With 1 billion parameters, GigaGAN achieves a lower FID than Stable Diffusion v1.5, DALL·E 2, and Parti-750M. It generates 512px outputs in 0.13 s, orders of magnitude faster than diffusion and autoregressive models, and inherits the disentangled, continuous, and controllable latent space of GANs. We also train a fast upsampler that can generate 4K images from the low-res outputs of text-to-image models.

  • ✅ Authors: POSTECH + CMU + Adobe
  • ✅ GAN-based billion-parameter model trained on billions of images
  • ✅ 36× larger than StyleGAN and 6× larger than StyleGAN-XL
  • ✅ Text-conditioned GAN upsampling that far outperforms DALL·E
  • ✅ Ultra-HD images at 4K resolution in 3.66 seconds
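
To put the headline numbers in absolute terms, here is a back-of-the-envelope sketch. It only rearranges the ratios and timings quoted above; the resulting baseline sizes are implied values, not reported parameter counts. It assumes "1 billion parameters" means exactly 1e9, that the upsampler input is a 512px image, and that "4K" means 4096 × 4096 pixels; the variable names are illustrative.

```python
# Back-of-the-envelope arithmetic on the figures quoted in the post.
# Assumptions (not stated in the source): 1 billion = exactly 1e9 parameters,
# the upsampler input is 512 x 512, and a "4K" output is 4096 x 4096 pixels.

GIGAGAN_PARAMS = 1_000_000_000

# Baseline sizes implied by the stated ratios (not official parameter counts).
implied_stylegan = GIGAGAN_PARAMS / 36      # "36x larger than StyleGAN"
implied_stylegan_xl = GIGAGAN_PARAMS / 6    # "6x larger than StyleGAN-XL"

# Pixel counts and upsampling factors for 512px -> 4K.
low_res_pixels = 512 * 512
four_k_pixels = 4096 * 4096
linear_factor = 4096 // 512                      # per-side scale factor
pixel_factor = four_k_pixels // low_res_pixels   # total pixel-count factor

# Throughput implied by "4K resolution in 3.66 seconds".
megapixels_per_second = four_k_pixels / 1e6 / 3.66

print(f"implied StyleGAN size:    {implied_stylegan / 1e6:.1f}M")
print(f"implied StyleGAN-XL size: {implied_stylegan_xl / 1e6:.1f}M")
print(f"upsampling: {linear_factor}x per side, {pixel_factor}x pixels")
print(f"4K throughput: {megapixels_per_second:.1f} MP/s")
```

Under these assumptions the ratios correspond to roughly 27.8M and 166.7M parameters for the two baselines, and going from 512px to 4K is an 8× per-side (64× pixel-count) upsampling at about 4.6 megapixels per second.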
