Experiments with Generative Models

Saumitra Kapoor · machine learning, generative art

While browsing my computer, I came across a sweet little experiment I did a few months ago, one that brought back memories of dabbling with 3D modeling and generative art. So here it is, I decided to write about it haha

WTF is VQGAN+CLIP?

VQGAN is a generative model that combines a convolutional autoencoder with vector quantization and adversarial training to produce high-resolution images. The encoder maps an image to a grid of discrete codes drawn from a learned codebook, and the decoder reconstructs an image from those codes. A patch-based discriminator takes in both real and reconstructed images and outputs a score indicating how likely each is to be real; during training, the decoder is optimized to produce images the discriminator can't differentiate from real ones, while the discriminator is optimized to correctly classify real and generated images.
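To make the vector-quantization step concrete, here's a minimal PyTorch sketch of the codebook lookup at its core. The codebook size and embedding dimension are illustrative numbers I've picked (not the paper's exact configuration), and real training details like the straight-through gradient and the codebook losses are omitted.

```python
import torch

# Illustrative codebook: 1024 entries of dimension 256 (made-up sizes).
# In a real VQGAN these embeddings are learned during training.
codebook = torch.randn(1024, 256)

def quantize(z):
    # z: (batch, positions, dim) encoder features
    flat = z.reshape(-1, z.size(-1))             # (batch*positions, dim)
    dists = torch.cdist(flat, codebook)          # distance to every codebook entry
    indices = dists.argmin(dim=-1)               # nearest entry per feature vector
    z_q = codebook[indices].reshape(z.shape)     # snap features to codebook vectors
    # (Training would pass gradients through via the straight-through estimator.)
    return z_q, indices.reshape(z.shape[:-1])

z = torch.randn(2, 16 * 16, 256)                 # fake encoder output, 16x16 grid
z_q, codes = quantize(z)
print(z_q.shape, codes.shape)                    # -> (2, 256, 256) and (2, 256)
```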

CLIP (Contrastive Language-Image Pretraining) is a model trained on a large corpus of image-text pairs to embed images and text into a shared space, so that an image and a caption that match end up close together. CLIP doesn't generate anything itself; in VQGAN+CLIP it acts as the judge. The text prompt is embedded once, the image that VQGAN currently decodes is embedded at every step, and the VQGAN latent codes are optimized by gradient descent to pull the two embeddings closer together. That feedback is what steers the generation toward images that actually represent the textual description, and it's why the combination matches prompts far better than VQGAN alone.
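Putting the two together, the VQGAN+CLIP loop optimizes the VQGAN latent so that CLIP scores the decoded image as similar to the prompt. Here's a rough sketch of that loop; the CLIP calls use the openai/CLIP package, but the decoder below is a tiny stand-in for the real pretrained VQGAN decoder, and the latent shape, learning rate, and step count are made-up values (the actual notebooks also add cutouts, augmentations, and CLIP's input normalization).

```python
import torch
import clip  # pip install git+https://github.com/openai/CLIP.git

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)
model = model.float()            # keep everything in fp32 for a simple loop
model.requires_grad_(False)      # we only optimize the latent, not CLIP

# Embed the text prompt once; this is the fixed optimization target.
tokens = clip.tokenize(["colorful watercolor flower artwork"]).to(device)
with torch.no_grad():
    text_features = model.encode_text(tokens)
    text_features = text_features / text_features.norm(dim=-1, keepdim=True)

# Stand-in decoder so the sketch runs end to end; in the real pipeline this
# would be the pretrained VQGAN decoder from taming-transformers.
decoder = torch.nn.Sequential(
    torch.nn.ConvTranspose2d(256, 64, 4, stride=2, padding=1),
    torch.nn.ReLU(),
    torch.nn.ConvTranspose2d(64, 3, 4, stride=2, padding=1),
    torch.nn.Sigmoid(),
).to(device)

# The latent we optimize; its shape here is illustrative, not VQGAN's actual one.
z = torch.randn(1, 256, 16, 16, device=device, requires_grad=True)
optimizer = torch.optim.Adam([z], lr=0.1)

for step in range(300):
    image = decoder(z)  # (1, 3, 64, 64) in this sketch
    image = torch.nn.functional.interpolate(image, size=(224, 224), mode="bilinear")
    image_features = model.encode_image(image)
    image_features = image_features / image_features.norm(dim=-1, keepdim=True)
    loss = -(image_features * text_features).sum()  # maximize cosine similarity
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```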

Here's what I did:

I ran the model on Google Colab since I only have a GTX 1650 :P

As the prompt, I gave it "colorful watercolor flower artwork", and here are the results.

Output Media and Generation Timelapse

Link to generation timelapse. The output images were originally 256×256, but I used AI upscaling to bring them up to 2048×2048.
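If you want to try a similar 256×256 → 2048×2048 upscale, Real-ESRGAN's inference script is one popular option; the command below is an illustrative invocation, not necessarily the exact tool or settings I used.

```bash
# Illustrative: upscale the 256x256 outputs 8x with Real-ESRGAN
# (assumes the Real-ESRGAN repo is cloned and its weights are downloaded).
python inference_realesrgan.py -n RealESRGAN_x4plus -i outputs/ -o upscaled/ --outscale 8
```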
