Unit 11: Generation

We’ve seen models that classify images and text, and more recently models that generate a single next token. What if we want to generate whole articles? Or images? Music? Programs? We can adapt the same basic approaches we’ve already used, with some interesting twists… and, I must admit, the results are fun.
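If you want a concrete picture of the “whole articles” case: generation is just next-token prediction in a loop. Here’s a minimal sketch, assuming `model` maps a batch of token ids to next-token logits of shape (batch, length, vocab_size); that interface is my own stand-in, not any particular library’s API.

```python
import torch

def generate(model, prompt_ids, max_new_tokens=50, eos_id=None):
    """Repeatedly predict the next token and append it to the sequence."""
    ids = list(prompt_ids)
    for _ in range(max_new_tokens):
        # Assumed interface: model returns logits of shape (batch, length, vocab_size);
        # we take the scores for the next token after everything generated so far.
        logits = model(torch.tensor([ids]))[0, -1]
        probs = torch.softmax(logits, dim=-1)
        next_id = torch.multinomial(probs, 1).item()   # sample one token
        ids.append(next_id)
        if eos_id is not None and next_id == eos_id:
            break
    return ids
```

The same loop, with a different “token” (image patch, audio frame, program token), is the backbone of most of the generators we’ll look at this week.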

By the end of this week you should be able to:

Preparation

We’ll also discuss GANs and Diffusion Models. I found the Foreword to this book on Deep Generative Modeling (available through the Calvin library) to be reasonably accessible, but you may prefer the author’s blog posts (GitHub).

Now, how do you control what gets generated? Choose your favorite modality and skim a very recent paper:
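(Even before the paper-level techniques like conditioning or guidance, the simplest control knobs are the sampling hyperparameters. Here’s a minimal sketch of temperature and top-k sampling; the function is my own illustration, not taken from any of the readings.)

```python
import torch

def sample_next(logits, temperature=1.0, top_k=None):
    """Pick the next token id from raw logits.

    temperature < 1 sharpens the distribution (more predictable output);
    temperature > 1 flattens it (more surprising output).
    top_k keeps only the k most likely tokens before sampling.
    """
    logits = logits / temperature
    if top_k is not None:
        kth_best = torch.topk(logits, top_k).values[-1]   # k-th largest logit
        logits = torch.where(logits < kth_best,
                             torch.full_like(logits, float("-inf")),
                             logits)
    probs = torch.softmax(logits, dim=-1)
    return torch.multinomial(probs, 1).item()
```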

Supplemental Material

Class Meetings

Monday

We worked through the decoding activity (PDF) using the Translation as Language Modeling notebook (preview, Colab).
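For reference after the activity, here’s a minimal beam search sketch in the same spirit. `next_token_logprobs(prefix_ids)` is a hypothetical stand-in for whatever scoring function the notebook provides; it should return log-probabilities over the vocabulary for the next token.

```python
import torch

def beam_search(next_token_logprobs, prompt_ids, beam_width=3,
                max_new_tokens=20, eos_id=0):
    """Keep the beam_width highest-scoring partial sequences at each step."""
    beams = [(0.0, list(prompt_ids))]            # (total log-prob, token ids)
    for _ in range(max_new_tokens):
        candidates = []
        for score, ids in beams:
            if ids[-1] == eos_id:                # finished beams carry over unchanged
                candidates.append((score, ids))
                continue
            logprobs = next_token_logprobs(ids)  # shape: (vocab_size,)
            top = torch.topk(logprobs, beam_width)
            for lp, tok in zip(top.values.tolist(), top.indices.tolist()):
                candidates.append((score + lp, ids + [tok]))
        beams = sorted(candidates, key=lambda c: c[0], reverse=True)[:beam_width]
    return beams[0][1]
```

Note that this version does no length normalization, so it tends to prefer shorter outputs; that tradeoff is one of the things the activity asks you to notice.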

Wednesday

Friday: Guest Lecture

Contents

Due this Week