How does the LM implement the following tasks? (Do they work? When do they break?)
- “spell the word ___”
- “capitalize ___” (a, b, c, any word, phrase?)
- (same, but give examples instead of commands; see the probing sketch after this list.)
- write a thorough explainer about a topic beyond the scope of the course and teach it to the class
- Examples: Jay Alammar’s blog, many articles on distill.pub, 3Blue1Brown videos, etc.
- create an interactive application: using a model to do something interesting (e.g., GANPaint), or allowing interesting exploration / interaction with the model itself (e.g., LSTMVis or Seq2Seq-Vis). Links to lots of examples here.
- try out a different deep learning toolkit (e.g., TensorFlow, TensorFlow.js, or Flux.jl) on several tasks from class
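A minimal probing sketch for the first few items, using the Hugging Face transformers text-generation pipeline. The model (gpt2) and the prompts are placeholders; the point is just to contrast a bare command with a few-shot prompt for the spelling task:

```python
# Probe a small causal LM with a zero-shot command vs. a few-shot prompt.
# Model and prompts are placeholders; swap in whatever LM the project uses.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

zero_shot = "Spell the word 'cat':"
few_shot = (
    "dog -> d o g\n"
    "fish -> f i s h\n"
    "cat ->"
)

for name, prompt in [("zero-shot", zero_shot), ("few-shot", few_shot)]:
    out = generator(prompt, max_new_tokens=10, do_sample=False)[0]["generated_text"]
    print(f"--- {name} ---")
    print(out[len(prompt):])
```

The same pattern works for the capitalization variants: keep the target fixed and vary whether the prompt is a command or a handful of input/output examples, then note where the completions break down.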
Face Generation GAN
Teachers have a hard time getting to know students by face, especially when students are wearing masks. Flashcard apps help, but the teacher can easily “overfit” to quirks of the student photo (background, clothing, etc.).
- Input: students’ profile photos
- Output: a dozen different images for each student, with variation in background, lighting, clothing, etc., so that these incidental factors are no longer informative (a latent-editing sketch follows the resource list below)
Potential resources:
- Third Time’s the Charm? Image and Video Editing with StyleGAN3 | Papers With Code
- Near Perfect GAN Inversion. From the abstract: “To edit a real photo using Generative Adversarial Networks (GANs), we need a GAN inversion algorithm to identify the latent vector that perfectly reproduces it”
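A minimal sketch of the latent-editing step, assuming a StyleGAN-style W+ latent code recovered by one of the inversion tools above. The layer split, noise scale, and latent shape are guesses, and the generator call is left as a comment because it depends on the toolkit you choose:

```python
# Sketch of the latent-editing step only. The generator G and the inversion
# step come from whichever toolkit you adopt (see the repos above); here the
# latent is a random stand-in so the snippet runs on its own.
import torch

NUM_LAYERS, DIM = 16, 512                 # typical StyleGAN W+ shape (assumption)
w = torch.randn(NUM_LAYERS, DIM)          # stand-in for the inverted latent of one photo

def make_variants(w, n_variants=12, noise_scale=0.3, fine_layers_start=8):
    """Return edited latents that keep identity/pose but vary appearance."""
    variants = []
    for _ in range(n_variants):
        w_edit = w.clone()
        # Perturb only the later ("fine") layers, which in StyleGAN mostly
        # control lighting, color, and background; earlier layers carry
        # identity and pose, so they stay fixed.
        w_edit[fine_layers_start:] += noise_scale * torch.randn_like(w_edit[fine_layers_start:])
        variants.append(w_edit)
        # image = G.synthesis(w_edit.unsqueeze(0))  # G = your pretrained generator
    return variants

print(len(make_variants(w)))               # 12 edited latents, one per flashcard image
```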
Code Analysis for Intro Programming Classes
AI models of (programming) code have improved markedly in recent years (see, e.g., Unified Pre-training for Program Understanding and Generation), but intro programming classes haven’t yet benefited from them. Could you figure out a way to use program-understanding methods to give good feedback to CS learners and their instructors (e.g., help the instructor see patterns in students’ code)? One possible starting point is sketched after the resource list below.
Some code and pre-trained models you might play with:
- CodeParrot
- Facebook’s TransCoder
- Microsoft’s CodeXGLUE
- the PLBART code
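One possible starting point (a sketch, not a full system): embed each submission with a pretrained code model and cluster the embeddings, so the instructor can review one representative per cluster instead of every file. The model choice (CodeBERT), the toy submissions, and the number of clusters are all placeholders:

```python
# Embed student submissions with a pretrained code model and cluster them.
import torch
from transformers import AutoModel, AutoTokenizer
from sklearn.cluster import KMeans

tokenizer = AutoTokenizer.from_pretrained("microsoft/codebert-base")
model = AutoModel.from_pretrained("microsoft/codebert-base")

submissions = [
    "def mean(xs): return sum(xs) / len(xs)",
    "def mean(xs):\n    total = 0\n    for x in xs:\n        total += x\n    return total / len(xs)",
    "def mean(xs): return sorted(xs)[len(xs) // 2]",   # a common mistake: median, not mean
]

with torch.no_grad():
    embeddings = []
    for code in submissions:
        inputs = tokenizer(code, return_tensors="pt", truncation=True, max_length=512)
        hidden = model(**inputs).last_hidden_state        # [1, seq_len, 768]
        embeddings.append(hidden.mean(dim=1).squeeze(0))  # mean-pool over tokens
    X = torch.stack(embeddings).numpy()

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
for code, label in zip(submissions, labels):
    print(label, code.splitlines()[0])
```

Whether the clusters correspond to pedagogically meaningful differences (correct vs. common-bug, iterative vs. functional style) is exactly the open question the project would investigate.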
Learned Multimedia Decoder
Many existing images, videos, and audio recordings are locked in poor-quality, low-efficiency codecs (old personal pictures, audio Bible recordings, video, music, graphics, etc.). If we could invert the poor-quality encoder, we could both recover a more faithful representation of the original and re-encode the result in a high-efficiency codec.
- Input: a JPEG (or other legacy codec) bitstream, unpacked (e.g., the JPEG data could be arranged spatially, so the data for each macroblock would align with where it is in the image).
- Output: the correct image (or audio, video, etc.)
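A minimal sketch of the spatial-unpacking idea for JPEG: the 64 DCT coefficients of each 8×8 block become 64 input channels at that block's grid position, and a small conv net maps them back to pixels. The architecture and shapes are illustrative only, not a tuned model:

```python
# The 64 DCT coefficients of each 8x8 block become 64 input channels at that
# block's grid position; a small conv net then predicts the RGB image.
import torch
import torch.nn as nn

class JPEGCoefficientDecoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(64, 128, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(128, 128, kernel_size=3, padding=1), nn.ReLU(),
        )
        # Each block position expands back to an 8x8 patch of RGB pixels.
        self.to_pixels = nn.ConvTranspose2d(128, 3, kernel_size=8, stride=8)

    def forward(self, coeffs):                      # coeffs: [B, 64, H/8, W/8]
        return self.to_pixels(self.body(coeffs))    # -> [B, 3, H, W]

# Example: a 128x128 image has a 16x16 grid of 8x8 blocks.
coeffs = torch.randn(1, 64, 16, 16)    # stand-in for dequantized luma DCT coefficients
print(JPEGCoefficientDecoder()(coeffs).shape)       # torch.Size([1, 3, 128, 128])
```

Training data is cheap to make: take clean images, encode them with the legacy codec, and supervise the network to recover the originals.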
Deepfake Detection
Make some deepfakes. Try to detect them.
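A minimal detector sketch: fine-tune a pretrained image backbone as a two-way real/fake classifier on frames you generate and label yourself. The backbone, hyperparameters, and the missing data pipeline are all placeholders:

```python
# Minimal real-vs-fake frame classifier: pretrained ResNet-18 with a 2-way head.
import torch
import torch.nn as nn
from torchvision import models

model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
model.fc = nn.Linear(model.fc.in_features, 2)   # 0 = real, 1 = fake

optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()

def train_step(frames, labels):
    """frames: [B, 3, 224, 224] normalized images; labels: [B] ints in {0, 1}."""
    model.train()
    optimizer.zero_grad()
    loss = criterion(model(frames), labels)
    loss.backward()
    optimizer.step()
    return loss.item()

# Smoke test with random tensors standing in for real data:
print(train_step(torch.randn(4, 3, 224, 224), torch.randint(0, 2, (4,))))
```

The interesting part of the project is the evaluation: does a detector trained on your own fakes generalize to fakes made with a different generator?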
Miscellaneous ideas
- Language
- sequence-to-sequence-to-sequence (the latent code is a sequence). Ask me for details.
- General
- Dynamic range compression on gradient updates by changing sensitivity based on the current and recent values. Perhaps as simple as computing the weight as a nonlinear function of the stored value and perhaps a running average.
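One possible reading of the last idea, sketched below; this is an interpretation, not a specified algorithm. Keep a per-parameter running average of gradient magnitude and squash each update through a saturating nonlinearity, so sensitivity adapts to current and recent values:

```python
# Interpretation only: track a running average of gradient magnitude per
# parameter and squash each update through tanh, so the effective step
# saturates when gradients spike but stays near-linear for small gradients.
import torch

class CompressedSGD(torch.optim.Optimizer):
    def __init__(self, params, lr=0.01, beta=0.9, eps=1e-8):
        super().__init__(params, dict(lr=lr, beta=beta, eps=eps))

    @torch.no_grad()
    def step(self):
        for group in self.param_groups:
            lr, beta, eps = group["lr"], group["beta"], group["eps"]
            for p in group["params"]:
                if p.grad is None:
                    continue
                state = self.state[p]
                if "avg_mag" not in state:
                    state["avg_mag"] = torch.zeros_like(p)
                avg = state["avg_mag"]
                avg.mul_(beta).add_(p.grad.abs(), alpha=1 - beta)   # running magnitude
                # Sensitivity is set by the recent magnitude (the "stored value").
                p.add_(-lr * avg * torch.tanh(p.grad / (avg + eps)))

# quick smoke test on a scalar parameter
w = torch.nn.Parameter(torch.tensor([5.0]))
opt = CompressedSGD([w], lr=0.1)
(w ** 2).sum().backward()
opt.step()
print(w.item())
```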