Choosing a Project

Pick a project where you can:

Structured Projects

These are well-defined starting points with external scaffolding, so you can spend your energy going deep rather than scoping.

Best Model Under Constraints

Can you train the best language model possible on our lab machines (16GB GPU)? You’d adapt the ideas from the NanoGPT speedrun and slowrun projects — which optimize for speed or data efficiency on large clusters — to a resource-constrained setting.

Some variants:

Good for demonstrating: transformer architecture understanding, training mechanics, experiment design, evaluation of generative models.

Deepening ideas: systematic ablations of architectural choices; analyze failure modes on a curated eval set; compare to a baseline API call.

Kaggle Competition

Compete in an active Kaggle competition. Choose one that uses concepts from this class (NLP, vision, sequences).

Getting a baseline working is the easy part. Here’s how to go deeper:

Open-Ended Projects

Replicate or Extend a Paper

Pick a paper with a specific quantitative result and try to get the same number — then extend it.

Recent papers that are tractable and interesting:

Ask if you want suggestions tailored to your interests.

Tips for replication projects

Before you start, verify:

  1. Is there a specific quantitative result I can use as my anchor?
  2. Can I access the same data, on hardware I have?
  3. Do I understand the basic approach well enough to implement a simple version?

The Benjamin Franklin method: Read the original, take notes in human language. Close it, try to reimplement from your notes. Fail. Open it again. Repeat.

Build Something with LLMs

Build an application, evaluate it, and analyze its failure modes.

Some ideas:

For any “build something” project: don’t just show it works — measure where it fails.

Extend Something from Class

Take any notebook from class and go deeper. Systematic extension with clear analysis is a completely valid project. Some weeks already have “Extension” suggestions.

Deepening Any Project: Lenses to Apply

These aren’t project types — they’re ways to make any project stronger.

Interpretability lens

Don’t just measure what the model outputs — probe why it does what it does.

Behavioral probing is a lightweight version: instead of looking inside the model, design inputs that reveal what the model can and can’t do. For example: can it spell a word? Say each word twice? Alliterate? These connect directly to the tokenization topic — what does the model even “see” at the character level? Related paper: Knowledge of Pretrained LMs on Surface Information of Tokens.

Evaluation lens

For any number you report, also ask: does this number actually answer the question we care about?