376 Unit 1: Generative Modeling Introduction

Welcome to CS 376

Objectives

  • Understand how modern generative models work (for chatbots, image generation, etc.)
  • Learn to use them skillfully and wisely, as users and developers

We’ll view these systems through the same lens we started developing in CS 375: as tuneable machines that learn to play optimization games. But now we’re scaling up and out.

Your Objectives

Discuss with your tables:

What do you hope that a technical understanding of generative AI will let you do? What difference do you want to make in the world as a result of taking this course?

Key Questions

Tuneable Machines:

  • How can we represent text, images, and other data as sequences?
  • How can we process and generate sequences using neural nets?
  • How can models capture and use nuanced long-range relationships?

ML Systems:

  • How do we evaluate language models?
  • Can I run an LLM on my laptop? Can I train one?
  • How do I get good-quality results from an LLM?
  • How can I use an LLM to make a (semi-)autonomous agent?

Key Questions (continued)

Learning Machines

  • How can we learn without labeled data? (self-supervised learning)
  • How do foundation models learn generalizable patterns from massive datasets?
  • How can generative agents learn to improve their behavior from feedback?
  • Some current models can learn at test time (e.g., in-context learning); how does this work?

Context and Implications

  • What are the limits of AI systems? Is superhuman AI imminent?
  • What might happen socially when AI systems are deployed broadly? (effects on work, education, creativity, …)
  • How might we design AI systems to align with human values? to honor each other and our neighbors? What are the risks if we don’t?
  • How do privacy and copyright relate to AI? Is generative AI all theft?
  • What is creativity? Agency? Truth?

Ways the logistics will be different from CS 375

  • We’ll have a final project
  • Meeting objectives will be more incremental and structured, with deadlines for progress

Projects

  • Project showcase instead of final exam
  • Can be in teams (if each member has a clear role and contribution)
  • Should demonstrate:
    • understanding of how something in ML works
    • implementation and experimentation skills
    • communication skills

Some ideas are up on the course website.

This Week’s Readings

On Perusall (graded by participation: watch/read it all, write a few good comments)

  • A nice intro video from 3blue1brown (you may have watched this already)
  • An introduction to Transformers for NLP
  • A Communications of the ACM article with some historical context

Wednesday

Prayer for Calvin and AI

On this Calvin Day of Prayer, gather with one or two others and spend a few minutes asking God for…

pray for

  • wisdom
  • boundaries
  • perseverance
  • community
  • discernment
  • humility
  • gratitude

for those who

  • study
  • teach and give feedback
  • advise students
  • work in IT
  • think through how Calvin institutionally responds

Scripture

The one who has knowledge uses words with restraint,
    and whoever has understanding is even-tempered.
Even fools are thought wise if they keep silent,
    and discerning if they hold their tongues.

Proverbs 17:27-28

Logistics

  • Readings: How is Perusall going?
  • Moodle participation activity
  • Preview Discussion 1

Language Modeling

Tell me a joke.

Q: Why don’t scientists trust atoms?
A: They ___

What is the first word that goes in the blank?

Another example

neighs and rhymes with course

Today’s Activity

A language model produces text one token at a time, predicting a probability distribution over what comes next.

Today you’ll see exactly what that looks like — and what it tells us about how these models work.

Open the LM Internals tool and grab the handout.

Debrief: What Did You See?

Naming What You Found

In the activity, you saw:

  • The model predicts a distribution over the next token at every position
  • Some positions were tightly constrained (high confidence)
  • Others were wide open (many plausible options)
  • Changing the context changed the distribution

Let’s now put formal names on these observations.

Causal Language Modeling

Write the joint probability as a product of conditional probabilities:

P(tell, me, a, joke) = P(tell) * P(me | tell) * P(a | tell, me) * P(joke | tell, me, a)

A causal language model gives P(word | prior words)

  • Don’t get to look at “future” words or backtrack
  • Analogy: someone constantly trying to finish your sentence
  • Intuitively: a classifier that predicts the next word in a sentence
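As a sanity check on the chain-rule factorization above, we can multiply out toy numbers (all of the probabilities below are made up for illustration):

```python
# Chain rule: P(tell, me, a, joke)
#   = P(tell) * P(me | tell) * P(a | tell, me) * P(joke | tell, me, a)
# All probabilities here are invented illustrative values.
p_tell = 0.01
p_me_given_tell = 0.30
p_a_given_tell_me = 0.40
p_joke_given_tell_me_a = 0.25

joint = p_tell * p_me_given_tell * p_a_given_tell_me * p_joke_given_tell_me_a
print(joint)  # ≈ 0.0003
```

Note how quickly the joint probability shrinks: every additional word multiplies in another factor less than 1, which is why even "likely" sentences have tiny absolute probabilities.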

Causal Language Modeling as Classification

Task: given what came so far, predict the next thing

  • Next character: # possibilities (classes) = ______
  • Next word: # possibilities = _____
  • What else could we use as “next thing”?

How that classifier works

P(word | context) = softmax(wordLogits(word, context))

  • softmax normalizes the logits across the whole vocabulary into a probability distribution
  • wordLogits(word, context) = dot(vec(word), vec(context)), computed for every word in the vocabulary
  • vec(word): looked up in a (learnable) table: the embedding
  • vec(context): computed by a neural network

Do you recognize this structure?
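A minimal numpy sketch of that structure. The embedding table and context vector here are random stand-ins for learned values, and the four-word vocabulary is hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = ["the", "cat", "sat", "mat"]      # tiny hypothetical vocabulary
emb = rng.normal(size=(len(vocab), 8))    # vec(word): a (learnable) embedding table
ctx = rng.normal(size=8)                  # vec(context): would come from a neural net

logits = emb @ ctx                        # wordLogits: one dot product per vocab word
probs = np.exp(logits - logits.max())     # softmax (shifted for numerical stability)
probs /= probs.sum()

# probs is now a probability distribution over the vocabulary
print(dict(zip(vocab, probs.round(3))))
```

This is the same dot-product-then-softmax structure you saw in linear classifiers: the embedding table plays the role of the weight matrix.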

Applying the concept: “Retry”

  • What happens when you press “Retry” on ChatGPT?
  • It takes another sample from the conditional distribution of response given prompt

Talk to your neighbors:

  • Will you ever get the exact same response twice?
  • Will the different samples have different likelihoods?
  • What would the likelihood be like if you always picked the most likely token? (greedy generation)
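A toy illustration of those questions, using a made-up next-token distribution (the tokens and probabilities are invented, not from a real model):

```python
import random

random.seed(0)
# Made-up distribution over the next token in "They ___"
probs = {"make": 0.55, "can't": 0.25, "are": 0.15, "lie": 0.05}

tokens = list(probs)
weights = [probs[t] for t in tokens]

# "Retry" = draw a fresh sample each time, so different runs can differ
samples = [random.choices(tokens, weights=weights)[0] for _ in range(5)]

# Greedy generation always picks the single most likely token,
# maximizing the one-step likelihood at every position
greedy = max(probs, key=probs.get)
print(samples, greedy)
```

Different samples have different likelihoods (the product of the probabilities of their tokens); greedy generation maximizes each step's factor, though locally greedy choices need not maximize the likelihood of the whole sequence.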

Friday

Scripture

What shall I return to the Lord
    for all his goodness to me?
I will lift up the cup of salvation
    and call on the name of the Lord.

Psalm 116:12-13

See also James 1:16-18

Logistics

  • Discussion 1
  • Readings: reminder about Perusall

Projects

  • Projects Choice Page is up

Three Approaches to Generative Modeling

How can we generate complex data (text, images, audio)?

Approach        | Core idea                              | Example
Autoregressive  | One piece at a time, left to right     | ChatGPT, Claude
Latent variable | Sample a code, decode it               | StyleGAN interpolation
Diffusion       | Start from noise, iteratively denoise  | Diffusion Explainer

We’ll focus on autoregressive models — they power most modern LLMs.

See the notes page for details on all three.
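The autoregressive recipe, written as a loop over a toy bigram "model" (the table of conditional distributions below is hand-made for illustration, not learned):

```python
import random

random.seed(1)
# Hand-made next-token distributions: P(next | current)
model = {
    "<s>":  {"the": 0.7, "a": 0.3},
    "the":  {"cat": 0.5, "dog": 0.5},
    "a":    {"cat": 0.5, "dog": 0.5},
    "cat":  {"sat.": 1.0},
    "dog":  {"sat.": 1.0},
    "sat.": {"</s>": 1.0},
}

token, output = "<s>", []
while token != "</s>":
    dist = model[token]                  # distribution over the next token
    token = random.choices(list(dist), weights=list(dist.values()))[0]
    if token != "</s>":
        output.append(token)

print(" ".join(output))
```

A real LLM replaces the lookup table with a neural network and conditions on the entire prefix, but the generate-sample-append loop is the same.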

Text to Numbers (and back)

  • Neural nets work with numbers. How do we convert text to numbers that we can feed into our models?

  • Neural nets give us numbers as output. How do we go back from numbers into text?

Tokenization

Two parts:

  • splitting strings into tokens
    • sometimes just called tokenization
    • may or may not be reversible (e.g., if the splitter strips special characters)
  • converting tokens into numbers
    • vocabulary: a list (gives each token a number)
    • size and contents of vocabulary don’t change

An example: https://platform.openai.com/tokenizer
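Both parts can be sketched in a few lines with a naive whitespace tokenizer (real tokenizers like GPT-2's use learned subword pieces instead of whole words):

```python
text = "the cat sat on the mat"

# Part 1: split the string into tokens (here, naive whitespace splitting)
tokens = text.split()

# Part 2: a fixed vocabulary assigns each token a number
vocab = sorted(set(tokens))
token_to_id = {tok: i for i, tok in enumerate(vocab)}

ids = [token_to_id[tok] for tok in tokens]
decoded = " ".join(vocab[i] for i in ids)

print(ids)
print(decoded == text)  # the round trip works for this simple text
```

Whitespace splitting is reversible here, but it would already fail on punctuation or unseen words, which is one motivation for subword tokenizers.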

Tokenization Examples

from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("distilgpt2", add_prefix_space=True)
tokens = tokenizer.tokenize("Hello, world!")
tokens
['ĠHello', ',', 'Ġworld', '!']

(The “Ġ” is an internal detail to GPT-2; ignore it for now.)

token_ids = tokenizer.convert_tokens_to_ids(tokens)
token_ids
[18435, 11, 995, 0]
tokenizer.decode(token_ids)
' Hello, world!'

LLM APIs: Using sequence models

response = client.chat.completions.create(
    model="gpt-5.2",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Classify this review as positive or negative: <review>..."}
    ]
)
result = response.choices[0].message.content
  • Key abstraction: the conversation — a structured “document” with system instructions, user messages, assistant responses, “tool” calls/responses, reasoning traces

  • API: you ask for the next message given the conversation so far (no training)

  • Stateless: Each conversation is independent — the model itself doesn’t remember past conversations (but system can prepend them to future conversations)

  • Agent extension: the model can output requests to run code (e.g., search, calculate, edit file); the system runs the code and includes the output in the conversation

  • When appropriate: text tasks, prototyping, when training data is scarce

  • When not: latency-critical, cost-sensitive, tasks requiring precise numeric output

Ways to Run an LLM

Approach                                   | Cost                 | Models
Commercial API (OpenAI, Google, Anthropic) | Pay per token        | Largest, most capable
Free tier (Google Gemini)                  | Free (rate-limited)  | Mid-size
Run locally (Ollama)                       | Free (your hardware) | Smaller models

API key = how the provider identifies you. Don’t share it or commit it to git.

For Exercise 376.1: free Google Gemini API key or Ollama.

Conversation State: It’s Just a List

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is 2+2?"},
]
response = client.chat.completions.create(model=model, messages=messages)
assistant_msg = response.choices[0].message.content

# To continue: append the assistant's reply, then your next message
messages.append({"role": "assistant", "content": assistant_msg})
messages.append({"role": "user", "content": "Are you sure?"})
response2 = client.chat.completions.create(model=model, messages=messages)
  • The API is stateless — each call sends the entire conversation
  • This is the “chat as document” idea: your code maintains the document

Acknowledgments

Some figures from Prince, Understanding Deep Learning, 2023