376 Unit 1: Generative Modeling Introduction

Welcome to CS 376

Objectives

  • Understand how modern generative models work (for chatbots, image generation, etc.)
  • Learn to use them skillfully and wisely, as users and developers

We’ll view these systems through the same lens we started developing in CS 375: as tuneable machines that learn to play optimization games. But now we’re scaling up and out.

Your Objectives

Discuss with your tables:

What do you hope that a technical understanding of generative AI will let you do? What difference do you want to make in the world as a result of taking this course?

Key Questions

Tuneable Machines:

  • How can we represent text, images, and other data as sequences?
  • How can we process and generate sequences using neural nets?
  • How can models capture and use nuanced long-range relationships?

ML Systems:

  • How do we evaluate language models?
  • Can I run an LLM on my laptop? Can I train one?
  • How do I get good-quality results from an LLM?
  • How can I use an LLM to make a (semi-)autonomous agent?

Key Questions (continued)

Learning Machines

  • How can we learn without labeled data? (self-supervised learning)
  • How do foundation models learn generalizable patterns from massive datasets?
  • How can generative agents learn to improve their behavior from feedback?
  • Some current models can learn at test time (e.g., in-context learning); how does this work?

Context and Implications

  • What are the limits of AI systems? Is superhuman AI imminent?
  • What might happen socially when AI systems are deployed broadly? (effects on work, education, creativity, …)
  • How might we design AI systems to align with human values? to honor each other and our neighbors? What are the risks if we don’t?
  • How do privacy and copyright relate to AI? Is generative AI all theft?
  • What is creativity? Agency? Truth?

Ways the logistics will be different from CS 375

  • We’ll have a final project
  • Meeting objectives will be more incremental and structured, with deadlines for progress

Projects

  • Project showcase instead of final exam
  • Can be in teams (if each member has a clear role and contribution)
  • Should demonstrate:
    • understanding of how something in ML works
    • implementation and experimentation skills
    • communication skills

Some ideas are up on the course website.

This Week’s Readings

On Perusall (graded by participation: watch/read it all, write a few good comments)

  • A nice intro video from 3blue1brown (you may have watched this already)
  • An introduction to Transformers for NLP
  • A Communications of the ACM article with some historical context

Wednesday

Prayer for Calvin and AI

On this Calvin Day of Prayer, gather with one or two others and spend a few minutes asking God for…

pray for

  • wisdom
  • boundaries
  • perseverance
  • community
  • discernment
  • humility
  • gratitude

for those who

  • study
  • teach and give feedback
  • advise students
  • work in IT
  • think through how Calvin institutionally responds

Scripture

The one who has knowledge uses words with restraint,
    and whoever has understanding is even-tempered.
Even fools are thought wise if they keep silent,
    and discerning if they hold their tongues.

Proverbs 17:27-28

Logistics

  • Readings: How is Perusall going?
  • Moodle participation activity
  • Preview Discussion 1

Language Modeling

Tell me a joke.

Q: Why don’t scientists trust atoms?
A: They ___

What is the first word that goes in the blank?

Another example

neighs and rhymes with course

Today’s Activity

A language model produces text one token at a time, predicting a probability distribution over what comes next.

Today you’ll see exactly what that looks like — and what it tells us about how these models work.

Open the LM Internals tool and grab the handout.

Debrief: What Did You See?

Naming What You Found

In the activity, you saw:

  • The model predicts a distribution over the next token at every position
  • Some positions were tightly constrained (high confidence)
  • Others were wide open (many plausible options)
  • Changing the context changed the distribution

Let’s now put formal names on these observations.

Causal Language Modeling

Write the joint probability as a product of conditional probabilities:

P(tell, me, a, joke) = P(tell) * P(me | tell) * P(a | tell, me) * P(joke | tell, me, a)

A causal language model gives P(word | prior words)

  • Don’t get to look at “future” words or backtrack
  • Analogy: someone constantly trying to finish your sentence
  • Intuitively: a classifier that predicts the next word in a sentence
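As a sanity check on the chain-rule factorization above, we can multiply out toy numbers (all of the probabilities below are made up for illustration):

```python
# Chain rule: P(tell, me, a, joke)
#   = P(tell) * P(me | tell) * P(a | tell, me) * P(joke | tell, me, a)
# All probabilities here are invented illustrative values.
p_tell = 0.01
p_me_given_tell = 0.30
p_a_given_tell_me = 0.40
p_joke_given_tell_me_a = 0.25

joint = p_tell * p_me_given_tell * p_a_given_tell_me * p_joke_given_tell_me_a
print(joint)  # ≈ 0.0003
```

Note how quickly the joint probability shrinks: every additional word multiplies in another factor less than 1, which is why even "likely" sentences have tiny absolute probabilities.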

Causal Language Modeling as Classification

Task: given what came so far, predict the next thing

  • Next character: # possibilities (classes) = ______
  • Next word: # possibilities = _____
  • What else could we use as “next thing”?

How that classifier works

P(word | context) = softmax(wordLogits(word, context))

  • softmax normalizes the logits across the whole vocabulary into a probability distribution
  • wordLogits(word, context) = dot(vec(word), vec(context)), computed for every word in the vocabulary
  • vec(word): looked up in a (learnable) table: the embedding
  • vec(context): computed by a neural network

Do you recognize this structure?
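A minimal numpy sketch of that structure. The embedding table and context vector here are random stand-ins for learned values, and the four-word vocabulary is hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = ["the", "cat", "sat", "mat"]      # tiny hypothetical vocabulary
emb = rng.normal(size=(len(vocab), 8))    # vec(word): a (learnable) embedding table
ctx = rng.normal(size=8)                  # vec(context): would come from a neural net

logits = emb @ ctx                        # wordLogits: one dot product per vocab word
probs = np.exp(logits - logits.max())     # softmax (shifted for numerical stability)
probs /= probs.sum()

# probs is now a probability distribution over the vocabulary
print(dict(zip(vocab, probs.round(3))))
```

This is the same dot-product-then-softmax structure you saw in linear classifiers: the embedding table plays the role of the weight matrix.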

Applying the concept: “Retry”

  • What happens when you press “Retry” on ChatGPT?
  • It takes another sample from the conditional distribution of response given prompt

Talk to your neighbors:

  • Will you ever get the exact same response twice?
  • Will the different samples have different likelihoods?
  • What would the likelihood be like if you always picked the most likely token? (greedy generation)
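A toy illustration of those questions, using a made-up next-token distribution (the tokens and probabilities are invented, not from a real model):

```python
import random

random.seed(0)
# Made-up distribution over the next token in "They ___"
probs = {"make": 0.55, "can't": 0.25, "are": 0.15, "lie": 0.05}

tokens = list(probs)
weights = [probs[t] for t in tokens]

# "Retry" = draw a fresh sample each time, so different runs can differ
samples = [random.choices(tokens, weights=weights)[0] for _ in range(5)]

# Greedy generation always picks the single most likely token,
# maximizing the one-step likelihood at every position
greedy = max(probs, key=probs.get)
print(samples, greedy)
```

Different samples have different likelihoods (the product of the probabilities of their tokens); greedy generation maximizes each step's factor, though locally greedy choices need not maximize the likelihood of the whole sequence.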

Friday

Scripture

What shall I return to the Lord
    for all his goodness to me?
I will lift up the cup of salvation
    and call on the name of the Lord.

Psalm 116:12-13

See also James 1:16-18

Logistics

  • Discussion 1
  • Readings: reminder about Perusall

Projects

  • Projects Choice Page is up

Three Approaches to Generative Modeling

How can we generate complex data (text, images, audio)?

Approach        | Core idea                              | Example
Autoregressive  | One piece at a time, left to right     | ChatGPT, Claude
Latent variable | Sample a code, decode it               | StyleGAN interpolation
Diffusion       | Start from noise, iteratively denoise  | Diffusion Explainer

We’ll focus on autoregressive models — they power most modern LLMs.

See the notes page for details on all three.
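The autoregressive recipe, written as a loop over a toy bigram "model" (the table of conditional distributions below is hand-made for illustration, not learned):

```python
import random

random.seed(1)
# Hand-made next-token distributions: P(next | current)
model = {
    "<s>":  {"the": 0.7, "a": 0.3},
    "the":  {"cat": 0.5, "dog": 0.5},
    "a":    {"cat": 0.5, "dog": 0.5},
    "cat":  {"sat.": 1.0},
    "dog":  {"sat.": 1.0},
    "sat.": {"</s>": 1.0},
}

token, output = "<s>", []
while token != "</s>":
    dist = model[token]                  # distribution over the next token
    token = random.choices(list(dist), weights=list(dist.values()))[0]
    if token != "</s>":
        output.append(token)

print(" ".join(output))
```

A real LLM replaces the lookup table with a neural network and conditions on the entire prefix, but the generate-sample-append loop is the same.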

Text to Numbers (and back)

  • Neural nets work with numbers. How do we convert text to numbers that we can feed into our models?

  • Neural nets give us numbers as output. How do we go back from numbers into text?

Tokenization

Two parts:

  • splitting strings into tokens
    • sometimes just called tokenization
    • may or may not be reversible (e.g., if the splitter strips special characters)
  • converting tokens into numbers
    • vocabulary: a list (gives each token a number)
    • size and contents of vocabulary don’t change

An example: https://platform.openai.com/tokenizer
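Both parts can be sketched in a few lines with a naive whitespace tokenizer (real tokenizers like GPT-2's use learned subword pieces instead of whole words):

```python
text = "the cat sat on the mat"

# Part 1: split the string into tokens (here, naive whitespace splitting)
tokens = text.split()

# Part 2: a fixed vocabulary assigns each token a number
vocab = sorted(set(tokens))
token_to_id = {tok: i for i, tok in enumerate(vocab)}

ids = [token_to_id[tok] for tok in tokens]
decoded = " ".join(vocab[i] for i in ids)

print(ids)
print(decoded == text)  # the round trip works for this simple text
```

Whitespace splitting is reversible here, but it would already fail on punctuation or unseen words, which is one motivation for subword tokenizers.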

Tokenization Examples

from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("distilgpt2", add_prefix_space=True)
tokens = tokenizer.tokenize("Hello, world!")
tokens
['ĠHello', ',', 'Ġworld', '!']

(The “Ġ” is an internal detail to GPT-2; ignore it for now.)

token_ids = tokenizer.convert_tokens_to_ids(tokens)
token_ids
[18435, 11, 995, 0]
tokenizer.decode(token_ids)
' Hello, world!'

LLM APIs: Using sequence models

response = client.chat.completions.create(
    model="gpt-5.2",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Classify this review as positive or negative: <review>..."}
    ]
)
result = response.choices[0].message.content
  • Key abstraction: the conversation — a structured “document” with system instructions, user messages, assistant responses, “tool” calls/responses, reasoning traces

  • API: you ask for the next message given the conversation so far (no training)

  • Stateless: Each conversation is independent — the model itself doesn’t remember past conversations (but system can prepend them to future conversations)

  • Agent extension: the model can output requests to run code (e.g., search, calculate, edit file); the system runs the code and includes the output in the conversation

  • When appropriate: text tasks, prototyping, when training data is scarce

  • When not: latency-critical, cost-sensitive, tasks requiring precise numeric output

Ways to Run an LLM

Approach                                   | Cost                 | Models
Commercial API (OpenAI, Google, Anthropic) | Pay per token        | Largest, most capable
Free tier (Google Gemini)                  | Free (rate-limited)  | Mid-size
Run locally (Ollama)                       | Free (your hardware) | Smaller models

API key = how the provider identifies you. Don’t share it or commit it to git.

For Exercise 376.1: free Google Gemini API key or Ollama.

Conversation State: It’s Just a List

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is 2+2?"},
]
response = client.chat.completions.create(model=model, messages=messages)
assistant_msg = response.choices[0].message.content

# To continue: append the assistant's reply, then your next message
messages.append({"role": "assistant", "content": assistant_msg})
messages.append({"role": "user", "content": "Are you sure?"})
response2 = client.chat.completions.create(model=model, messages=messages)
  • The API is stateless — each call sends the entire conversation
  • This is the “chat as document” idea: your code maintains the document

Acknowledgments

Some figures from Prince, Understanding Deep Learning, 2023