376 Unit 5: Multimodal Models and Diffusion

Contents

Discussion 376.4: Fans and Skeptics, Optimists and Pessimists

You’ve spent several weeks learning how generative AI systems actually work — tokenization, attention, training pipelines, tool use, failure modes. You’re now much more qualified than average to answer the questions people will ask you: Where is AI going? Is it good or bad?

But even experts disagree. Some are impressed by what AI can do (“fans”). Others doubt the claims (“skeptics”). Some are optimistic about societal benefits (“optimists”). Others worry about serious harm (“pessimists”). Many thoughtful people hold several of these views at once. To be wise, we need to engage honestly with perspectives we don’t naturally hold.

This Discussion addresses the course objectives Overall-PhilNarrative and Overall-Impact.

We’ll share our findings in class on the last day and compare with the results of a national survey.

Instructions

Step 1: Take the survey. The Moodle forum includes a link to a brief survey about your current views on AI. Fill it out first.

Step 2: Find two articles that represent genuinely different perspectives on the future of AI. Your two articles should pull in different directions — not two versions of the same take. Read with hospitality: you’ll need to articulate the other side’s view convincingly.

For each article:

Step 3: Articulate your own position (~150-250 words), drawing substantively on both articles. Where do you land, and why?

Ground your position in something beyond personal preference. You might draw on:

The best posts will show that you’ve genuinely wrestled with a view you don’t naturally hold.

Finding Sources

The landscape changes fast, so find your own sources rather than relying on a list. Some places to look:

Where to search:

Kinds of voices to look for:

  • Researchers building these systems and explaining what excites them
  • Researchers studying these systems and explaining what worries them
  • Economists analyzing labor market effects
  • Legal scholars on copyright, liability, regulation
  • Philosophers on intelligence, consciousness, agency
  • Journalists investigating real-world impacts on specific communities
  • Skeptics who think current AI is overhyped (Gary Marcus, AI Snake Oil, Emily Bender, Melanie Mitchell)
  • Optimists who think AI will transform society for the better (Dario Amodei, Marc Andreessen)
  • People trying to hold both views at once (Arvind Narayanan, Nicholas Carlini)

Replies

Read several classmates’ posts. Reply to at least one (~75-150 words):

Rubric

Q&A Week 12

This covers both the RL unit and Human-Centered AI part 1.

RL

Different Approaches to RL

Main difference is what functions we learn:

  • Value-based methods (e.g., Q-learning) learn how good each state or action is, then act greedily on that estimate
  • Policy-based methods (e.g., policy gradients) learn the action-choosing function directly
  • Model-based methods learn the environment’s dynamics and plan against the learned model

What’s the “loss” (or target) in RL?

That’s what makes it hard! E.g., in Q-learning we try to minimize the temporal difference: how much the reward we actually get differs from the reward we predicted (the target is the immediate reward plus the discounted next-state value, and we subtract the current-state value). But that’s a difference of two predictions; if we were wrong, which of those two predictions was wrong?

In general, we’re hoping to learn something about all possible things that could happen and things we could do, given data about only a fraction of what happened and things we did.
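The temporal-difference idea above can be sketched in a few lines. This is a minimal tabular Q-learning step; the function name, state labels, and numbers are illustrative, not from the course code.

```python
def td_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.99):
    """One Q-learning step: move Q[s][a] toward the TD target."""
    target = r + gamma * max(Q[s_next].values())  # prediction about what happens next
    td_error = target - Q[s][a]                   # a difference of two predictions
    Q[s][a] += alpha * td_error
    return td_error

# Tiny two-state example: all estimates start at zero.
Q = {"s0": {"left": 0.0, "right": 0.0}, "s1": {"left": 0.0, "right": 0.0}}
err = td_update(Q, "s0", "right", r=1.0, s_next="s1")
print(err)                # 1.0 — here the whole surprise came from the reward
print(Q["s0"]["right"])   # 0.1 — one small step toward the target
```

Note that `td_error` mixes up the two sources of error the Q&A mentions: a wrong current-state estimate and a wrong next-state estimate produce the same signal.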

They’re good at different things: model-free methods are simple and cheap to act with once trained; model-based methods tend to be more sample-efficient and can plan ahead, but errors in the learned model compound.

So, unsurprisingly, the state of the art often combines both! See, e.g., MuZero.

Can an agent trained in simulation be trusted in the real world?

Hm. Pro:

  • Simulation gives cheap, fast, safe experience — the agent can explore failures that would be dangerous or expensive in the real world

Con:

  • The sim-to-real gap: the agent may exploit quirks of the simulator that don’t hold in reality, so behavior degrades in exactly the situations the simulator modeled wrong

Do human newborns learn by RL?

Maybe somewhat, but not really: infants get no explicit scalar reward signal, and much of their early learning looks more like self-supervised prediction and imitation than trial-and-error reward maximization.

Interpretable AI

Why can we ever trust a model if we can’t see how it’s making its decisions?

We trust human doctors routinely, even though, despite decades of effort by cognitive scientists, we have very limited knowledge about the process by which people make their decisions.

Is there always a trade-off between understandability and accuracy?

No.

So why are we only learning about this now? Good question…

What’s CART?

CART (Classification And Regression Trees) is the classic greedy algorithm for learning decision trees: at each node, try every feature and threshold, keep the split that most reduces impurity (Gini for classification, variance for regression), and recurse on the two halves.
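Here is a sketch of just the core of that greedy step — searching one feature for the threshold that minimizes weighted Gini impurity. Real CART also handles multiple features, regression, and pruning; all names below are my own, not from class.

```python
def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_split(xs, ys):
    """Try every threshold on one feature; return (threshold, weighted_gini)."""
    best = (None, float("inf"))
    for t in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(ys)
        if score < best[1]:
            best = (t, score)
    return best

xs = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
ys = ["a", "a", "a", "b", "b", "b"]
print(best_split(xs, ys))  # (3.0, 0.0): a perfect split at x <= 3
```

The resulting tree is directly inspectable — which is why decision trees come up in the interpretability discussion above.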

Other

What’s dropout?

During training, randomly zero out a fraction p of a layer’s activations at each step (and rescale the survivors by 1/(1−p)); at test time, use all units. It’s a cheap regularizer: the network can’t come to rely on any single unit.
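A minimal sketch of “inverted” dropout on a list of activations; the function name and shapes are illustrative.

```python
import random

def dropout(activations, p, training=True, rng=None):
    """Zero each activation with probability p; scale survivors by 1/(1-p)."""
    if not training or p == 0.0:
        return list(activations)  # at test time, use all units unchanged
    rng = rng or random.Random()
    keep = 1.0 - p
    return [a / keep if rng.random() < keep else 0.0 for a in activations]

out = dropout([1.0, 2.0, 3.0, 4.0], p=0.5, rng=random.Random(0))
print(out)  # roughly half the entries zeroed, the rest doubled
```

The 1/(1−p) rescaling keeps the expected activation the same with and without dropout, so no correction is needed at test time.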

How do you get bitwise determinism?

Control every source of nondeterminism: fix all random seeds, pin library versions and hardware, request deterministic kernels where the framework offers them (e.g., torch.use_deterministic_algorithms(True)), and avoid operations whose result depends on thread scheduling — parallel floating-point reductions can sum in different orders, and floating-point addition isn’t associative.
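Two small demonstrations of the points above, assuming plain CPython with no GPU involved: fixing the seed makes pseudo-randomness bit-identical, and floating-point addition is order-sensitive, which is why parallel reductions break bitwise equality.

```python
import random

def sample(seed, n=5):
    """A reproducible stream of pseudo-random floats."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(n)]

print(sample(42) == sample(42))  # True: same seed, bit-identical stream
print(sample(42) == sample(43))  # False: different seed

# Floating-point addition is not associative:
a, b, c = 0.1, 0.2, 0.3
print((a + b) + c == a + (b + c))  # False: summation order changes the bits
```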

Lab 376.6: Stable Diffusion (draft!)
The content may not be revised for this year. If you really want to see it, click the link above.