CS 375: Wrap-Up

Core Concepts: The Four Pillars

Neural Computation

  • Tensors as universal data structure
  • Linear transformations + nonlinearities
  • Learning via gradient descent

ML Systems

  • Data pipelines: preprocessing → model → evaluation
  • Abstractions: fit/predict APIs
  • Systematic evaluation & generalization

Learning Machines

  • Supervised learning: learning from examples
  • Unsupervised learning: finding patterns
  • Training, testing, and the generalization gap

Context & Implications

  • What AI can vs. should solve
  • Limitations: correlation ≠ causation
  • AI in service of human flourishing

Neural Computation: The Core

  • Traditional vs. neural computing
    • Traditional: Explicit instructions → Outputs
    • Neural: Data + Parameters + Architecture → Learned mapping
  • Building blocks
    • Tensors (arrays) as fundamental data structure
    • Linear layers transform data
    • Nonlinearities (ReLU) add conditional logic
    • Gradient descent to adjust parameters
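The building blocks above can be sketched in a few lines of NumPy (an illustrative toy, not code from the course): a tensor is just an array, a linear layer is a matrix product, ReLU is an elementwise max, and gradient descent repeatedly steps against the gradient. The data, parameters, and learning rate here are all made up.

```python
import numpy as np

rng = np.random.default_rng(0)

# Tensors: plain arrays holding the data.
X = rng.normal(size=(100, 3))            # 100 examples, 3 features each
true_w = np.array([1.0, -2.0, 0.5])      # hypothetical "true" parameters
y = X @ true_w + 0.1 * rng.normal(size=100)

# Nonlinearity: ReLU clips negatives to zero (conditional logic).
def relu(z):
    return np.maximum(z, 0.0)

# A linear layer is just a matrix product; train it with gradient descent
# on mean-squared error, using the analytic gradient.
w = np.zeros(3)                          # learned parameters, start at zero
lr = 0.1                                 # learning rate (a hyperparameter)
for _ in range(200):
    pred = X @ w                         # linear transformation
    grad = 2 * X.T @ (pred - y) / len(y) # gradient of MSE w.r.t. w
    w -= lr * grad                       # step downhill

print(np.round(w, 2))                    # lands near true_w
```

Stacking such linear layers with ReLU between them gives a multilayer perceptron; the update is the same idea, with gradients computed by backpropagation.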

ML Systems: Connecting to the World

  • From raw data to predictions
    • Input transformation to structured tensors
    • Task-appropriate outputs and metrics
    • Systematic evaluation (train/val/test)
  • Key abstractions
    • Data pipelines with clear stages
    • Common API patterns
    • Hyperparameters vs. learned parameters
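The fit/predict pattern, held-out evaluation, and the hyperparameter-vs.-learned-parameter distinction can be sketched with a deliberately trivial estimator (the class name, `shrinkage` knob, and data are all invented for illustration):

```python
import numpy as np

class MeanRegressor:
    """Toy estimator following the common fit/predict API pattern."""
    def __init__(self, shrinkage=0.0):
        self.shrinkage = shrinkage          # hyperparameter: chosen up front
    def fit(self, X, y):
        # Learned parameter: estimated from the training data only.
        self.mean_ = (1.0 - self.shrinkage) * y.mean()
        return self
    def predict(self, X):
        return np.full(len(X), self.mean_)

# Systematic evaluation: fit on train, report error on held-out test data.
rng = np.random.default_rng(1)
X = rng.normal(size=(60, 2))
y = rng.normal(loc=3.0, size=60)
X_train, y_train = X[:40], y[:40]
X_test, y_test = X[40:], y[40:]

model = MeanRegressor(shrinkage=0.0).fit(X_train, y_train)
test_mse = np.mean((model.predict(X_test) - y_test) ** 2)
```

In practice a third, validation split is used to tune hyperparameters like `shrinkage`, keeping the test set untouched until the final report.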

Learning Machines: Improving from Experience

  • Learning paradigms
    • Supervised: Mimicry from examples
    • Unsupervised: Pattern discovery without labels
    • Reinforcement: Learning from interaction and reward signals
  • Error sources
    • Underfitting: Can’t represent training data well
    • Overfitting: Can’t generalize beyond training
    • Data issues: Biased or shifting distributions
    • Task misspecification: Optimizing the wrong thing

Context & Implications: The Bigger Picture

  • Possibilities and limitations
    • What problems can AI solve? Desk tasks with clear metrics
    • What should we use AI for? Love and service, not just efficiency
  • Current limitations
    • Correlation vs. causation
    • Limited real-world interaction
    • Fixation on numeric metrics

Going Deeper: Neural Computation

  • What we’ve seen
    • Basic building blocks: vectors, matrices, tensors
    • Linear layers and activation functions
    • Simple architectures (MLPs)
    • Gradient descent as learning mechanism
  • What we haven’t seen
    • Deep networks with many layers
    • CNNs, RNNs, Transformers
    • Backpropagation internals
    • Advanced optimizers (Adam, etc.)

Going Deeper: ML Systems

  • What we’ve seen
    • Classification and regression tasks
    • Input transformations and batching
    • Performance metrics and evaluation
    • Hyperparameter tuning
  • What we haven’t seen
    • Scaling models and abstractions to LLMs
    • Commercial APIs for embeddings
    • Converting real-world problems to ML tasks

Going Deeper: Learning Machines

  • What we’ve seen
    • Basic supervised and unsupervised approaches
    • Generalization concepts
    • Error analysis and debugging
  • What we haven’t seen
    • Self-supervised learning
    • Training at massive scale
    • RLHF for generative models
    • Scale as regularization

Going Deeper: Context & Implications

  • Questions we’ve explored
    • AI capabilities vs. appropriate uses
    • Evaluation beyond metrics
    • Ethical considerations and impacts
  • Questions we’ll continue exploring
    • How AI systems align with human values
    • Navigating benefits and risks in deployment
    • Cultivating wisdom in technological development

Connections: Modern AI Systems

  • Image classifiers
    • Same basic structure as our simple networks
    • Addition of convolutional layers for pattern extraction
    • Hierarchical feature learning
  • Large Language Models
    • At their core: a fancy classifier over the next token
    • Feature extractors + linear/softmax layers
    • Addition of attention mechanisms for context
    • Self-supervised learning at massive scale
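The "classifier over the next token" view fits in a few lines: a feature extractor turns the context into a vector, then a linear layer plus softmax turns that vector into a probability distribution over the vocabulary (the vocabulary, features, and weights below are all made up):

```python
import numpy as np

def softmax(z):
    z = z - z.max()                 # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

vocab = ["the", "cat", "sat", "mat"]            # toy vocabulary
context_features = np.array([0.2, -1.0, 0.7])   # stand-in for extractor output

# Linear layer: one row of weights per vocabulary token.
W = np.array([[ 0.1,  0.4, -0.2],
              [ 0.8, -0.3,  0.5],
              [-0.1,  0.2,  0.9],
              [ 0.3,  0.1, -0.4]])

probs = softmax(W @ context_features)           # distribution over next token
next_token = vocab[int(np.argmax(probs))]
```

A real LLM differs mainly in scale and in how the features are computed (attention over the whole context), not in this final classification step.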

Looking Forward to CS 376

  • Models for structured objects
    • Images, text, multimodal inputs
  • Advanced architectures
    • CNNs, RNNs, Transformers
  • Agent-based approaches
    • Advanced use of LLMs
    • Reinforcement learning for language/agents
    • Tool use and planning

This Course: Reflecting Together

  • We’ve prototyped education in an AI-pervasive world
  • What worked? What didn’t?
  • What did you appreciate?
  • How might we think about the value of learning communities?