ML System for Language Modeling
The neural computer takes sequences of vectors and outputs vectors.
To use it for generating language, we need a “driver” program (sketched below) that:
Turns text into a sequence of token IDs (tokenizer)
Runs the neural computer (the “model”) on that sequence
Interprets the output as a probability distribution over next tokens, samples one of them, and appends it to the sequence so far
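A minimal sketch of that driver loop in Python; the `tokenizer` and `model` objects here are assumptions standing in for whatever tokenizer and neural network are actually used.

```python
import numpy as np

def generate(prompt, tokenizer, model, max_new_tokens=50):
    # Hypothetical tokenizer: text -> list of token IDs.
    token_ids = tokenizer.encode(prompt)
    for _ in range(max_new_tokens):
        # Hypothetical model: token IDs -> one logit per vocabulary entry.
        logits = model(token_ids)
        # Softmax turns logits into a probability distribution over next tokens.
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()
        # Sample one token and append it to the sequence so far.
        next_id = np.random.choice(len(probs), p=probs)
        token_ids.append(int(next_id))
    return tokenizer.decode(token_ids)
```

The classical computer does the orchestration (the loop, sampling, stopping); the neural computer does the parallel vector math inside `model`.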
In general:
Classical computers: orchestration and control flow
Neural computers: parallel vector operations
Sequential and Neural Computer: Training
Sequential and Neural Computer: Inference
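The two modes differ in how the sequential program drives the neural computer. A hedged sketch, assuming PyTorch and a `model` that maps token IDs to per-position logits (both assumptions, not from the slides):

```python
import torch
import torch.nn.functional as F

def training_step(model, optimizer, token_ids):
    # Training: the full sequence is known, so the model scores every
    # next-token prediction in parallel (teacher forcing).
    inputs, targets = token_ids[:, :-1], token_ids[:, 1:]  # shift targets by one
    logits = model(inputs)  # (batch, seq, vocab)
    loss = F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                           targets.reshape(-1))
    optimizer.zero_grad()
    loss.backward()   # gradients flow back through the whole stack
    optimizer.step()
    return loss.item()

@torch.no_grad()
def inference_step(model, token_ids):
    # Inference: only the prefix is known, so we take the scores at the
    # last position and sample a single next token.
    logits = model(token_ids)[:, -1, :]
    probs = torch.softmax(logits, dim=-1)
    return torch.multinomial(probs, num_samples=1)
```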
Objectives
Compare and contrast the main types of deep neural network models (Transformers, Convolutional Networks, and Recurrent Networks) in terms of how information flows through them
Deep Neural Net = stack of layers
Modular components, often connected sequentially (see the sketch after this list)
Linear transformation (“Dense”, “fully connected”)
Multiple linear layers with nonlinearities between them: MLP / “feed-forward”
Convolution and Pooling
Self-Attention
Recurrent (RNN, LSTM)
Normalization (BatchNorm, LayerNorm)
Dropout
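As a concrete picture of “stack of layers”, a minimal sketch assuming PyTorch; the sizes and the particular layers chosen are illustrative, not from the slides:

```python
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(512, 2048),   # linear transformation ("dense" / "fully connected")
    nn.ReLU(),              # nonlinearity, making the pair an MLP / feed-forward block
    nn.Linear(2048, 512),
    nn.LayerNorm(512),      # normalization
    nn.Dropout(p=0.1),      # dropout
)
```

Each component is modular: swapping the `nn.Linear` layers for convolutional, recurrent, or self-attention layers changes the architecture without changing the overall stacking pattern.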
An Oversimplified History of Neural Architectures
Connectivity Structure
Fully Connected
Perceptron: a single layer; with hidden layers, an MLP
Fixed Connections
Convolutional networks (CNN): local connections
Recurrent networks (RNN): carry a hidden state forward in time, remembering what was seen before (connected across time steps; see the sketch below)
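The three connectivity patterns side by side, in a short sketch assuming PyTorch (the dimensions are illustrative):

```python
import torch
import torch.nn as nn

x = torch.randn(1, 16, 32)  # (batch, sequence length, features)

fc = nn.Linear(32, 32)                              # every input connects to every output
conv = nn.Conv1d(32, 32, kernel_size=3, padding=1)  # each output sees only a local window
rnn = nn.RNN(32, 32, batch_first=True)              # hidden state carries the past forward

y_fc = fc(x)                                        # applied independently at each position
y_conv = conv(x.transpose(1, 2)).transpose(1, 2)    # Conv1d expects (batch, channels, length)
y_rnn, _ = rnn(x)                                   # output at step t depends on steps 1..t
print(y_fc.shape, y_conv.shape, y_rnn.shape)        # all torch.Size([1, 16, 32])
```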