Neural Computation:
ML Systems:
Learning Machines
Context and Implications
Some ideas are up on the course website.
On Perusall (graded by participation: watch/read it all, write a few good comments)
Tell me a joke.
Q: Why don't scientists trust atoms?
A: They ___
What is the first word that goes in the blank?
neighs and rhymes with course
Sketch a line plot of what you think the outdoor air temperature will be over the next 3 days. (x axis is timestamp, y axis is temperature).
Then, on the same axes, sketch several alternative temperature plots for the same period.
How would you quantify how accurate your distribution is?
Definition: a language model is a probability distribution over sequences of tokens
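As a toy illustration (all numbers made up), a language model over a tiny vocabulary could literally be a table assigning a probability to each sequence:

# A toy "language model": made-up probabilities over whole sequences.
lm = {
    ("tell", "me", "a", "joke"): 0.4,
    ("tell", "me", "a", "story"): 0.3,
    ("tell", "me", "a", "secret"): 0.2,
    ("joke", "a", "me", "tell"): 0.1,   # implausible sequences get low probability
}
assert abs(sum(lm.values()) - 1.0) < 1e-9   # probabilities sum to 1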
“A teddy bear on a skateboard in Times Square.”
A conditional distribution is a probability distribution of one variable or set of variables given another.
The one who has knowledge uses words with restraint,
and whoever has understanding is even-tempered.
Even fools are thought wise if they keep silent,
and discerning if they hold their tongues.
Examples:
Write the joint probability as a product of conditional probabilities:
P(tell, me, a, joke) = P(tell) * P(me | tell) * P(a | tell, me) * P(joke | tell, me, a)
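More generally, for any token sequence, the chain rule of probability gives:

P(w_1, w_2, \dots, w_n) = \prod_{i=1}^{n} P(w_i \mid w_1, \dots, w_{i-1})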
A causal language model gives P(word | prior words)
Task: given what came so far, predict the next thing
P(word | context) = softmax(wordLogits(context))[word]
(take the softmax over the logits for every word in the vocabulary, then read off this word's entry)
Do you recognize this structure?
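A minimal sketch of that softmax step, with made-up logits and vocabulary (none of these names come from the course code):

import numpy as np

def softmax(logits):
    # Exponentiate (after subtracting the max for numerical stability) and normalize.
    z = np.exp(logits - np.max(logits))
    return z / z.sum()

# Hypothetical logits for a four-word vocabulary given the context "tell me a".
vocab = ["joke", "story", "sandwich", "the"]
logits = np.array([3.0, 2.0, 0.5, -1.0])
probs = softmax(logits)                # a probability for every word in the vocabulary
p_joke = probs[vocab.index("joke")]    # P("joke" | "tell me a")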
The LM only gives us a distribution over the next token. How do we generate a sequence?
Sampling strategies:
(This is not specific to ML.)
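One way to do it, sketched below: sample a next token, append it to the context, and repeat. Here next_token_probs is a hypothetical stand-in for the trained model, not course code.

import numpy as np

def generate(next_token_probs, context, max_new_tokens=20):
    # next_token_probs(tokens) is assumed to return a 1-D array of probabilities
    # over the vocabulary, given the tokens generated so far.
    rng = np.random.default_rng()
    tokens = list(context)
    for _ in range(max_new_tokens):
        probs = next_token_probs(tokens)
        next_id = rng.choice(len(probs), p=probs)   # sample the next token id
        tokens.append(int(next_id))
    return tokens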
Internally, what’s this doing?
import numpy as np

def sample(probs):
    # Draw an index from a discrete distribution: find where a uniform random
    # number falls among the cumulative probabilities.
    cumulative_probs = np.cumsum(probs)
    r = np.random.uniform()
    for i, cp in enumerate(cumulative_probs):
        if r < cp:
            return i

Example: Generation Activity
Perplexity: roughly, the effective number of equally likely options the model was choosing between when the actual word was the right guess.
Intuition: a fair coin flip has perplexity 2; a fair 6-sided die has perplexity 6.
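For reference, one way to compute it (standard definition, not course-specific code): perplexity is the exponentiated average negative log-probability the model assigned to the words that actually occurred.

import numpy as np

def perplexity(token_probs):
    # token_probs: the probability the model gave to each token that actually appeared.
    token_probs = np.asarray(token_probs, dtype=float)
    return float(np.exp(-np.mean(np.log(token_probs))))

perplexity([0.5, 0.5, 0.5])   # always a fair coin flip -> 2.0
perplexity([1/6] * 10)        # always a fair 6-sided die -> 6.0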
What shall I return to the Lord
for all his goodness to me?
I will lift up the cup of salvation
and call on the name of the Lord.
See also James 1:16-18
Neural nets work with numbers. How do we convert text to numbers that we can feed into our models?
Neural nets give us numbers as output. How do we go back from numbers into text?
Two parts:
An example: https://platform.openai.com/tokenizer
(The “Ġ” is an internal detail of GPT-2’s tokenizer; ignore it for now.)
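A quick way to poke at the same tokenizer locally (assuming the tiktoken package is installed; this example is mine, not from the slides):

import tiktoken

enc = tiktoken.get_encoding("gpt2")        # the GPT-2 BPE tokenizer
ids = enc.encode("Tell me a joke.")        # text -> a list of integer token ids
pieces = [enc.decode([i]) for i in ids]    # each id corresponds to a chunk of text
text = enc.decode(ids)                     # ids -> the original text

print(ids)
print(pieces)
print(text)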
Some figures from Prince, Understanding Deep Learning, 2023