What about Colin Davison’s task made it “supervised learning”?
He gave the classifier examples of input-output pairs.
Why did he need to split his data?
So that he could evaluate how well the classifier would do on data it hadn’t seen.
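A minimal sketch of that idea, using only the standard library. The example sentences, labels, the helper name `train_test_split`, and the 80/20 ratio are all illustrative assumptions, not details of Davison's actual setup.

```python
import random

def train_test_split(pairs, test_fraction=0.2, seed=0):
    """Shuffle labeled (input, output) pairs and hold out a fraction for testing."""
    shuffled = list(pairs)                     # copy so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)      # seeded shuffle for reproducibility
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

# Hypothetical supervised data: each example pairs an input with its known output.
pairs = [
    ("great movie", "pos"), ("loved it", "pos"), ("really fun", "pos"),
    ("would watch again", "pos"), ("a delight", "pos"),
    ("terrible plot", "neg"), ("hated it", "neg"), ("very dull", "neg"),
    ("waste of time", "neg"), ("a mess", "neg"),
]

train, test = train_test_split(pairs)
# The classifier is fit on `train` only; `test` stands in for unseen data.
```

Because the held-out examples never influence training, accuracy on them estimates how the classifier would do on new input.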
What did he need to do to the text to make it usable by his classifier?
He turned each sentence into a vector.
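One common way to do this is a bag-of-words count vector. The sketch below assumes that scheme for illustration only; the notes do not say which vectorization was actually used, and the function names are hypothetical.

```python
def build_vocabulary(sentences):
    """Assign each distinct lowercase word an index, in order of first appearance."""
    vocab = {}
    for sentence in sentences:
        for word in sentence.lower().split():
            vocab.setdefault(word, len(vocab))
    return vocab

def vectorize(sentence, vocab):
    """Count how often each vocabulary word occurs; unknown words are ignored."""
    vector = [0] * len(vocab)
    for word in sentence.lower().split():
        if word in vocab:
            vector[vocab[word]] += 1
    return vector

vocab = build_vocabulary(["the cat sat", "the dog sat down"])
# vocab maps: the -> 0, cat -> 1, sat -> 2, dog -> 3, down -> 4
vector = vectorize("the dog saw the cat", vocab)
print(vector)  # [2, 1, 0, 1, 0]
```

Every sentence becomes a vector of the same length, which is what lets a classifier treat them as comparable inputs.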
Which of the following is a bigram?
“bi” is a character-level bigram.
“a bigram” is a word-level bigram.
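Both kinds of bigram can be extracted with a sliding window of size two. This is a small sketch with hypothetical helper names; it splits words on whitespace and does no other tokenization.

```python
def char_bigrams(text):
    """Return every pair of adjacent characters in the text."""
    return [text[i:i + 2] for i in range(len(text) - 1)]

def word_bigrams(text):
    """Return every pair of adjacent whitespace-separated words."""
    words = text.split()
    return [" ".join(words[i:i + 2]) for i in range(len(words) - 1)]

print(char_bigrams("bigram"))             # ['bi', 'ig', 'gr', 'ra', 'am']
print(word_bigrams("which is a bigram"))  # ['which is', 'is a', 'a bigram']
```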
We summarized the difference between classical ML and deep learning as whether the feature extractor is programmed by hand or learned. (The classifier that sits on top of the features is the same in both cases.)