Glossary

Warning: This content has not yet been fully revised for this year.

Data

Tabular Data

Text Data

Padding issues

Sentences have different lengths, but models generally need rectangular inputs: every sequence in a batch must be the same length. The usual fix is to pad shorter inputs out to a common length (and often to mask the padding so the model ignores it).
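A minimal sketch of the idea (not from the course materials; the pad id of 0 is an assumption):

```python
PAD_ID = 0  # assumed padding token id

def pad_batch(sequences, pad_id=PAD_ID):
    """Pad each sequence with pad_id so all have the same length."""
    max_len = max(len(seq) for seq in sequences)
    return [seq + [pad_id] * (max_len - len(seq)) for seq in sequences]

batch = [[5, 2, 9], [7, 1], [3, 8, 6, 4]]
padded = pad_batch(batch)
# Every row now has length 4; shorter rows end in pad ids.
```

In practice a library routine (e.g., a tokenizer's padding option) does this for you, but the effect is the same: a ragged list becomes a rectangle.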

Classification Metrics

Error Rate

Sensitivity and Specificity

Sensitivity/Specificity / Confusion Matrix on Wikipedia
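A small sketch of how these metrics come out of confusion-matrix counts (the example numbers are made up):

```python
def sensitivity(tp, fn):
    """True positive rate: fraction of actual positives correctly identified."""
    return tp / (tp + fn)

def specificity(tn, fp):
    """True negative rate: fraction of actual negatives correctly identified."""
    return tn / (tn + fp)

# Example: 90 true positives, 10 false negatives,
#          80 true negatives, 20 false positives.
sensitivity(90, 10)  # 0.9
specificity(80, 20)  # 0.8
```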

Area Under the Curve (AUC)

Training Stuff

Epoch

(Mini-)Batch

A collection of data points that are processed together. Larger batches can be more efficient because more work can be done in parallel, and can stabilize the gradient estimates, but they give the network fewer parameter updates per epoch, and batches that are too large may hurt generalization unless mitigation measures are taken.

Not to be confused with “training set” or “validation set”, both of which are processed in batches.
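The splitting itself is simple; here is an illustrative sketch (in practice a data loader handles this, often with shuffling):

```python
def make_batches(data, batch_size):
    """Split a dataset into consecutive mini-batches of at most batch_size items."""
    return [data[i:i + batch_size] for i in range(0, len(data), batch_size)]

make_batches(list(range(10)), 4)  # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
```

Note that the last batch may be smaller than the rest, as here.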

Stochastic Gradient Descent

An iterative algorithm for finding parameter values for a function (such as a neural net) that are in the neighborhood of a local minimum of the loss. Each step updates the parameters in the direction opposite the gradient, computed on a single mini-batch.

Algorithm:

Input: a dataset, a model architecture, and some hyperparameters (e.g., learning rate, batch size, number of epochs).

Steps: for each epoch, shuffle the dataset and split it into mini-batches; for each batch, compute the model's predictions and the loss, compute the gradient of the loss with respect to the parameters, and update each parameter by subtracting the learning rate times its gradient.

For the math, and extra details like the momentum parameter, see, e.g., the PyTorch SGD docs.
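The loop above can be sketched in a few lines of plain Python. This is a toy setup of my own (fitting y = w·x with batch size 1), not the course's code or PyTorch's implementation:

```python
import random

random.seed(0)
data = [(x, 3.0 * x) for x in range(1, 6)]  # toy dataset; true w = 3
w = 0.0    # initial parameter
lr = 0.01  # learning rate (hyperparameter)

for epoch in range(100):
    random.shuffle(data)               # shuffle each epoch
    for x, y in data:                  # batch size 1 ("stochastic")
        grad = 2 * (w * x - y) * x     # d/dw of the squared error (w*x - y)**2
        w -= lr * grad                 # step opposite the gradient
# w has converged to (approximately) 3.0
```

Real SGD is the same loop with a vector of parameters, a mini-batch gradient instead of a single-example one, and extras like momentum.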

Neural Net Layers

A neural net is built from weighted connections (typically linear layers) alternating with activation functions.

Linear

Outputs are linear (technically affine) transformations of the inputs: each output is a weighted sum of the inputs plus a bias.
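A sketch of that computation in plain Python (a real implementation would use a matrix library; the numbers here are arbitrary):

```python
def linear(weights, bias, x):
    """Affine transformation: each output is a weighted sum of x plus a bias."""
    return [sum(w_i * x_i for w_i, x_i in zip(row, x)) + b
            for row, b in zip(weights, bias)]

W = [[1.0, 2.0], [0.0, -1.0]]  # 2 outputs, 2 inputs
b = [0.5, 0.0]
linear(W, b, [3.0, 4.0])  # [1*3 + 2*4 + 0.5, 0*3 + (-1)*4 + 0] = [11.5, -4.0]
```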

Convolutional Neural Network

Softmax

Loss functions

Resources:

Related: perplexity, the exponential of the cross-entropy loss, commonly reported for language models.

Embeddings

Definition: a mapping from discrete items (such as words or tokens) to dense vectors of numbers, typically learned so that similar items get similar vectors.

Embeddings can be learned by gradient descent.
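At its core, an embedding layer is just a lookup table of vectors, one per token id; gradient descent adjusts the numbers in the table. A sketch (the vector values here are arbitrary stand-ins for learned ones):

```python
embedding_table = [
    [0.1, -0.3, 0.5],   # vector for token id 0
    [0.7, 0.2, -0.1],   # vector for token id 1
    [-0.4, 0.9, 0.0],   # vector for token id 2
]

def embed(token_ids, table):
    """Look up the vector for each token id."""
    return [table[i] for i in token_ids]

embed([2, 0], embedding_table)  # [[-0.4, 0.9, 0.0], [0.1, -0.3, 0.5]]
```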

Tasks

Classification vs Regression

Regression:

Named-Entity Recognition

Pick out things with names: people, places, organizations, etc. Sometimes includes years. Useful when you want to identify what a sentence or article is “talking about”.

Acknowledgements

This material reflects contributions from some past students including Esther Asuquo. It also includes text generated by GitHub Copilot.
