Glossary

Data

Tabular Data

Text Data

Padding issues

Sentences have different lengths, but models generally need rectangular inputs, so everything in a batch has to be the same length. What do we do? Pad shorter inputs to a common length (and typically mask out the padding so it doesn't affect the result).
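A minimal sketch of padding, assuming sequences are lists of token ids and that 0 is reserved as the pad id (real tokenizers define their own pad token):

```python
PAD_ID = 0  # assumption: id 0 is reserved for padding

def pad_batch(sequences):
    """Right-pad each sequence with PAD_ID up to the longest length."""
    max_len = max(len(seq) for seq in sequences)
    return [seq + [PAD_ID] * (max_len - len(seq)) for seq in sequences]

batch = pad_batch([[5, 2, 9], [7], [3, 1]])
# Every row now has length 3: [[5, 2, 9], [7, 0, 0], [3, 1, 0]]
```

The padded batch can now be stacked into a single rectangular tensor.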

Classification Metrics

Error Rate

Sensitivity and Specificity

Sensitivity/Specificity / Confusion Matrix on Wikipedia

Area Under the Curve (AUC)

Training Stuff

Epoch

(Mini-)Batch

A collection of data points that are processed together. Using bigger batches can lead to more efficient processing because more work can be done in parallel, and can sometimes stabilize training, but it gives the network fewer chances to update itself per epoch, and batches that are too large may harm generalization unless mitigation measures are taken.

Not to be confused with “training set” or “validation set”, both of which are processed in batches.
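A minimal sketch of splitting a dataset into mini-batches (the dataset, batch size, and fixed seed are illustrative assumptions; in practice you would reshuffle every epoch):

```python
import random

def minibatches(data, batch_size, shuffle=True, seed=0):
    """Yield successive mini-batches of the dataset (toy sketch)."""
    indices = list(range(len(data)))
    if shuffle:
        random.Random(seed).shuffle(indices)  # reshuffle each epoch in practice
    for start in range(0, len(indices), batch_size):
        yield [data[i] for i in indices[start:start + batch_size]]

batches = list(minibatches(list(range(10)), batch_size=4, shuffle=False))
# Three batches of sizes 4, 4, and 2 — the last batch may be smaller
```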

Stochastic Gradient Descent

An iterative algorithm for finding parameters of a function that lie in the neighborhood of a (shallow) local minimum of a loss function.

Algorithm:

Input: a dataset, a model architecture, and some hyperparameters (learning rate, batch size, number of epochs).

Steps:

1. Initialize the model's parameters (often randomly).
2. Shuffle the dataset and split it into mini-batches.
3. For each mini-batch: compute the loss, compute the gradient of the loss with respect to the parameters, and move the parameters a small step (scaled by the learning rate) in the opposite direction.
4. Repeat steps 2–3 each epoch until the loss stops improving.

For the math, and extra details like the momentum parameter, see, e.g., the PyTorch SGD docs.
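A minimal sketch of the SGD loop, using 1-D linear regression (y ≈ w·x) with MSE loss; the data, learning rate, and epoch count are made-up assumptions:

```python
import random

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 8.1]                  # roughly y = 2x

w = 0.0                                    # initialize the parameter
lr = 0.01                                  # learning rate (hyperparameter)
for epoch in range(100):
    order = list(range(len(xs)))
    random.Random(epoch).shuffle(order)    # new shuffle each epoch
    for i in order:                        # batch size 1 = "stochastic"
        pred = w * xs[i]
        grad = 2 * (pred - ys[i]) * xs[i]  # d/dw of (pred - y)^2
        w -= lr * grad                     # step against the gradient
# w ends up close to 2.0
```

With a constant learning rate, w hovers near (rather than converging exactly to) the least-squares solution.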

Neural Net Layers

Linear

Softmax

Loss functions

MSE

MAE

Cross-Entropy Loss

Resources:

Related: perplexity (the exponential of the cross-entropy).

Embeddings

An embedding is a learned mapping from discrete items (words, tokens, users, …) to dense vectors, such that items that behave similarly end up with nearby vectors.
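At its core an embedding is a lookup table from item ids to vectors; a toy sketch (the vocabulary and 3-dimensional vectors are made-up assumptions — in practice the table is a trained parameter matrix, e.g. 50,000 × 768):

```python
# Toy embedding table: token id -> dense vector (values are arbitrary)
embedding_table = {
    0: [0.1, -0.3, 0.7],   # e.g., "the"
    1: [0.9, 0.2, -0.4],   # e.g., "cat"
    2: [-0.5, 0.8, 0.1],   # e.g., "sat"
}

def embed(token_ids):
    """Map a sequence of token ids to a sequence of dense vectors."""
    return [embedding_table[t] for t in token_ids]

vectors = embed([1, 2])    # 2 tokens -> 2 vectors of dimension 3
```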

Tasks

Named-Entity Recognition

Pick out things with names: people, places, organizations, etc. Sometimes includes years. Useful when you want to identify what a sentence or article is “talking about”.
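A toy illustration of NER as span labeling: tag runs of capitalized words as candidate entities. Real systems learn this from labeled data; this heuristic is illustrative only (note it also flags any sentence-initial word):

```python
def naive_entities(sentence):
    """Return runs of capitalized words as candidate named entities (toy)."""
    entities, current = [], []
    for word in sentence.split():
        token = word.strip(".,!?")
        if token[:1].isupper():
            current.append(token)      # extend the current candidate span
        else:
            if current:
                entities.append(" ".join(current))
            current = []
    if current:
        entities.append(" ".join(current))
    return entities

naive_entities("Ada Lovelace worked with Charles Babbage in London.")
# → ['Ada Lovelace', 'Charles Babbage', 'London']
```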
