Resources

Glossary

Here’s a Glossary of some terms we’ve used in class. Please suggest additional terms to add!

`borg` Supercomputer

If you need more computing power or storage than the lab machines for your final projects, you can run on Borg.

To set up your environment, run /storage/ArnoldGroup/anaconda3/bin/conda init, then log off and log back on.

Pro tip for accessing borg: put the following in ~/.ssh/config
Host borg
    Hostname borg.calvin.edu
    Port 22122
    User YOUR_USERNAME
    # Replace with the path to your own private key
    IdentityFile ~/.ssh/keys/borg

The easiest way to use Borg is by running a training script from the command line. For example, you could make an sbatch script like this:

#!/bin/bash
#
# Run on the GPU node with one GPU, 4 CPUs, and 64GB of RAM
# Reserve one GPU
#SBATCH --gres=gpu:1
#
# Reserve four CPU cores
#SBATCH -c 4
#
# Reserve 64GB of RAM
#SBATCH --mem=64G
#
# Run in the GPUs queue
#SBATCH -p gpus

echo -n "Starting at "
date
echo "GPU info:"
nvidia-smi
which python
python -c 'import torch; print("PyTorch", torch.__version__, "CUDA=", torch.cuda.is_available())'

# Call the script here:
python training_script.py --input-data=... --batch-size=...

echo -n "Done at "
date

Save it as train_model.sbatch, then submit it with sbatch train_model.sbatch.
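
The sbatch example calls training_script.py with two flags, so here is a minimal sketch of how such a script might read them. It assumes the script only needs --input-data and --batch-size; the function names and the default batch size are hypothetical, and the actual training code is left out.

```python
import argparse


def parse_args(argv=None):
    """Parse the flags used in the sbatch example (the default is a made-up value)."""
    parser = argparse.ArgumentParser(description="Train a model on Borg.")
    parser.add_argument("--input-data", required=True, help="Path to the training data")
    parser.add_argument("--batch-size", type=int, default=64, help="Examples per batch")
    return parser.parse_args(argv)


def main(argv=None):
    args = parse_args(argv)
    # argparse exposes --input-data as args.input_data and --batch-size as args.batch_size
    print(f"Training on {args.input_data} with batch size {args.batch_size}")
    # Build your DataLoaders and Learner here, then train and save the model.
    return args
```

End the real file with the usual if __name__ == "__main__": main() guard so that the python line in the sbatch script actually runs it. Everything the script prints ends up in the Slurm output file for the job.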

If you need a notebook, please don’t use the GPU node, because we only have 4 GPUs to share among all of us. Instead, use a compute node. Here’s how to do it (these steps could use more testing, so let me know how it goes):

Get on a Linux machine (lab machine) unless you know what you’re doing.
Set up an ssh keypair, where the lab machine has your private key and borg has the public key listed in ~/.ssh/authorized_keys. Make sure permissions are set correctly: chmod 700 ~/.ssh; chmod 600 ~/.ssh/authorized_keys
Get a shell on a compute node: on borg, run srun -c 2 --mem=64G --pty bash.
Note what machine you’re on, e.g., borg-node01 (the .calvin.edu part is optional).
Run jupyter notebook --no-browser.
On a second terminal on the lab machine, run ssh -L 8888:borg-node01:8888 borg. (replace the node name appropriately)

There’s a chance this might not work. If it fails, run plain ssh borg in that second terminal instead, and from there run ssh -L 8888:127.0.0.1:8888 borg-node01 (or whichever node you are on). Also, if Jupyter gives you a port number other than 8888 (because something else is using that port), use that port number in the commands above instead.

Alternative: ssh -J borg -L 8888:localhost:8888 username@borg-node01. (This jumps through borg to reach the node and forwards the port from there.)
Back in the first terminal, copy and paste the link that jupyter notebook gave you into your web browser.
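
Since both the node name and the port can change from one session to the next, it can help to generate the tunnel command instead of editing it by hand. This is only a sketch: tunnel_command is a hypothetical helper, and it assumes the borg host alias from the ~/.ssh/config tip above.

```python
def tunnel_command(node, port=8888, login_host="borg"):
    """Build the ssh port-forwarding command for a notebook running on `node`."""
    # The same port number is used on the lab machine and on the compute node.
    return f"ssh -L {port}:{node}:{port} {login_host}"


print(tunnel_command("borg-node01"))             # ssh -L 8888:borg-node01:8888 borg
print(tunnel_command("borg-node02", port=8889))  # ssh -L 8889:borg-node02:8889 borg
```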

General instructions for using the Slurm scheduler on Borg

fastai hotfixes

Warning: fastai drops incomplete batches in the training set, so a batch size larger than your training set leaves nothing to train on, and bs=1 would fail because of batch normalization. So use bs=2 for small data (and train for more epochs).
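
To see why the batch size matters so much for tiny datasets, here is the arithmetic as a small sketch. It is plain Python rather than fastai code, and batches_per_epoch is a hypothetical helper that mimics dropping the last incomplete batch.

```python
def batches_per_epoch(n_items, bs, drop_last=True):
    """Number of batches that one pass over n_items yields at batch size bs."""
    full, remainder = divmod(n_items, bs)
    if drop_last or remainder == 0:
        return full
    return full + 1


# With 5 training images and bs=64, the only batch is incomplete and gets dropped.
print(batches_per_epoch(5, bs=64))  # 0: nothing to train on
# With bs=2 there are two full batches; one image is left out each epoch.
print(batches_per_epoch(5, bs=2))   # 2
```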

plot_top_losses is broken. Here’s a monkey-patch; add this as a new code cell just after your fastai imports:

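# Replacement for ClassificationInterpretation.plot_top_losses. It relies on names
# (Tensor, is_listy, plot_top_losses) that the fastai imports already provide.
def _plot_top_losses(self, k, largest=True, **kwargs):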
    losses,idx = self.top_losses(k, largest)
    if not isinstance(self.inputs, tuple): self.inputs = (self.inputs,)
    if isinstance(self.inputs[0], Tensor): inps = tuple(o[idx] for o in self.inputs)
    else: inps = self.dl.create_batch(self.dl.before_batch([tuple(o[i] for o in self.inputs) for i in idx]))
    b = inps + tuple(o[idx] for o in (self.targs if is_listy(self.targs) else (self.targs,)))
    x,y,its = self.dl._pre_show_batch(b, max_n=k)
    b_out = inps + tuple(o[idx] for o in (self.decoded if is_listy(self.decoded) else (self.decoded,)))
    x1,y1,outs = self.dl._pre_show_batch(b_out, max_n=k)
    if its is not None:
        plot_top_losses(x, y, its, outs.itemgot(slice(len(inps), None)), self.preds[idx], losses,  **kwargs)
ClassificationInterpretation.plot_top_losses = _plot_top_losses

Running Code

Google Colab

In addition to the lab computers, you can run all the book’s notebooks, and most of our labs and homeworks, on Google Colab. Links to open notebooks in Colab are given next to each reading link.

To install fastai, insert a cell at the top that contains:

!pip install -Uq fastbook
from fastbook import *

(In the past it was also necessary to install torchtext==0.8.1. I suspect this is no longer required but I have not tested that.)

Tips and Notes:

Under the Runtime menu, select Change runtime type and select GPU. Otherwise many things will run very slowly.
If you open a notebook from GitHub, your changes are not saved! Make sure you select “Copy to Drive” on the toolbar if you want to keep them.
intro, overview
Press Ctrl-Shift-P to open the Command Palette. Lots of useful stuff there; try the “scratch code cell”.
Click the down arrow next to the RAM/Disk meter in the toolbar (where it used to say “Connect”) and select “show executed code history”.
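
To confirm that the GPU runtime from the first tip is actually active, you can run a quick check in a cell. This sketch assumes PyTorch is installed, as it is once fastai is, and falls back to a message if it is not; describe_device is a hypothetical helper name.

```python
def describe_device():
    """Report whether PyTorch can see a GPU in the current runtime."""
    try:
        import torch
    except ImportError:
        return "PyTorch is not installed"
    if torch.cuda.is_available():
        return f"GPU: {torch.cuda.get_device_name(0)}"
    return "CPU only"


print(describe_device())
```

If this prints "CPU only" on Colab, change the runtime type and rerun the notebook from the top.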

Jupyter Notebook

Appendix from our book
some tips and tricks

Materials

fastai

Our book: Deep Learning for Coders
- Source notebooks, Arnold’s cleaned notebooks. Suggestion: use nbviewer (or Colab) when reading the notebooks, rather than GitHub.
fast.ai course lesson videos
Review the end-of-chapter questions at aiquizzes
https://www.cognitivefactory.fr/fastaidocs/
https://walkwithfastai.com/

Educational Materials

Deep Learning book by Goodfellow et al.
ISLR book and course
MIT 6.S191 Introduction to Deep Learning
Spring 2021 Videos for this class
- Fundamentals 009 (Linear Regression with Learner) walkthrough
- Lab 3 walkthrough
Other Videos
- 3Blue1Brown Neural Network Videos
- Art of the Problem - Deep Learning Playlist: “Mathematics of Neural Networks” is a nice explanation of how neural nets use vector spaces.
“Advice on getting started in deep learning” by Andrew Ng

Community

Tools

GitHub’s Git Handbook
Training Tips
- https://huggingface.co/blog/simple-considerations
- http://josh-tobin.com/assets/pdf/troubleshooting-deep-neural-networks-01-19.pdf
- A Recipe for Training Neural Networks: http://karpathy.github.io/2019/04/25/recipe/
- https://blog.floydhub.com/training-neural-nets-a-hackers-perspective/
RunwayML (very high-level interface)
Streamlit
WandB (experiment tracking)

Keeping up with AI

Tech

TWIML – podcast, blog
Papers With Code (sign up for the newsletter)
Two Minute Papers YouTube channel
Morning Paper blog
Arxiv-Sanity (but still insane)
distill.pub
Harvard Data Science Review

Ethics / Society

Fast.AI Data Ethics course
Montreal AI
AI Now Institute
Data and Society
The Oxford Handbook of Ethics of AI
AlgorithmWatch
Harvard BKC
ACM Conference on Fairness, Accountability, and Transparency (FAccT)