Warning: This content has not yet been fully revised for this year.
Glossary
Here’s a Glossary of some terms we’ve used in class. Please suggest additional terms to add!
Coding Tips
- General: How to be a Wizard
- Learning Python
- Debugging
- A debugging manifesto
- Longer version: The Pocket Guide to Debugging
- Python specific: Read the end of the traceback.
Computing
To run a Hugging Face model on the lab machines on GPU, put the following in a Dockerfile in a new directory and follow the instructions in the comments.
# build with `sudo docker build -t transformers-pytorch-gpu-streamlit .`
# Note: the period at the end is important
# run with `sudo docker run -it --gpus all -p 8501:8501 -p 8888:8888 -v /scratch:/scratch -v ~:/home transformers-pytorch-gpu-streamlit bash`
# this:
# - runs the container interactively
# - uses all GPUs
# - maps port 8501 on the host to port 8501 in the container
# - mounts /scratch on the host to /scratch in the container
# If you want to run a Jupyter notebook, you can also map port 8888 in the same way:
# `sudo docker run -it --gpus all -p 8888:8888 -v /scratch:/scratch transformers-pytorch-gpu-streamlit jupyter notebook --allow-root --ip 0.0.0.0`
# You might want to allow the container to access your home directory, which you can do with `-v ~:/home`. But you'll probably need to have the container run as your user, which you can do with `-u $(id -u):$(id -g)`
FROM huggingface/transformers-pytorch-gpu
RUN pip install streamlit jupyter bitsandbytes
RUN apt-get update && apt-get install -y wget && \
    wget https://vscode.download.prss.microsoft.com/dbazure/download/stable/e170252f762678dec6ca2cc69aba1570769a5d39/vscode_cli_alpine_x64_cli.tar.gz && \
    tar zxvf vscode_cli_alpine_x64_cli.tar.gz
# set the env var to make Hugging Face use /scratch for its cache
ENV HF_HOME=/scratch/cs344/huggingface
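Once you have a shell in the container, it can be worth checking that the HF_HOME setting took effect before downloading a large model. Here is a minimal sketch using only the standard library. The helper name `hf_cache_report` is made up for illustration, and the fallback path assumes the usual Hugging Face default of `~/.cache/huggingface` when HF_HOME is unset.

```python
import os
from pathlib import Path


def hf_cache_report(env=None):
    """Return (cache_dir, exists, writable) for the Hugging Face cache."""
    env = os.environ if env is None else env
    # Assumed default location when HF_HOME is not set.
    default = Path.home() / ".cache" / "huggingface"
    cache_dir = Path(env.get("HF_HOME", str(default)))
    exists = cache_dir.is_dir()
    # Only meaningful if the directory exists (e.g. /scratch is mounted).
    writable = exists and os.access(cache_dir, os.W_OK)
    return cache_dir, exists, writable


if __name__ == "__main__":
    cache_dir, exists, writable = hf_cache_report()
    print(f"Hugging Face cache: {cache_dir} (exists={exists}, writable={writable})")
```

If it reports that the directory does not exist, the likely cause is that /scratch was not mounted with `-v /scratch:/scratch`.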
borg Supercomputer
If you need more computing power or storage than the lab machines for your final projects, you can run on Borg.
To set up your environment, run /storage/ArnoldGroup/anaconda3/bin/conda init, then log off and log back on.
Pro tip for accessing borg: put the following in ~/.ssh/config:
Host borg
    Hostname borg.calvin.edu
    Port 22122
    User YOUR_USERNAME
    # or wherever your key is
    IdentityFile ~/.ssh/keys/borg
The easiest way to use Borg is by running a training script from the command line. For example, you could make an sbatch script like this:
#!/bin/bash
#
# Run on the GPU node with one GPU, 4 CPUs, and 64GB of RAM
# Reserve one GPU
#SBATCH --gres=gpu:1
#
# Reserve four CPU cores
#SBATCH -c 4
#
# Reserve 64GB of RAM
#SBATCH --mem=64G
#
# Run in the GPUs queue
#SBATCH -p gpus
echo -n "Starting at "
date
echo "GPU info:"
nvidia-smi
which python
python -c 'import torch; print("PyTorch", torch.__version__, "CUDA=", torch.cuda.is_available())'
# Call the script here:
python training_script.py --input-data=... --batch-size=...
echo -n "Done at "
date
Then run it like sbatch train_model.sbatch.
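The sbatch script calls a training_script.py with `--input-data` and `--batch-size` flags. Here is one way such a script might start, as a sketch rather than a required structure: the default batch size of 32 and the function names are assumptions for illustration, and the actual data loading and training are left out.

```python
import argparse


def parse_args(argv=None):
    """Parse the command-line flags used in the sbatch example."""
    parser = argparse.ArgumentParser(description="Hypothetical training script skeleton")
    parser.add_argument("--input-data", required=True, help="path to the training data")
    # The default of 32 is an arbitrary placeholder.
    parser.add_argument("--batch-size", type=int, default=32, help="examples per batch")
    return parser.parse_args(argv)


def main(argv=None):
    args = parse_args(argv)
    # argparse exposes --input-data as args.input_data.
    # flush=True so the line shows up promptly when stdout goes to a log file.
    print(f"Training on {args.input_data} with batch size {args.batch_size}", flush=True)
    # ... load data, build the model, and train here ...
    return args


if __name__ == "__main__":
    main()
```

Keeping the settings as command-line flags means you can submit several jobs with different values without editing the Python file.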
If you need a notebook, please don’t use the GPU node, because we only have 4 GPUs to share among all of us. Instead, use a compute node. Here’s how to do it (could be more tested… let me know how it goes):
- Get on a Linux machine (lab machine) unless you know what you’re doing.
- Set up an ssh keypair, where the lab machine has your private key and borg has the public key listed in ~/.ssh/authorized_keys. Make sure permissions are set correctly: chmod 700 ~/.ssh; chmod 600 ~/.ssh/authorized_keys
- Get a shell on a compute node: on borg, run srun -c 2 --mem=64G --pty bash.
- Note what machine you’re on, e.g., borg-node01 (the .calvin.edu part is optional).
- Run jupyter notebook --no-browser.
- On a second terminal on the lab machine, run ssh -L 8888:borg-node01:8888 borg (replace the node name appropriately). There’s a chance this might not work. If it fails, then in that second terminal instead just ssh borg, and then from there, ssh -L 8888:127.0.0.1:8888 borg-node01 or whatever node it is. Also, if you get a port number other than 8888 (because something else is using that port), use that port number in the commands above instead. Alternative, using borg as a jump host: ssh -J borg -L 8888:localhost:8888 username@borg-node01.
- Back in the first terminal, copy and paste the link that jupyter notebook gave you into your web browser.
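Since the node name and port change from session to session, it is easy to mistype the forwarding command. The sketch below just assembles the first form of the command (`ssh -L PORT:NODE:PORT borg`) from those two values. The function name `tunnel_command` is hypothetical, and `login_host="borg"` assumes you have a `Host borg` entry in ~/.ssh/config.

```python
import shlex


def tunnel_command(node, port=8888, login_host="borg"):
    """Build the ssh port-forwarding command for a notebook on a compute node."""
    if not 1 <= port <= 65535:
        raise ValueError(f"invalid port: {port}")
    # -L local_port:remote_host:remote_port, connecting through the login host
    return ["ssh", "-L", f"{port}:{node}:{port}", login_host]


if __name__ == "__main__":
    # Example: Jupyter reported port 8889 because 8888 was taken.
    print(shlex.join(tunnel_command("borg-node01", port=8889)))
    # prints: ssh -L 8889:borg-node01:8889 borg
```

It only prints the command; paste it into the second terminal yourself.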
General instructions for using the Slurm scheduler on Borg
Google Colab
Tips and Notes:
- If (and only if) you’re working with images: Under the Runtime menu, select Change runtime type and select GPU. Otherwise many things will run very slowly.
- If you open a notebook from GitHub, any changes are not saved! Make sure you select “Copy to Drive” on the toolbar if you want to save changes.
- intro, overview
- Press Ctrl-Shift-P to open the Command Palette. Lots of useful stuff there; try the “scratch code cell”.
- Click the down arrow next to the RAM/Disk meter in the toolbar (where it used to say “Connect”) and select “show executed code history”.
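After changing the runtime type, you may want to confirm that the notebook actually sees a GPU. One rough way, sketched here with only the standard library, is to check whether `nvidia-smi -L` runs and lists a device. The helper name `gpu_visible` is made up for illustration, and this only tells you the driver sees a GPU, not that your framework is using it.

```python
import shutil
import subprocess


def gpu_visible():
    """Return True if nvidia-smi exists and lists at least one GPU."""
    if shutil.which("nvidia-smi") is None:
        return False
    try:
        # -L asks nvidia-smi to list the GPUs it can see, one per line.
        result = subprocess.run(
            ["nvidia-smi", "-L"], capture_output=True, text=True, timeout=30
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0 and "GPU" in result.stdout


if __name__ == "__main__":
    print("GPU visible:", gpu_visible())
```

If you are using PyTorch, `torch.cuda.is_available()` is the more direct check.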
Jupyter Notebook
- Appendix from our book
- some tips and tricks
- If you’re using git on the command line (instead of VS Code), you may appreciate nbdiff.
Materials
- Text and Image Generation – short and very clear YouTube videos
- 3blue1brown Linear Algebra articles with excellent videos
Educational Materials
- Deep Learning book by Goodfellow et al.
- ISLR book and course
- MIT 6.S191 Introduction to Deep Learning
- Spring 2021 Videos for this class
- Fundamentals 009 (Linear Regression with Learner) walkthrough
- Lab 3 walkthrough
- Other Videos
- 3Blue1Brown Neural Network Videos
- Art of the Problem - Deep Learning Playlist: “Mathematics of Neural Networks” is a nice explanation of how neural nets use vector spaces.
- “Advice on getting started in deep learning” by Andrew Ng
- Deliberate practice. It’s prerequisite to expertise. https://youtu.be/5eW6Eagr9XA?t=936 — I’ll add: those four things are exactly what is needed for an AI to do something well also.
Tools
- Git and GitHub
- Training Tips
- RunwayML (very high-level interface)
- Streamlit
- WandB (experiment tracking)
Keeping up with AI
Tech
- TWIML – podcast, blog
- Papers With Code (sign up for the newsletter)
- Two Minute Papers YouTube channel
- Arxiv-Sanity (but still insane)
- distill.pub
- Harvard Data Science Review
Ethics / Society
- Fast.AI Data Ethics course
- Montreal AI
- AI Now Institute
- Data and Society
- The Oxford Handbook of Ethics of AI
- Harvard BKC
- ACM Conference on Fairness, Accountability, and Transparency (FAccT)