Note: ignore the implementation details of class DataLoaders; you only need to know that it has a .train and a .valid, each of which is a DataLoader that turns filenames into batches of (image, label) pairs.
If you want to replicate the code, use DuckDuckGo (as the Unit 1 video does) instead of trying to get a Bing API key.
Vision (p.9)
We envision a kingdom community in which cultural diversity is seen as normal; a Christian "family" that transcends ethnic, cultural, racial, and class boundaries: a communion of saints in which "each member should consider it his duty to use his gifts readily and cheerfully for the service and enrichment of the other members" (Lord's Day 21 of the Heidelberg Catechism); a community in which Reformed Christians from all of these groups see Calvin as their college. It is the biblical vision of Pentecost rather than the vision of Babel.
Each code block feeds input to a hidden Python REPL (“Shell” in Thonny)
Possible to run code out of order
Changing something doesn’t make dependent code re-run!
Outputs: anything explicitly display()ed, print()ed, or plotted, plus the result of the last expression in the block
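A tiny illustration of the out-of-order / stale-state hazard, using two hypothetical cells:

# Cell 1
x = 2

# Cell 2
y = x * 10   # y is now 20

# If you later change Cell 1 to `x = 5` and re-run only Cell 1, y is still 20:
# dependent code (Cell 2) does not re-run until you run it again yourself.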
Model training
Outline of notebooks
Load the data
Download the dataset.
Get a list of filenames.
Get a list of ground-truth labels.
Set up the dataloaders (which handle the train/validation split, batching, and resizing).
Train a model
Get a foundation model (resnet18 in our case)
Fine-tune it.
Get the model’s predictions on an image. (A code sketch of these steps follows this outline.)
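For reference, here is a minimal sketch of these steps using the fastai library. The folder path, label function, and number of fine-tuning epochs are placeholders, and it assumes the dataset has already been downloaded into folders named after their labels (the download step itself is omitted):

from fastai.vision.all import *

# Load the data: get filenames and ground-truth labels
path = Path("data/images")              # placeholder: wherever the downloaded images live
filenames = get_image_files(path)       # list of image filenames

def label_func(fn):
    return fn.parent.name               # assumed: the label is the name of the containing folder

# Set up the dataloaders: train/validation split, resizing, and batching
dls = ImageDataLoaders.from_path_func(
    path, filenames, label_func,
    valid_pct=0.2, seed=42, item_tfms=Resize(224))
xb, yb = dls.train.one_batch()          # dls.train and dls.valid each yield batches of (image, label) pairs

# Train a model: start from a foundation model (resnet18) and fine-tune it
learn = vision_learner(dls, resnet18, metrics=accuracy)
learn.fine_tune(3)                      # placeholder number of epochs

# Get the model's predictions on an image
pred_label, pred_idx, probs = learn.predict(filenames[0])
print(pred_label, probs[pred_idx])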
Evaluating a model
Accuracy: correct or incorrect?
Loss (see the sketch after this list):
partial credit
when it’s right, should be confident
when it’s wrong, shouldn’t be confident
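A small illustration of those last three points, using PyTorch’s cross-entropy loss on made-up scores (not necessarily the exact loss function our notebook uses):

import torch
import torch.nn.functional as F

target = torch.tensor([0])                                      # the true class is class 0

cases = {
    "confident & right": torch.tensor([[ 4.0, -2.0, -2.0]]),   # high score for class 0
    "hesitant & right":  torch.tensor([[ 0.5,  0.2,  0.1]]),   # slight preference for class 0
    "confident & wrong": torch.tensor([[-2.0,  4.0, -2.0]]),   # high score for class 1
}

for name, scores in cases.items():
    loss = F.cross_entropy(scores, target)
    print(f"{name}: loss = {loss.item():.3f}")

# Accuracy only asks "correct or incorrect?": it scores the first two cases the same
# and the third as 0. Loss gives partial credit, rewarding confidence when the model
# is right and penalizing confidence when it is wrong.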
Wednesday class
Guest lecture by Colin Davison.
The basic process he walked through is essentially the same as the image classification workflow we started last week and will continue over the next few weeks.
Here are some questions that you should ask yourself to check your understanding of the lecture (nothing formal to turn in yet, but these might show up on a future quiz, so think and discuss…):
What about his task made it supervised learning?
Why did he need to split the data? (What happened when he didn’t split it?)
What did he need to do to the text to make it usable by his classifier? (bonus: in what sense was the input a “bag of words”?)
Can you give an example of a bigram? A unigram? (A small illustration follows these questions.)
How did he put a number on how well the classifier did? (bonus: what additional numbers did he show at the seminar? What insight did they provide?)
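Not from the lecture itself, but a small sketch (with made-up sentences) of the pieces these questions refer to: a bag-of-words representation, unigrams vs. bigrams, a train/test split, and a single accuracy number, using scikit-learn:

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

texts  = ["great movie", "loved it", "terrible movie", "hated it"] * 10   # made-up data
labels = ["pos", "pos", "neg", "neg"] * 10

# Bag of words: each text becomes counts of its unigrams (single words).
# ngram_range=(1, 2) would also count bigrams (adjacent word pairs like "great movie").
vectorizer = CountVectorizer(ngram_range=(1, 1))
X = vectorizer.fit_transform(texts)
print(CountVectorizer(ngram_range=(2, 2)).fit(texts).get_feature_names_out()[:5])  # a few bigrams

# Split so the classifier is scored on texts it never saw during training.
# (In a real project, fit the vectorizer on the training texts only.)
X_train, X_test, y_train, y_test = train_test_split(X, labels, test_size=0.25, random_state=0)

clf = LogisticRegression().fit(X_train, y_train)
print(accuracy_score(y_test, clf.predict(X_test)))   # one number summarizing how well it did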