Q&A Week 2

Tech

Why do we get two different error rates from fine_tune?

The training happens in two stages: fine_tune first trains only the newly added head while the pretrained layers stay frozen, then unfreezes everything and keeps training, and each stage reports its own error rate. You’ll learn more about this process in later chapters. For now, use the last error rate.
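The two-stage control flow can be sketched in plain Python (this is an illustration of the structure, not fastai’s actual code; the function name and return value are made up):

```python
def fine_tune_sketch(epochs, freeze_epochs=1):
    """Mimic the two-stage structure of fastai's fine_tune (illustrative only)."""
    stages = []
    # Stage 1: train only the new head; the pretrained body is frozen
    for _ in range(freeze_epochs):
        stages.append("frozen")
    # Stage 2: unfreeze the whole model and keep training
    for _ in range(epochs):
        stages.append("unfrozen")
    return stages

print(fine_tune_sketch(2))  # → ['frozen', 'unfrozen', 'unfrozen']
```

Because each loop iteration prints its own metrics, you see one error rate per frozen epoch and one per unfrozen epoch — hence the two numbers.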

Why do I get different results when training the same model multiple times, even though I set the seed?

Nobody actually asked this, but it bugged me (based on my expectations from sklearn), so I looked into it.

From looking at ImageDataLoaders.from_path_func??, the seed parameter only controls the RandomSplitter (i.e., the split between training set and validation set). So passing a seed ensures that the same images land in the training set vs. the validation set each run, which is a really good idea.
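What the seed buys you can be sketched with a pure-Python stand-in for RandomSplitter (fastai’s real version uses torch.randperm; the function name here is made up):

```python
import random

def random_splitter_sketch(n_items, valid_pct=0.2, seed=None):
    # Shuffle the item indices with a seeded RNG, then split off valid_pct
    rng = random.Random(seed)
    idxs = list(range(n_items))
    rng.shuffle(idxs)
    cut = int(valid_pct * n_items)
    return idxs[cut:], idxs[:cut]  # (train indices, validation indices)

# The same seed always yields the same train/valid split:
assert random_splitter_sketch(100, seed=42) == random_splitter_sketch(100, seed=42)
```

So the split is pinned down, but nothing else about training is.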

To make a fastai training run reproducible, call set_seed(12345, reproducible=True) before creating the dataloader. That function seeds Python’s standard library random, numpy.random, and PyTorch’s RNG.
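Under the hood, set_seed does roughly the following (a sketch, not fastai’s exact source; the torch calls are commented out so the snippet runs without PyTorch installed):

```python
import random
import numpy as np

def set_seed_sketch(s, reproducible=False):
    # fastai's set_seed seeds all three RNG sources; torch is sketched in comments
    random.seed(s)               # Python's standard library RNG
    np.random.seed(s % (2**32))  # numpy's global RNG (seed must fit in 32 bits)
    # torch.manual_seed(s)       # PyTorch's RNG
    # if reproducible:           # also force deterministic cuDNN behavior
    #     torch.backends.cudnn.deterministic = True
    #     torch.backends.cudnn.benchmark = False
```

Calling it with the same seed restores the same random streams, which is what makes the rest of training repeatable.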

(I eventually found this discussed in the fastai issue tracker. But before I did, I poked around at the code. DataLoaders are iterators, so dls.train.__iter__?? shows the code that runs when you iterate through one. Notice that it starts with self.randomize(), which creates a fresh self.rng from the previous RNG. And if you look at the definition of DataLoader?? (github link), self.rng is created by calling random.Random.)
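That RNG-chaining pattern can be reproduced in a few lines (a simplified stand-in, assuming fastai seeds the loader’s RNG from the global random module; class and method names are illustrative):

```python
import random

class TinyLoader:
    """Sketch of the RNG pattern in fastai's DataLoader (names simplified)."""
    def __init__(self):
        # The loader's own RNG is derived from the *global* `random` state,
        # which is why set_seed (which seeds `random`) makes it reproducible.
        self.rng = random.Random(random.randint(0, 2**32 - 1))

    def randomize(self):
        # Called at the start of each iteration: replace self.rng with a
        # fresh RNG seeded from the previous one, so batch order drifts
        # from epoch to epoch unless the chain was seeded at the start.
        self.rng = random.Random(self.rng.randint(0, 2**32 - 1))

# Seeding the global RNG first makes the whole chain reproducible:
random.seed(0)
a = TinyLoader()
random.seed(0)
b = TinyLoader()
assert a.rng.random() == b.rng.random()
```

Without that global seed, each TinyLoader starts from an unpredictable state, which is exactly the behavior that prompted the question.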

Can we get 100% accurate AI?

Depends on what you mean. Keywords to search for if you want to look more into this:

  • Verified AI
  • robust machine learning
  • robust reinforcement learning

Context

Will unbiased data prevent biased decisions?

Unfortunately, no. See this thread for a survey.

How can I make sure that my AI project is beneficial?

Hard question. Here’s one paper that suggests a set of questions to ask.

Ken Arnold
Assistant Professor of Computer Science