class: left, top, title-slide

.title[
# Predictive Analytics Unit 6: Ensemble Models
]

.author[
### Ken Arnold
Calvin University
]

---

## Objectives

- Explain how decision trees and random forest classifiers generate predictions once trained.
- Contrast the assumptions that linear models and tree-based models make about the data.
- Identify which of the above classifiers can also be used for regression tasks.
- Compare the appropriateness and empirical performance of two types of models.

Note: the textbook also discusses several other model types; we'll cover some of them next week.

Reference: [Fitting and Predicting with `parsnip`](https://parsnip.tidymodels.org/articles/articles/Examples.html)

---

## Setup Code

```r
data(ames, package = "modeldata")
ames_all <- ames %>%
  filter(Gr_Liv_Area < 4000, Sale_Condition == "Normal") %>%
  mutate(across(where(is.integer), as.double)) %>%
  mutate(Sale_Price = Sale_Price / 1000)  # price in $1000s
rm(ames)
```

```r
metrics <- yardstick::metric_set(mae, mape, rsq_trad)
set.seed(10) # Seed the random number generator
ames_split <- initial_split(ames_all, prop = 2 / 3)
ames_train <- training(ames_split)
ames_test <- testing(ames_split)
```

---

class: center, middle

# Random Forests

---

## Why random *forests*?

.pull-left[
A shallow decision tree can't fit the nuance in the data:

```r
shallow_tree <- fit(
  decision_tree(mode = "regression", tree_depth = 5),
  Sale_Price ~ Latitude + Longitude,
  data = ames_train)
```

```r
show_latlong_model(ames_train, shallow_tree, model_name = NULL)
```

<img src="slides06ensembles_files/figure-html/shallow-tree-1.png" width="100%" style="display: block; margin: auto;" />
]

.pull-right[
A deep tree fits both the nuance and the randomness:

```r
deep_tree <- fit(
* decision_tree(mode = "regression", tree_depth = 30, cost_complexity = 1e-6),
  Sale_Price ~ Latitude + Longitude,
  data = ames_train)
```

```r
show_latlong_model(ames_train, deep_tree, model_name = NULL)
```

<img src="slides06ensembles_files/figure-html/deep-tree-1.png" width="100%" style="display: block; margin: auto;" />
]

---

## Random Forest leverages diversity to offer a third way

```r
forest1 <- fit(
* rand_forest(mode = "regression"),
  Sale_Price ~ Latitude + Longitude,
  data = ames_train)
```

<img src="slides06ensembles_files/figure-html/model-errors-1.png" width="90%" style="display: block; margin: auto;" />

---

## How Random Forests work

- An *ensemble* model combines the predictions of many models.
- A random *forest* is an ensemble of many *trees*.
- Each tree is fit on a different sample of the data.
- To make a prediction: ask each tree for its prediction, then average the answers (for regression) or have the trees vote (for classification).

---

## A random forest has many trees

```r
forest_internals <- extract_fit_engine(forest1)
forest_internals
```

```
Ranger result

Call:
 ranger::ranger(x = maybe_data_frame(x), y = y, num.threads = 1,
     verbose = FALSE, seed = sample.int(10^5, 1))

Type:                             Regression
Number of trees:                  500
Sample size:                      1608
Number of independent variables:  2
Mtry:                             1
Target node size:                 5
Variable importance mode:         none
Splitrule:                        variance
OOB prediction error (MSE):       1446.128
R squared (OOB):                  0.689077
```

---

## A random forest has many trees

Each tree is deep, so we show only a few example paths through one tree (the next slide sketches how to extract this):

| nodeID| leftChild| rightChild| splitvarID|splitvarName | splitval|terminal | prediction|
|------:|---------:|----------:|----------:|:------------|--------:|:--------|----------:|
|      0|         1|          2|          0|Latitude     | 42.04613|FALSE    |         NA|
|      1|         3|          4|          0|Latitude     | 42.01889|FALSE    |         NA|
|      2|         5|          6|          0|Latitude     | 42.05710|FALSE    |         NA|
|      3|         7|          8|          0|Latitude     | 41.99294|FALSE    |         NA|
|      5|        11|         12|          0|Latitude     | 42.04641|FALSE    |         NA|
|      8|        17|         18|          0|Latitude     | 41.99305|FALSE    |         NA|
|     11|        23|         24|          0|Latitude     | 42.04614|FALSE    |         NA|
|     17|        NA|         NA|         NA|NA           |       NA|TRUE     |        425|
|     23|        NA|         NA|         NA|NA           |       NA|TRUE     |        301|
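---

## Peeking inside one tree

A minimal sketch of how node tables like the one above can be pulled out, assuming the default `ranger` engine (which is what `forest1` used):

```r
library(ranger)  # the engine behind rand_forest() by default

# One row per node of tree #1: internal nodes carry a split
# variable and threshold; terminal (leaf) nodes carry a prediction.
tree1 <- treeInfo(forest_internals, tree = 1)
head(tree1)
```

And a sketch of asking every tree for its individual prediction, which the next slide's plot averages:

```r
# predict.all = TRUE returns one prediction per tree:
# a matrix with one row per house, one column per tree (here 3 x 500).
per_tree <- predict(
  forest_internals,
  data = ames_test %>% select(Latitude, Longitude) %>% head(3),
  predict.all = TRUE)

# Averaging across trees gives the forest's overall prediction.
rowMeans(per_tree$predictions)
```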
---

## Forest averages each tree's predictions

<img src="slides06ensembles_files/figure-html/example-rf-predictions-1.png" width="80%" style="display: block; margin: auto;" />

.floating-source[...for regression. Classification: take a vote.]

---

## Each tree was trained on a different sample of data

- Each tree was fit on a *bootstrap resample* of the original data.
- So each tree is like a sample statistic (the tree summarizes its sample of the data).
- Intuition: predict by averaging over the sampling distribution.

Also, each *split* is only allowed to consider a random subset of the features. So different trees can't all rely on the same feature, even if it seems important.

---

## Value of Diversity

.scripture[
```
I looked and there before me was a great multitude
that no one could count, from every nation, tribe,
people and language, standing before the throne.
```
.ref[Revelation 7:9, as quoted in Calvin's "From Every Nation"]
]

- Random forests work because they combine diverse perspectives (from different training data and different choices).
- Reflects the value of diversity in God's Kingdom (see also Rev 5:9, 1 Cor 12, etc.)

---

## Gradient Boosted Trees (XGBoost, LightGBM)

Instead of averaging many random trees, *add up* trees, each one trained to predict the error that the model so far still makes.

```r
xgb1 <- fit(
* boost_tree(mode = "regression"),
  Sale_Price ~ Latitude + Longitude,
  data = ames_train)
```

<img src="slides06ensembles_files/figure-html/model-errors-with-boost-1.png" width="90%" style="display: block; margin: auto;" />

---

.pull-left[
Random forest:

```r
show_latlong_model(ames_train, forest1, model_name = NULL)
```

<img src="slides06ensembles_files/figure-html/rf-model-data-1.png" width="100%" style="display: block; margin: auto;" />
]

.pull-right[
Boosted tree (XGBoost):

```r
show_latlong_model(ames_train, xgb1, model_name = NULL)
```

<img src="slides06ensembles_files/figure-html/boost-model-data-1.png" width="100%" style="display: block; margin: auto;" />
]

---

## Applying Random Forests and Gradient Boosting

- Regression and classification? **Both**, just like decision trees.
  - Classification: the trees "vote" for the most probable class.
  - Gradient boosting handles classification slightly differently internally.
- Categorical features? **Yes**, just like decision trees.
- Does the scale of numerical features matter? **No**, just like decision trees.

A random forest is a good first model to try. Then try XGBoost, which may need *tuning* to do better.

Caveats:

- An ensemble is an average of trees, so it shares the trees' **core assumption**: split the items into boxes, then treat everything in a box the same.
- So it can't *continue a trend* beyond the training data (without help).
- Gradient boosting can overfit (try a random forest first).

Also: all of the models so far fail if the meanings of features change (e.g., individual pixels in an image).
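---

## Sketch: comparing empirical performance

One of the objectives was to compare the empirical performance of two model types. A minimal sketch of how that could look here, assuming `forest1` and `xgb1` as fit above and the `metrics` set from the setup code:

```r
# augment() adds a .pred column with each model's predictions.
bind_rows(
  `random forest` = augment(forest1, ames_test),
  xgboost = augment(xgb1, ames_test),
  .id = "model"
) %>%
  group_by(model) %>%
  metrics(truth = Sale_Price, estimate = .pred)
```

Lower `mae` and `mape` (and higher `rsq_trad`) are better; remember that `Sale_Price` is in thousands of dollars.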