class: left, top, title-slide

.title[
# Predictive Analytics Unit 6: Ensemble Models
]

.author[
### Ken Arnold
Calvin University
]

---

## Objectives

- Explain how decision trees and random forest classifiers generate predictions once trained.
- Contrast the assumptions that linear models and tree-based models make about the data.
- Identify which of the above classifiers can also be used for regression tasks.
- Compare the appropriateness and empirical performance of two types of models.

Note: the textbook also discusses several other model types; we'll cover some of them next week.

Reference: [Fitting and Predicting with `parsnip`](https://parsnip.tidymodels.org/articles/articles/Examples.html)

---

## Setup Code

```r
data(ames, package = "modeldata")
ames_all <- ames %>%
  filter(Gr_Liv_Area < 4000, Sale_Condition == "Normal") %>%
  mutate(across(where(is.integer), as.double)) %>%
  mutate(Sale_Price = Sale_Price / 1000)  # price in $1000s
rm(ames)
```

```r
metrics <- yardstick::metric_set(mae, mape, rsq_trad)
set.seed(10) # Seed the random number generator
ames_split <- initial_split(ames_all, prop = 2 / 3)
ames_train <- training(ames_split)
ames_test <- testing(ames_split)
```

---

class: center, middle

# Random Forests

---

## Why random *forests*?

.pull-left[
A shallow decision tree can't fit the nuance in the data:

```r
shallow_tree <- fit(
  decision_tree(mode = "regression", tree_depth = 5),
  Sale_Price ~ Latitude + Longitude,
  data = ames_train)
```

```r
show_latlong_model(ames_train, shallow_tree, model_name = NULL)
```

<img src="slides06ensembles_files/figure-html/shallow-tree-1.png" width="100%" style="display: block; margin: auto;" />
]

.pull-right[
A deep tree fits both the nuance and the randomness:

```r
deep_tree <- fit(
* decision_tree(mode = "regression", tree_depth = 30, cost_complexity = 1e-6),
  Sale_Price ~ Latitude + Longitude,
  data = ames_train)
```

```r
show_latlong_model(ames_train, deep_tree, model_name = NULL)
```

<img src="slides06ensembles_files/figure-html/deep-tree-1.png" width="100%" style="display: block; margin: auto;" />
]

---

## Random Forest leverages diversity to offer a third way

```r
forest1 <- fit(
* rand_forest(mode = "regression"),
  Sale_Price ~ Latitude + Longitude,
  data = ames_train)
```

<img src="slides06ensembles_files/figure-html/model-errors-1.png" width="90%" style="display: block; margin: auto;" />

---

## How Random Forests work

- An *ensemble* model combines the predictions of many models.
- A random *forest* is an ensemble of many *trees*.
- Each tree is fit on a different sample of the data.
- To make a prediction: ask each tree for its prediction, then average the answers (for regression) or have the trees vote (for classification).

---

## A random forest has many trees

```r
forest_internals <- extract_fit_engine(forest1)
forest_internals
```

```
Ranger result

Call:
 ranger::ranger(x = maybe_data_frame(x), y = y, num.threads = 1,
     verbose = FALSE, seed = sample.int(10^5, 1))

Type:                             Regression
Number of trees:                  500
Sample size:                      1608
Number of independent variables:  2
Mtry:                             1
Target node size:                 5
Variable importance mode:         none
Splitrule:                        variance
OOB prediction error (MSE):       1446.128
R squared (OOB):                  0.689077
```

---

## A random forest has many trees

Each tree is deep, so we show only a few example paths through one tree (the next slide sketches how to extract this):

| nodeID| leftChild| rightChild| splitvarID|splitvarName | splitval|terminal | prediction|
|------:|---------:|----------:|----------:|:------------|--------:|:--------|----------:|
|      0|         1|          2|          0|Latitude     | 42.04613|FALSE    |         NA|
|      1|         3|          4|          0|Latitude     | 42.01889|FALSE    |         NA|
|      2|         5|          6|          0|Latitude     | 42.05710|FALSE    |         NA|
|      3|         7|          8|          0|Latitude     | 41.99294|FALSE    |         NA|
|      5|        11|         12|          0|Latitude     | 42.04641|FALSE    |         NA|
|      8|        17|         18|          0|Latitude     | 41.99305|FALSE    |         NA|
|     11|        23|         24|          0|Latitude     | 42.04614|FALSE    |         NA|
|     17|        NA|         NA|         NA|NA           |       NA|TRUE     |        425|
|     23|        NA|         NA|         NA|NA           |       NA|TRUE     |        301|
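---

## Peeking inside one tree

A minimal sketch of how node tables like the one above can be pulled out, assuming the default `ranger` engine (which is what `forest1` used):

```r
library(ranger)  # the engine behind rand_forest() by default

# One row per node of tree #1: internal nodes carry a split
# variable and threshold; terminal (leaf) nodes carry a prediction.
tree1 <- treeInfo(forest_internals, tree = 1)
head(tree1)
```

And a sketch of asking every tree for its individual prediction, which the next slide's plot averages:

```r
# predict.all = TRUE returns one prediction per tree:
# a matrix with one row per house, one column per tree (here 3 x 500).
per_tree <- predict(
  forest_internals,
  data = ames_test %>% select(Latitude, Longitude) %>% head(3),
  predict.all = TRUE)

# Averaging across trees gives the forest's overall prediction.
rowMeans(per_tree$predictions)
```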
---

## Forest averages each tree's predictions

<img src="slides06ensembles_files/figure-html/example-rf-predictions-1.png" width="80%" style="display: block; margin: auto;" />

.floating-source[...for regression. Classification: take a vote.]

---

## Each tree was trained on a different sample of data

- Each tree was fit on a *bootstrap resample* of the original data.
- So each tree is like a sample statistic (the tree summarizes its sample of the data).
- Intuition: predict by averaging over the sampling distribution.

Also, each *split* is only allowed to consider a random subset of the features. So different trees can't all rely on the same feature, even if it seems important.

---

## Value of Diversity

.scripture[
```
I looked and there before me was a great multitude
that no one could count, from every nation, tribe,
people and language, standing before the throne.
```
.ref[Revelation 7:9, as quoted in Calvin's "From Every Nation"]
]

- Random forests work because they combine diverse perspectives (from different training data and different choices).
- Reflects the value of diversity in God's Kingdom (see also Rev 5:9, 1 Cor 12, etc.)

---

## Gradient Boosted Trees (XGBoost, LightGBM)

Instead of averaging many random trees, *add up* trees, each one trained to predict the error that the model so far still makes.

```r
xgb1 <- fit(
* boost_tree(mode = "regression"),
  Sale_Price ~ Latitude + Longitude,
  data = ames_train)
```

<img src="slides06ensembles_files/figure-html/model-errors-with-boost-1.png" width="90%" style="display: block; margin: auto;" />

---

.pull-left[
Random forest:

```r
show_latlong_model(ames_train, forest1, model_name = NULL)
```

<img src="slides06ensembles_files/figure-html/rf-model-data-1.png" width="100%" style="display: block; margin: auto;" />
]

.pull-right[
Boosted tree (XGBoost):

```r
show_latlong_model(ames_train, xgb1, model_name = NULL)
```

<img src="slides06ensembles_files/figure-html/boost-model-data-1.png" width="100%" style="display: block; margin: auto;" />
]

---

## Applying Random Forests and Gradient Boosting

- Regression and classification? **Both**, just like decision trees.
  - Classification: the trees "vote" for the most probable class.
  - Gradient boosting handles classification slightly differently internally.
- Categorical features? **Yes**, just like decision trees.
- Does the scale of numerical features matter? **No**, just like decision trees.

A random forest is a good first model to try. Then try XGBoost, which may need *tuning* to do better.

Caveats:

- An ensemble is an average of trees, so it shares the trees' **core assumption**: split the items into boxes, then treat everything in a box the same.
- So it can't *continue a trend* beyond the training data (without help).
- Gradient boosting can overfit (try a random forest first).

Also: all of the models so far fail if the meanings of features change (e.g., individual pixels in an image).
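---

## Sketch: comparing empirical performance

One of the objectives was to compare the empirical performance of two model types. A minimal sketch of how that could look here, assuming `forest1` and `xgb1` as fit above and the `metrics` set from the setup code:

```r
# augment() adds a .pred column with each model's predictions.
bind_rows(
  `random forest` = augment(forest1, ames_test),
  xgboost = augment(xgb1, ames_test),
  .id = "model"
) %>%
  group_by(model) %>%
  metrics(truth = Sale_Price, estimate = .pred)
```

Lower `mae` and `mape` (and higher `rsq_trad`) are better; remember that `Sale_Price` is in thousands of dollars.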