class: left, top, title-slide

.title[
# Predictive Analytics Unit 2: Linear Foundations
]
.author[
### Ken Arnold
Calvin University
]

---

class: middle

# Objectives

- Explain how a linear regression model computes a prediction.
- Interpret the results of a linear regression that includes both quantitative and categorical variables.

---

# Linear Models

Basic structure: a sum of scaled values.

How to tell that someone is talking about a linear model:

- Price per gallon: gas bill = (price per gallon) * (gallons)
- Cost per night: hotel bill = (weekday rate) * (num weekdays) + (weekend rate) * (num weekends)
- Monthly cost of groceries = (cost per adult) * (number of adults) + (cost per child) * (number of children) + cost of transportation

...

---

## Example Models

Dataset: world record swimming times in the 100-m freestyle

<img src="slides02linear_files/figure-html/swim-times-1.png" width="75%" style="display: block; margin: auto;" />

---

## Constant Model (Intercept Only)

.pull-left[

Model: `time ~ 1`

```r
(model <- lm(time ~ 1, data = swim_records)) %>%
  tidy() %>% kable()
```

|term        | estimate| std.error| statistic| p.value|
|:-----------|--------:|---------:|---------:|-------:|
|(Intercept) | 59.92419|  1.259408|  47.58124|       0|

How it predicts:

<pre> time = 59.92 sec </pre>

]

.pull-right[
<img src="slides02linear_files/figure-html/unnamed-chunk-5-1.png" width="90%" style="display: block; margin: auto;" />
]

---

## Intercept and Year

.pull-left[

Model: `time ~ 1 + year`

```r
(model <- lm(time ~ 1 + year, data = swim_records)) %>%
  tidy() %>% kable()
```

|term        |    estimate| std.error| statistic| p.value|
|:-----------|-----------:|---------:|---------:|-------:|
|(Intercept) | 567.2420024|  53.86572|  10.53067|       0|
|year        |  -0.2598771|   0.02759|  -9.41925|       0|

How it predicts:

<pre> time = 567.24 sec<br> - (0.26 sec/year) * year </pre>

]

.pull-right[
<img src="slides02linear_files/figure-html/unnamed-chunk-7-1.png" width="90%" style="display: block; margin: auto;" />
]

---

## Intercept and Sex

.pull-left[

Model: `time ~ 1 + sex`

```r
(model <- lm(time ~ 1 + sex, data = swim_records)) %>%
  tidy() %>% kable()
```

|term        |  estimate| std.error| statistic| p.value|
|:-----------|---------:|---------:|---------:|-------:|
|(Intercept) |  65.19226|  1.516576| 42.986469| 0.0e+00|
|sexM        | -10.53613|  2.144763| -4.912492| 7.3e-06|

How it predicts:

<pre> time = 65.19 sec<br> - (10.54 sec) * (is Male) </pre>

]

.pull-right[
<img src="slides02linear_files/figure-html/unnamed-chunk-10-1.png" width="90%" style="display: block; margin: auto;" />
]

---

## Intercept, Sex, and Year

.pull-left[

Model: `time ~ 1 + sex + year`

```r
(model <- lm(time ~ 1 + sex + year, data = swim_records)) %>%
  tidy() %>% kable()
```

|term        |    estimate|  std.error|  statistic| p.value|
|:-----------|-----------:|----------:|----------:|-------:|
|(Intercept) | 555.7167834| 33.7999146|  16.441367|       0|
|sexM        |  -9.7979615|  1.0128719|  -9.673446|       0|
|year        |  -0.2514637|  0.0173234| -14.515848|       0|

How it predicts:

<pre> time = 555.72 sec<br> - (9.80 sec) * (is Male)<br> - (0.25 sec/year) * year </pre>

]

.pull-right[
<img src="slides02linear_files/figure-html/unnamed-chunk-13-1.png" width="90%" style="display: block; margin: auto;" />
]

---

## Intercept, Sex, Year, and Interaction

.pull-left[

Model: `time ~ 1 + sex * year`

```r
(model <- lm(time ~ 1 + sex * year, data = swim_records)) %>%
  tidy() %>% kable()
```

|term        |     estimate|  std.error|  statistic| p.value|
|:-----------|------------:|----------:|----------:|-------:|
|(Intercept) |  697.3012156| 39.2214304|  17.778577| 0.0e+00|
|sexM        | -302.4638388| 56.4116340|  -5.361728| 1.5e-06|
|year        |   -0.3240459|  0.0201042| -16.118280| 0.0e+00|
|sexM:year   |    0.1499166|  0.0288933|   5.188622| 2.8e-06|

How it predicts:

<pre> time = 697.30 sec<br> - (302.46 sec if Male)<br> - (0.32 sec/year) * year<br> + (0.15 sec/year if Male) * year </pre>

]

.pull-right[
<img src="slides02linear_files/figure-html/unnamed-chunk-16-1.png" width="90%" style="display: block; margin: auto;" />
]
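
---

## Computing a Prediction by Hand

The "How it predicts" recipe for the interaction model can be checked with plain arithmetic. The sketch below is illustrative only: the coefficients are copied from the `time ~ 1 + sex * year` table, while `predict_time()`, its `is_male` argument (1 for male, 0 for female), and the year 2000 are hypothetical choices that do not appear in the original slides.

```r
# Coefficients copied from the `time ~ 1 + sex * year` table.
coefs <- c(
  "(Intercept)" = 697.3012156,
  "sexM"        = -302.4638388,
  "year"        = -0.3240459,
  "sexM:year"   = 0.1499166
)

# Hypothetical helper: the prediction is a sum of scaled values.
# `is_male` is 1 for male swimmers and 0 for female swimmers.
predict_time <- function(year, is_male) {
  unname(
    coefs["(Intercept)"] +
      coefs["sexM"] * is_male +
      coefs["year"] * year +
      coefs["sexM:year"] * is_male * year
  )
}

predict_time(year = 2000, is_male = 1)  # about 46.58 sec
predict_time(year = 2000, is_male = 0)  # about 49.21 sec
```

With the interaction term, each sex gets its own slope: about -0.32 sec/year for female swimmers and about -0.32 + 0.15 = -0.17 sec/year for male swimmers. If the fitted `model` object is available, `predict(model, newdata = ...)` should give the same values up to rounding of the copied coefficients.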