class: center, middle, inverse, title-slide

# What makes a good prediction?
### K Arnold

---

## Objectives

* Compare and contrast regression tasks and classification tasks, and give examples of each
* Identify two different ways of measuring accuracy for regression and for classification
* Identify several reasons why a model may predict better on some subsets of data than others

---

## Types of Tasks

* **regression**: predict a *number* ("continuous")
  * number should be "close" in some sense to the correct number
* **classification**: predict a *category*
  * which one of these two groups? three groups? 500,000 groups?
  * could ask: "how likely is it to be in group *i*?"

---

## Are these tasks *regression* or *classification*?

1. Is this a picture of the inside or outside of the restaurant?
1. How much will it rain in GR next year?
1. Is this person having a seizure?
1. How much will this home sell for?
1. How much time will this person spend watching this video?
1. How big a fruit will this plant produce?
1. Which word did this person mean to type?
1. Will this person "Like" this post?

---

## Today's examples

**Regression**: housing prices in Ames, Iowa. Details:

* [Paper](http://jse.amstat.org/v19n3/decock.pdf)
* [Data Dictionary](http://jse.amstat.org/v19n3/decock/DataDocumentation.txt)

**Classification**: *seizure classification*. First FDA-approved AI-powered medical device: Empatica [Embrace2](https://www.empatica.com/embrace2/), from a company co-founded by MIT professor Rosalind Picard

<img src="https://www.empatica.com/assets/images/embrace/features_em2_mb_a-lg-xhdpi.png" width="20%" style="display: block; margin: auto;" />

---

## What makes a good prediction? *Regression*

We predicted the home would sell for $250k. It sold for $200k. Is that good?
--

* **residual**: actual minus predicted
  * If the home sold for $200k but we predicted $250k, the residual is _______
* **absolute error**
* **squared error**

--

Across the entire dataset:

* **average error** (mean residual): do we tend to predict too high? too low? ("*bias*")
* **max** absolute error
* **mean** absolute error
* **mean squared error** (MSE)
* normalized squared error: MSE / variance of the actual values
* The confusingly named "R²" = 1 - normalized squared error

---

## What makes a good prediction? *Classification*

Suppose: every minute, the armband decides whether a seizure is occurring.

<br>

The child was perfectly fine but our armband flagged a seizure. Is that good?

--

<br>

The child was having a seizure but our armband didn't flag it. Is that good?

---

## What makes a good prediction? *Classification*

|                      | Seizure happened              | No seizure happened           |
|----------------------|-------------------------------|-------------------------------|
| Seizure predicted    | True positive                 | False positive (Type 1 error) |
| No seizure predicted | False negative (Type 2 error) | True negative                 |

--

- **Accuracy** (% correct) = (TP + TN) / (total # of predictions)
- **False negative** ("miss") **rate** = FN / (# actual seizures)
- **False positive** ("false alarm") **rate** = FP / (# actual non-seizures)

--

- **Sensitivity** ("true positive rate") = TP / (# actual seizures)
  - Sensitivity = 1 − False negative rate
- **Specificity** ("true negative rate") = TN / (# actual non-seizures)
  - Specificity = 1 − False positive rate
- [Wikipedia article](https://en.wikipedia.org/wiki/Sensitivity_and_specificity)

---

.question[
If you were designing a seizure alert system, would you want sensitivity and specificity to be high or low? What are the trade-offs associated with each decision?
]

---

class: middle, center

## Validation

.large[**Key point**: you *must* evaluate predictions on *unseen* data]

---

Hey look! I can exactly predict how much a home will sell for!
.small[
<table>
 <thead>
  <tr>
   <th style="text-align:right;"> Lot_Area </th>
   <th style="text-align:right;"> Sale_Price </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:right;"> 31770 </td>
   <td style="text-align:right;"> 215000 </td>
  </tr>
  <tr>
   <td style="text-align:right;"> 11622 </td>
   <td style="text-align:right;"> 105000 </td>
  </tr>
</tbody>
</table>
]

sale price = 41548.54 + 5.459599 \* lot area

<img src="w6d2-accuracy_files/figure-html/perfect-prediction-1-1.png" width="60%" style="display: block; margin: auto;" />

---

## Validation: *unseen* data

.pull-left[
.small[
<table>
 <thead>
  <tr>
   <th style="text-align:right;"> Lot_Area </th>
   <th style="text-align:right;"> Sale_Price </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:right;"> 31770 </td>
   <td style="text-align:right;"> 215000 </td>
  </tr>
  <tr>
   <td style="text-align:right;"> 11622 </td>
   <td style="text-align:right;"> 105000 </td>
  </tr>
  <tr>
   <td style="text-align:right;"> 14267 </td>
   <td style="text-align:right;"> 172000 </td>
  </tr>
</tbody>
</table>
]]

.pull-right[
<img src="w6d2-accuracy_files/figure-html/perfectly-wrong-1.png" width="100%" style="display: block; margin: auto;" />
]

--

<table>
 <thead>
  <tr>
   <th style="text-align:right;"> Lot_Area </th>
   <th style="text-align:right;"> Sale_Price </th>
   <th style="text-align:right;"> predicted </th>
   <th style="text-align:right;"> residual </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:right;"> 31770 </td>
   <td style="text-align:right;"> 215000 </td>
   <td style="text-align:right;"> 215000.0 </td>
   <td style="text-align:right;"> 0.00 </td>
  </tr>
  <tr>
   <td style="text-align:right;"> 11622 </td>
   <td style="text-align:right;"> 105000 </td>
   <td style="text-align:right;"> 105000.0 </td>
   <td style="text-align:right;"> 0.00 </td>
  </tr>
  <tr>
   <td style="text-align:right;"> 14267 </td>
   <td style="text-align:right;"> 172000 </td>
   <td style="text-align:right;"> 119440.6 </td>
   <td style="text-align:right;"> 52559.36 </td>
  </tr>
</tbody>
</table>

---

## Oh ok, I'll just fix that one...

.small[
<table>
 <thead>
  <tr>
   <th style="text-align:right;"> Lot_Area </th>
   <th style="text-align:right;"> Bsmt_Unf_SF </th>
   <th style="text-align:right;"> Sale_Price </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:right;"> 31770 </td>
   <td style="text-align:right;"> 441 </td>
   <td style="text-align:right;"> 215000 </td>
  </tr>
  <tr>
   <td style="text-align:right;"> 11622 </td>
   <td style="text-align:right;"> 270 </td>
   <td style="text-align:right;"> 105000 </td>
  </tr>
  <tr>
   <td style="text-align:right;"> 14267 </td>
   <td style="text-align:right;"> 406 </td>
   <td style="text-align:right;"> 172000 </td>
  </tr>
</tbody>
</table>
]

sale price = -37769.46 + 1.5311432 \* lot area + **462.8685748 \* unfinished basement sq ft**

### and look, it works!

<table>
 <thead>
  <tr>
   <th style="text-align:right;"> Lot_Area </th>
   <th style="text-align:right;"> Bsmt_Unf_SF </th>
   <th style="text-align:right;"> Sale_Price </th>
   <th style="text-align:right;"> predicted </th>
   <th style="text-align:right;"> residual </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:right;"> 31770 </td>
   <td style="text-align:right;"> 441 </td>
   <td style="text-align:right;"> 215000 </td>
   <td style="text-align:right;"> 215000 </td>
   <td style="text-align:right;"> 0 </td>
  </tr>
  <tr>
   <td style="text-align:right;"> 11622 </td>
   <td style="text-align:right;"> 270 </td>
   <td style="text-align:right;"> 105000 </td>
   <td style="text-align:right;"> 105000 </td>
   <td style="text-align:right;"> 0 </td>
  </tr>
  <tr>
   <td style="text-align:right;"> 14267 </td>
   <td style="text-align:right;"> 406 </td>
   <td style="text-align:right;"> 172000 </td>
   <td style="text-align:right;"> 172000 </td>
   <td style="text-align:right;"> 0 </td>
  </tr>
</tbody>
</table>

*Do you really think so?*

---

## Failure to generalize

Predictive models almost always do better on the data they're trained on than on anything else. Why?
* model uses a pattern that held only by chance in the training data
* model uses a pattern that holds only for some subsets of the data
* model uses a pattern that's real, but learned only a fuzzy picture of it

General name: **Overfitting**
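---

Here is a minimal Python sketch of the "perfect prediction" demo. The slides themselves contain no code, so the language and the helper names `fit_line_through`, `regression_errors`, and `predict` are hypothetical choices made for illustration. A line through the two training homes fits them exactly but misses the unseen third home by the 52559.36 residual shown in the validation table. The error measures follow the regression slide. The variance divides by *n*, which makes `1 - MSE / variance` equal to the usual R² formula.

```python
def fit_line_through(p, q):
    """Return (intercept, slope) of the unique line through points p and q."""
    (x1, y1), (x2, y2) = p, q
    slope = (y2 - y1) / (x2 - x1)
    intercept = y1 - slope * x1
    return intercept, slope


def regression_errors(actual, predicted):
    """Dataset-level error measures; each residual is actual minus predicted."""
    residuals = [a - p for a, p in zip(actual, predicted)]
    n = len(residuals)
    mean_actual = sum(actual) / n
    mse = sum(r * r for r in residuals) / n
    variance = sum((a - mean_actual) ** 2 for a in actual) / n  # divides by n
    return {
        "average error": sum(residuals) / n,  # positive = predictions too low on average
        "max absolute error": max(abs(r) for r in residuals),
        "mean absolute error": sum(abs(r) for r in residuals) / n,
        "MSE": mse,
        # R^2 = 1 - MSE / variance; undefined (nan) when the actual values don't vary
        "R2": 1 - mse / variance if variance > 0 else float("nan"),
    }


# (Lot_Area, Sale_Price) pairs copied from the slides.
train_homes = [(31770, 215000), (11622, 105000)]
unseen_homes = [(14267, 172000)]

intercept, slope = fit_line_through(*train_homes)
print(f"sale price = {intercept:.2f} + {slope:.6f} * lot area")


def predict(lot_area):
    """Predicted sale price from the line fitted to the two training homes."""
    return intercept + slope * lot_area


for label, homes in [("training", train_homes), ("unseen", unseen_homes)]:
    actual = [price for _, price in homes]
    predicted = [predict(lot) for lot, _ in homes]
    print(label, regression_errors(actual, predicted))
```

On the two training homes, every error measure comes out essentially zero and R² essentially 1. On the unseen home the residual is about 52559. With only one unseen home, R² is undefined, so the sketch reports `nan`.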
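---

The classification measures from the confusion-matrix slide can be sketched the same way, again in Python with hypothetical helper names. The per-minute labels below are invented toy values (1 = seizure, 0 = no seizure), not armband data. They only show how each rate is counted from TP, FP, FN, and TN.

```python
def confusion_counts(actual, predicted):
    """Count (TP, FP, FN, TN) for binary labels where 1 means seizure."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)
    return tp, fp, fn, tn


def classification_rates(actual, predicted):
    """Accuracy, miss/false-alarm rates, sensitivity, and specificity."""
    tp, fp, fn, tn = confusion_counts(actual, predicted)
    n_seizure = tp + fn      # minutes with an actual seizure
    n_non_seizure = tn + fp  # minutes with no actual seizure
    return {
        "accuracy": (tp + tn) / len(actual),
        "false negative rate": fn / n_seizure,
        "false positive rate": fp / n_non_seizure,
        "sensitivity": tp / n_seizure,
        "specificity": tn / n_non_seizure,
    }


# Hypothetical toy labels for ten one-minute windows (NOT real device data).
actual    = [0, 0, 0, 1, 1, 1, 0, 0, 0, 0]
predicted = [0, 1, 0, 1, 1, 0, 0, 0, 0, 0]

for name, value in classification_rates(actual, predicted).items():
    print(f"{name}: {value:.3f}")
```

These toy labels give 2 TP, 1 FP, 1 FN, and 6 TN. Sensitivity is therefore 2/3 and specificity is 6/7, and each rate plus its complementary error rate sums to 1, as on the slide.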
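---

The "fix" that adds unfinished basement area can also be sketched. Three homes and three unknowns (an intercept plus two slopes) form a square linear system. Here that system has a unique solution, which fits every home exactly. This Python sketch solves the system with Cramer's rule, chosen only because it needs no libraries. A least-squares fit, as regression software typically uses, should give the same coefficients whenever such an exact, unique solution exists. The helpers `det3` and `solve3` are hypothetical names.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as a list of rows."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def solve3(a, b):
    """Solve a x = b for a 3x3 matrix a using Cramer's rule."""
    d = det3(a)
    if d == 0:
        raise ValueError("singular system: no unique exact fit")
    x = []
    for col in range(3):
        # Replace column `col` of a with b; the determinant ratio gives x[col].
        m = [row[:col] + [b_i] + row[col + 1:] for row, b_i in zip(a, b)]
        x.append(det3(m) / d)
    return x


# (Lot_Area, Bsmt_Unf_SF, Sale_Price) for the three homes on the slides.
homes = [(31770, 441, 215000), (11622, 270, 105000), (14267, 406, 172000)]

# One row per home: [1 for the intercept, lot area, unfinished basement sq ft].
a = [[1, lot, bsmt] for lot, bsmt, _ in homes]
b = [price for _, _, price in homes]
intercept, b_lot, b_bsmt = solve3(a, b)
print(f"sale price = {intercept:.2f} + {b_lot:.7f} * lot area"
      f" + {b_bsmt:.7f} * unfinished basement sq ft")

for lot, bsmt, price in homes:
    predicted = intercept + b_lot * lot + b_bsmt * bsmt
    print(price, round(predicted, 2), round(price - predicted, 2))
```

Every residual comes out (numerically) zero. That says nothing about how well the model would predict a fourth home: with as many parameters as data points, an exact fit is guaranteed whenever the system is non-singular.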