We’ll use an example from a 2017 PLOS Computational Biology paper
| Name | autism |
| Number of rows | 206 |
| Number of columns | 26 |
| _______________________ | |
| Column type frequency: | |
| character | 1 |
| numeric | 25 |
| ________________________ | |
| Group variables | None |
Variable type: character
| skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
|---|---|---|---|---|---|---|---|
| Group | 0 | 1 | 3 | 3 | 0 | 3 | 0 |
Variable type: numeric
| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
|---|---|---|---|---|---|---|---|---|---|---|
| Methion. | 0 | 1.00 | 21.38 | 3.89 | 13.43 | 18.94 | 21.08 | 23.61 | 33.14 | ▃▇▆▂▁ |
| SAM | 0 | 1.00 | 65.65 | 12.33 | 37.22 | 57.83 | 63.92 | 71.13 | 121.16 | ▂▇▂▁▁ |
| SAH | 0 | 1.00 | 16.68 | 4.35 | 7.39 | 14.14 | 16.61 | 19.27 | 28.12 | ▃▆▇▃▁ |
| SAM/SAH | 0 | 1.00 | 4.26 | 1.61 | 1.66 | 3.24 | 3.90 | 4.79 | 13.10 | ▇▆▁▁▁ |
| % DNA methylation | 0 | 1.00 | 3.85 | 0.94 | 1.78 | 3.14 | 3.88 | 4.52 | 7.73 | ▃▇▆▁▁ |
| 8-OHG | 0 | 1.00 | 0.07 | 0.03 | 0.03 | 0.05 | 0.07 | 0.08 | 0.18 | ▅▇▂▁▁ |
| Adenosine | 0 | 1.00 | 0.13 | 0.06 | 0.04 | 0.08 | 0.12 | 0.17 | 0.39 | ▇▆▃▁▁ |
| Homocysteine | 0 | 1.00 | 4.87 | 1.14 | 3.06 | 4.11 | 4.67 | 5.57 | 9.70 | ▇▇▃▁▁ |
| Cysteine | 0 | 1.00 | 201.50 | 19.83 | 141.31 | 188.50 | 201.12 | 214.61 | 261.44 | ▁▅▇▅▁ |
| Glu.-Cys. | 0 | 1.00 | 2.10 | 0.55 | 1.12 | 1.67 | 1.95 | 2.50 | 3.83 | ▅▇▅▂▁ |
| Cys.-Gly. | 0 | 1.00 | 43.01 | 6.80 | 27.12 | 38.83 | 42.66 | 46.97 | 63.75 | ▂▇▇▃▁ |
| tGSH | 0 | 1.00 | 6.24 | 1.40 | 3.25 | 5.24 | 6.18 | 7.01 | 11.18 | ▃▇▇▁▁ |
| fGSH | 0 | 1.00 | 2.02 | 0.59 | 1.08 | 1.68 | 1.90 | 2.23 | 5.44 | ▇▅▁▁▁ |
| GSSG | 0 | 1.00 | 0.18 | 0.09 | 0.05 | 0.12 | 0.17 | 0.22 | 0.65 | ▇▆▁▁▁ |
| fGSH/GSSG | 0 | 1.00 | 13.77 | 8.03 | 2.15 | 8.24 | 11.55 | 17.78 | 57.00 | ▇▅▁▁▁ |
| tGSH/GSSG | 0 | 1.00 | 42.50 | 22.76 | 8.55 | 25.60 | 36.30 | 55.38 | 130.00 | ▇▆▃▁▁ |
| Chlorotyrosine | 0 | 1.00 | 44.06 | 25.07 | 6.39 | 26.53 | 38.39 | 58.10 | 143.28 | ▇▇▃▁▁ |
| Nitrotyrosine | 0 | 1.00 | 88.07 | 59.23 | 21.56 | 47.69 | 71.44 | 107.00 | 467.30 | ▇▂▁▁▁ |
| Tyrosine | 0 | 1.00 | 50.87 | 13.78 | 22.93 | 40.71 | 48.97 | 60.15 | 92.93 | ▂▇▅▂▁ |
| Tryptophane | 0 | 1.00 | 38.16 | 8.88 | 20.05 | 30.92 | 37.61 | 44.60 | 60.12 | ▃▇▇▆▁ |
| fCystine | 0 | 1.00 | 26.21 | 8.38 | 10.32 | 19.80 | 25.41 | 30.80 | 54.11 | ▅▇▆▂▁ |
| fCysteine | 0 | 1.00 | 21.73 | 4.72 | 10.85 | 18.39 | 21.04 | 24.23 | 36.54 | ▁▇▆▂▁ |
| fCystine/fCysteine | 0 | 1.00 | 1.25 | 0.50 | 0.57 | 0.91 | 1.12 | 1.44 | 4.06 | ▇▃▁▁▁ |
| % oxidized | 0 | 1.00 | 0.16 | 0.07 | 0.03 | 0.10 | 0.15 | 0.20 | 0.48 | ▇▇▂▁▁ |
| Vineland ABC | 159 | 0.23 | 70.77 | 13.91 | 46.00 | 61.50 | 69.00 | 76.00 | 106.00 | ▅▇▅▃▂ |
We have 3 kinds of data about 206 children:
Group): ASD (diagnosed with ASD), SIB (sibling not diagnosed with ASD), and NEU (age-matched neurotypical children, for control)Group): ASD, SIB, NEU [1] "Methion." "SAM" "SAH"
[4] "SAM/SAH" "% DNA methylation" "8-OHG"
[7] "Adenosine" "Homocysteine" "Cysteine"
[10] "Glu.-Cys." "Cys.-Gly." "tGSH"
[13] "fGSH" "GSSG" "fGSH/GSSG"
[16] "tGSH/GSSG" "Chlorotyrosine" "Nitrotyrosine"
[19] "Tyrosine" "Tryptophane" "fCystine"
[22] "fCysteine" "fCystine/fCysteine" "% oxidized"
Group): ASD, SIB, NEUWhat do these metabolites look like?
code for the previous plot:
Better question: Can these metabolites help us distinguish autism?
code for previous plot:
Group as a factor. (Ensure that “ASD” is the first level.) [1] ASD ASD ASD ASD ASD ASD ASD ASD ASD ASD ASD ASD ASD ASD ASD ASD ASD ASD
[19] ASD ASD ASD ASD ASD ASD ASD ASD ASD ASD ASD ASD ASD ASD ASD ASD ASD ASD
[37] ASD ASD ASD ASD ASD ASD ASD ASD ASD ASD ASD ASD ASD ASD ASD ASD ASD ASD
[55] ASD ASD ASD ASD ASD ASD ASD ASD ASD ASD ASD ASD ASD ASD ASD ASD ASD ASD
[73] ASD ASD ASD ASD ASD ASD ASD ASD ASD ASD ASD NEU NEU NEU NEU NEU NEU NEU
[91] NEU NEU NEU NEU NEU NEU NEU NEU NEU NEU NEU NEU NEU NEU NEU NEU NEU NEU
[109] NEU NEU NEU NEU NEU NEU NEU NEU NEU NEU NEU NEU NEU NEU NEU NEU NEU NEU
[127] NEU NEU NEU NEU NEU NEU NEU NEU NEU NEU NEU NEU NEU NEU NEU NEU NEU NEU
[145] NEU NEU NEU NEU NEU NEU NEU NEU NEU NEU NEU NEU NEU NEU NEU
Levels: ASD NEU
| term | estimate |
|---|---|
| (Intercept) | -15.3438894 |
tGSH/GSSG |
0.1364630 |
| fGSH | 2.1681857 |
| Methion. | 0.2919989 |
\[ \log\left[ \frac { \widehat{P( \operatorname{..y} = \operatorname{NEU} )} }{ 1 - \widehat{P( \operatorname{..y} = \operatorname{NEU} )} } \right] = -15.34 + 0.14(\operatorname{tGSH/GSSG}) + 2.17(\operatorname{fGSH}) + 0.29(\operatorname{Methion.}) \]
Other examples:
decision_metrics <- yardstick::metric_set(accuracy, sensitivity,
specificity, yardstick::f_meas)
model %>%
predict(data, type = "class") %>%
bind_cols(data) %>%
decision_metrics(truth = Group, estimate = .pred_class)# A tibble: 4 × 3
.metric .estimator .estimate
<chr> <chr> <dbl>
1 accuracy binary 0.906
2 sensitivity binary 0.940
3 specificity binary 0.868
4 f_meas binary 0.912
Note: these are all measures of the quality of the decision. Other metrics can evaluate the scores:
Note: this is transposed of the Wikipedia table
| Event happened | No event happened | |
|---|---|---|
| Event predicted | True positive | False positive (Type 1 error) |
| No event predicted | False negative (Type 2 error) | True negative |
metrics <- yardstick::metric_set(mn_log_loss, roc_auc)
model %>% predict(data, type = "prob") %>%
bind_cols(data) %>%
metrics(truth = Group, .pred_ASD)# A tibble: 2 × 3
.metric .estimator .estimate
<chr> <chr> <dbl>
1 mn_log_loss binary 0.262
2 roc_auc binary 0.953