This R Markdown file reproduces the code shown in the class demo slideshow.

First, we load the packages we need: tidyverse for data manipulation and plotting, and gapminder for the raw data.

library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
## ✔ ggplot2 3.4.0      ✔ purrr   1.0.1 
## ✔ tibble  3.1.8      ✔ dplyr   1.0.10
## ✔ tidyr   1.2.1      ✔ stringr 1.5.0 
## ✔ readr   2.1.3      ✔ forcats 0.5.2 
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
library(gapminder)

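Here is the data set from the gapminder package. It has one row per country and year, with columns for life expectancy (lifeExp), population (pop), and GDP per capita (gdpPercap).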

gapminder
## # A tibble: 1,704 × 6
##    country     continent  year lifeExp      pop gdpPercap
##    <fct>       <fct>     <int>   <dbl>    <int>     <dbl>
##  1 Afghanistan Asia       1952    28.8  8425333      779.
##  2 Afghanistan Asia       1957    30.3  9240934      821.
##  3 Afghanistan Asia       1962    32.0 10267083      853.
##  4 Afghanistan Asia       1967    34.0 11537966      836.
##  5 Afghanistan Asia       1972    36.1 13079460      740.
##  6 Afghanistan Asia       1977    38.4 14880372      786.
##  7 Afghanistan Asia       1982    39.9 12881816      978.
##  8 Afghanistan Asia       1987    40.8 13867957      852.
##  9 Afghanistan Asia       1992    41.7 16317921      649.
## 10 Afghanistan Asia       1997    41.8 22227415      635.
## # … with 1,694 more rows

This computes the average life expectancy over all countries and all years in the data. Every country-year row counts equally, regardless of population.

gapminder %>% 
  summarize(AvgLifeExp = mean(lifeExp))
## # A tibble: 1 × 1
##   AvgLifeExp
##        <dbl>
## 1       59.5
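
Because that mean gives a small country the same weight as a large one, it can differ from the life expectancy of the average person. The sketch below, which assumes only base R's weighted.mean() inside the same summarize() call, computes both side by side. The column name PopWeightedLifeExp is our own illustrative choice.

```r
library(dplyr)
library(gapminder)

life_summary <- gapminder %>%
  summarize(
    # Unweighted: each country-year row counts once, as in the summary above
    AvgLifeExp = mean(lifeExp),
    # Weighted by population; pop is converted to double so that summing
    # billions of people cannot overflow R's integer type
    PopWeightedLifeExp = weighted.mean(lifeExp, as.numeric(pop))
  )
life_summary
```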

This groups the data by year and computes the average life expectancy within each group, giving one row per year.

gapminder %>% 
  group_by(year) %>%
  summarize(AvgLifeExp = mean(lifeExp))
## # A tibble: 12 × 2
##     year AvgLifeExp
##    <int>      <dbl>
##  1  1952       49.1
##  2  1957       51.5
##  3  1962       53.6
##  4  1967       55.7
##  5  1972       57.6
##  6  1977       59.6
##  7  1982       61.5
##  8  1987       63.2
##  9  1992       64.2
## 10  1997       65.0
## 11  2002       65.7
## 12  2007       67.0
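
To check how many rows feed into each yearly average, dplyr's n() helper (which counts the rows in the current group) can be added to the same summarize() call. This is a small sketch; the column name Countries is just an illustrative label.

```r
library(dplyr)
library(gapminder)

by_year <- gapminder %>%
  group_by(year) %>%
  # One row per year: the mean life expectancy and the number of
  # country rows that went into that mean
  summarize(AvgLifeExp = mean(lifeExp), Countries = n())
by_year
```

Every year should show the same count, since the data set covers the same countries in each survey year.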

This takes the same yearly averages and plots them over time as a line plot.

gapminder %>% 
  group_by(year) %>%
  summarize(AvgLifeExp = mean(lifeExp)) %>%
  ggplot() +
  aes(x = year, y = AvgLifeExp) +
  geom_line()
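
The same pipeline extends to more than one grouping variable. As a sketch, grouping by both year and continent and mapping continent to color draws one line per continent. This assumes dplyr 1.0 or later for the .groups argument, used here to drop the grouping after summarizing; the name continent_trend is just for illustration.

```r
library(dplyr)
library(ggplot2)
library(gapminder)

# One row per (year, continent) pair, with grouping removed afterwards
continent_trend <- gapminder %>%
  group_by(year, continent) %>%
  summarize(AvgLifeExp = mean(lifeExp), .groups = "drop")

# color = continent makes ggplot draw a separate line for each continent
p <- ggplot(continent_trend) +
  aes(x = year, y = AvgLifeExp, color = continent) +
  geom_line()
p
```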

Finally, this graph approximates a single time-stop (the year 2007) of Hans Rosling’s famous animated plot (see his TED talk, “The Best Stats You’ve Ever Seen”).

library(scales)  # provides label_comma(), used for the population legend
## 
## Attaching package: 'scales'
## The following object is masked from 'package:purrr':
## 
##     discard
## The following object is masked from 'package:readr':
## 
##     col_factor
gapminder %>% 
  filter(year == 2007) %>%            # keep a single snapshot year
  ggplot() +
  aes(x = gdpPercap, y = lifeExp) +
  geom_point(alpha = .8) +            # slight transparency for overlapping points
  aes(color = continent) +
  aes(size = pop) + 
  scale_x_continuous(                 # log10 x-axis, as in Rosling's chart
    breaks = c(400, 4000, 40000),
    trans = "log10") +
  labs(x = "GDP per Capita") +
  labs(y = "Life Expectancy (years)") + 
  labs(color = "Continent") +
  labs(size = "Population") +
  scale_size_area(labels = label_comma()) +  # point area proportional to population
  theme_bw() +
  annotation_logticks(sides = "b")    # log tick marks along the bottom axis
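
Rosling's original animates this chart across the years. One static way to compare several time-stops is to keep a few years and give each its own panel with facet_wrap(). This is a sketch only: the years 1952, 1977, and 2007 are an arbitrary choice, and the label and scale polish from the chart above is left out for brevity.

```r
library(dplyr)
library(ggplot2)
library(gapminder)

# Keep three snapshot years instead of one
snapshots <- gapminder %>%
  filter(year %in% c(1952, 1977, 2007))

p <- ggplot(snapshots) +
  aes(x = gdpPercap, y = lifeExp, color = continent, size = pop) +
  geom_point(alpha = .8) +
  scale_x_log10() +      # same log scale idea as above, default breaks
  facet_wrap(~ year)     # one panel per snapshot year
p
```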

Question: Could you do this in Excel?