This R Markdown file reproduces the code from the class demo slideshow.
First, we load the packages that provide the tools we need; the gapminder package also supplies the raw data.
```r
library(tidyverse)  # dplyr for data manipulation, ggplot2 for plotting
library(gapminder)  # the gapminder data set
```
Here’s the data, which comes from the gapminder package.
```r
gapminder
## # A tibble: 1,704 × 6
##    country     continent  year lifeExp      pop gdpPercap
##    <fct>       <fct>     <int>   <dbl>    <int>     <dbl>
##  1 Afghanistan Asia       1952    28.8  8425333      779.
##  2 Afghanistan Asia       1957    30.3  9240934      821.
##  3 Afghanistan Asia       1962    32.0 10267083      853.
##  4 Afghanistan Asia       1967    34.0 11537966      836.
##  5 Afghanistan Asia       1972    36.1 13079460      740.
##  6 Afghanistan Asia       1977    38.4 14880372      786.
##  7 Afghanistan Asia       1982    39.9 12881816      978.
##  8 Afghanistan Asia       1987    40.8 13867957      852.
##  9 Afghanistan Asia       1992    41.7 16317921      649.
## 10 Afghanistan Asia       1997    41.8 22227415      635.
## # … with 1,694 more rows
```
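This computes the average life expectancy across all countries and all years. It is a simple mean over every country-year row, so each country counts equally regardless of its population.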
```r
gapminder %>%
  summarize(AvgLifeExp = mean(lifeExp))
## # A tibble: 1 × 1
##   AvgLifeExp
##        <dbl>
## 1       59.5
```
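For readers new to dplyr, the pipeline above collapses the whole table into a single row holding one number. The sketch below mimics that step in base R on a tiny made-up data frame (the country labels and values are hypothetical, not taken from gapminder), so it runs without any extra packages.

```r
# Hypothetical toy data using the same column names as gapminder
toy <- data.frame(
  country = c("A", "A", "B", "B"),
  year    = c(1952L, 2007L, 1952L, 2007L),
  lifeExp = c(40, 60, 50, 70)
)

# Base-R analogue of: toy %>% summarize(AvgLifeExp = mean(lifeExp))
# summarize() returns a one-row table, so we build a one-row data frame too
overall <- data.frame(AvgLifeExp = mean(toy$lifeExp))
print(overall)
# AvgLifeExp is 55, i.e. (40 + 60 + 50 + 70) / 4
```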
This groups the data by year and computes the average life expectancy within each year.
```r
gapminder %>%
  group_by(year) %>%
  summarize(AvgLifeExp = mean(lifeExp))
## # A tibble: 12 × 2
##     year AvgLifeExp
##    <int>      <dbl>
##  1  1952       49.1
##  2  1957       51.5
##  3  1962       53.6
##  4  1967       55.7
##  5  1972       57.6
##  6  1977       59.6
##  7  1982       61.5
##  8  1987       63.2
##  9  1992       64.2
## 10  1997       65.0
## 11  2002       65.7
## 12  2007       67.0
```
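`group_by(year)` followed by `summarize()` produces one mean per distinct year. A base-R sketch of the same idea, again on hypothetical toy values rather than the real data, uses `aggregate()` with a formula:

```r
# Hypothetical toy data: two countries observed in the same two years
toy <- data.frame(
  country = c("A", "A", "B", "B"),
  year    = c(1952L, 2007L, 1952L, 2007L),
  lifeExp = c(40, 60, 50, 70)
)

# Base-R analogue of:
#   toy %>% group_by(year) %>% summarize(AvgLifeExp = mean(lifeExp))
by_year <- aggregate(lifeExp ~ year, data = toy, FUN = mean)
names(by_year)[2] <- "AvgLifeExp"  # match the column name used above
print(by_year)                     # 1952 -> 45, 2007 -> 65

# Every year has the same number of countries here, so the mean of the
# yearly means equals the overall mean of all rows
print(mean(by_year$AvgLifeExp) == mean(toy$lifeExp))  # TRUE
```

The real data behaves the same way: gapminder has 142 countries in each of its 12 survey years (1,704 rows), and the twelve yearly averages above average to about 59.5, matching the overall mean computed earlier.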
This pipes the same yearly averages into ggplot2 and draws them as a line plot of average life expectancy over time.
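```r
gapminder %>%
  group_by(year) %>%
  summarize(AvgLifeExp = mean(lifeExp)) %>%
  ggplot() +                       # start a plot from the summarized data
  aes(x = year, y = AvgLifeExp) +  # map year to x, yearly average to y
  geom_line()                      # connect the yearly averages with a line
```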
Finally, this graph approximates a single frame (the year 2007) of Hans Rosling’s famous animated bubble chart (see his TED talk, “The Best Stats You’ve Ever Seen”).
```r
library(scales)  # provides label_comma() for the population legend
##
## Attaching package: 'scales'
## The following object is masked from 'package:purrr':
##
##     discard
## The following object is masked from 'package:readr':
##
##     col_factor
```
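The scales package is loaded here for `label_comma()`, which the plot uses to print population values in the legend with thousands separators instead of scientific notation. As a rough illustration of that formatting only (not of how scales implements it), base R's `formatC()` produces similar strings:

```r
# Two populations taken from the gapminder rows printed earlier
pops <- c(8425333, 22227415)

# format = "d" prints whole numbers; big.mark inserts the thousands separator
pop_labels <- formatC(pops, format = "d", big.mark = ",")
print(pop_labels)  # "8,425,333" "22,227,415"
```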
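```r
gapminder %>%
  filter(year == 2007) %>%                       # keep one year: a single frame
  ggplot() +
  aes(x = gdpPercap, y = lifeExp) +
  geom_point(alpha = .8) +                       # slightly transparent points
  aes(color = continent) +                       # color points by continent
  aes(size = pop) +                              # size points by population
  scale_x_log10(breaks = c(400, 4000, 40000)) +  # same as scale_x_continuous(trans = "log10")
  labs(x = "GDP per Capita") +
  labs(y = "Life Expectancy (years)") +
  labs(color = "Continent") +
  labs(size = "Population") +
  scale_size_area(labels = label_comma()) +      # point area proportional to population
  theme_bw() +
  annotation_logticks(sides = "b")               # log-scale tick marks on the bottom axis
```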
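Two scale choices do most of the work in this plot. On the log10 x axis, the breaks 400, 4,000 and 40,000 sit at equal distances because each is ten times the previous one. `scale_size_area()` makes a point's area, rather than its radius, proportional to population, so the radius grows with the square root. The base-R arithmetic below sketches both ideas; the sizing formula and `max_size = 6` (which we believe is ggplot2's default for this scale) are assumptions for illustration, and the populations are hypothetical.

```r
# The three x-axis breaks used in the plot
gdp_breaks <- c(400, 4000, 40000)

# On a log10 axis each break sits exactly one unit above the previous one,
# so every tenfold jump in GDP per capita spans the same horizontal distance
log_positions <- log10(gdp_breaks)
print(diff(log_positions))  # 1 1 (up to floating-point rounding)

# Area-proportional sizing, assuming size = max_size * sqrt(value / max(value))
max_size <- 6                 # assumed default maximum point size
pop <- c(25e6, 100e6)         # hypothetical populations in a 1:4 ratio
point_size <- max_size * sqrt(pop / max(pop))
print(point_size)             # 3 6

# Four times the population gives twice the radius, hence four times the area
print((point_size[2] / point_size[1])^2)  # 4
```

Sizing by area keeps the visual weight of each bubble honest: if radius were proportional to population, China's and India's points would dwarf everything else far more than their populations justify.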
Question: Could you do this in Excel?