library(tidyverse)

This document presents an analysis of recent presidential popularity as inspired by examples from FiveThirtyEight & Data Science in a Box. Here’s the target plot, which appeared on the website on October 4, 2021.

Biden Approval

This web plot has a pull-down box for the poll type, and the plotted values are animated. We won’t be able to reproduce that directly in a static ggplot.

We will base our analysis on the assumption that the approval ratings in the FiveThirtyEight datasets are accurate and useful. For a discussion of how the data were collected and processed, see this article by N. Rakich: How We’re Tracking Joe Biden’s Approval Rating.

Loading the Dataset

The dataset was downloaded directly from the FiveThirtyEight website.

approval_raw_biden <- read_csv("data/approval_topline_biden.csv")
approval_raw_biden
## # A tibble: 1,161 × 10
##    president subgroup  modeldate appro…¹ appro…² appro…³ disap…⁴ disap…⁵ disap…⁶
##    <chr>     <chr>     <chr>       <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
##  1 Joe Biden All polls 2/13/2022    41.8    45.7    37.9    53.1    57.8    48.3
##  2 Joe Biden Adults    2/13/2022    41.5    45.5    37.6    52.9    57.4    48.4
##  3 Joe Biden Voters    2/13/2022    42.3    46.3    38.3    52.4    57.2    47.6
##  4 Joe Biden All polls 2/12/2022    41.4    45.3    37.6    52.5    57.2    47.8
##  5 Joe Biden Adults    2/12/2022    40.7    44.6    36.8    52.0    56.3    47.7
##  6 Joe Biden Voters    2/12/2022    42.3    46.3    38.3    52.4    57.2    47.6
##  7 Joe Biden All polls 2/11/2022    41.4    45.3    37.6    52.5    57.2    47.8
##  8 Joe Biden Adults    2/11/2022    40.7    44.6    36.8    52.0    56.3    47.7
##  9 Joe Biden Voters    2/11/2022    42.3    46.3    38.3    52.4    57.2    47.6
## 10 Joe Biden All polls 2/10/2022    41.3    45.2    37.3    52.6    57.3    47.8
## # … with 1,151 more rows, 1 more variable: timestamp <chr>, and abbreviated
## #   variable names ¹​approve_estimate, ²​approve_hi, ³​approve_lo,
## #   ⁴​disapprove_estimate, ⁵​disapprove_hi, ⁶​disapprove_lo

We’ll focus on the approval estimates over time, renaming some columns, processing the date character string, and ensuring a consistent spelling of Biden’s name.

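approval_biden <- approval_raw_biden %>% 
  # Keep the identifying columns and the two point estimates, with shorter names
  select(president,
         subgroup, 
         date = modeldate, 
         approval = approve_estimate, 
         disapproval = disapprove_estimate) %>% 
  # Parse the month/day/year strings into dates and fix one spelling of the name
  mutate(date = lubridate::mdy(date),
         president = "Joe Biden") %>% 
  # Drop the combined series so that only the Adults and Voters subgroups remain
  filter(subgroup != "All polls")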
approval_biden
## # A tibble: 774 × 5
##    president subgroup date       approval disapproval
##    <chr>     <chr>    <date>        <dbl>       <dbl>
##  1 Joe Biden Adults   2022-02-13     41.5        52.9
##  2 Joe Biden Voters   2022-02-13     42.3        52.4
##  3 Joe Biden Adults   2022-02-12     40.7        52.0
##  4 Joe Biden Voters   2022-02-12     42.3        52.4
##  5 Joe Biden Adults   2022-02-11     40.7        52.0
##  6 Joe Biden Voters   2022-02-11     42.3        52.4
##  7 Joe Biden Adults   2022-02-10     40.7        52.0
##  8 Joe Biden Voters   2022-02-10     42.1        52.6
##  9 Joe Biden Adults   2022-02-09     40.9        51.8
## 10 Joe Biden Voters   2022-02-09     42.1        52.6
## # … with 764 more rows
approval_biden %>% 
  distinct(president)
## # A tibble: 1 × 1
##   president
##   <chr>    
## 1 Joe Biden
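
As a quick check on the date step, the sketch below applies lubridate::mdy to a few strings in the month/day/year layout used by modeldate. The sample strings are illustrative values chosen for this example, not rows read from the file.

```r
library(lubridate)

# Illustrative strings in the same month/day/year layout as `modeldate`
# (hypothetical sample values, not read from the CSV)
sample_dates <- c("10/4/2021", "2/13/2021", "1/20/2021")

# mdy() parses month-day-year strings into Date objects
parsed_dates <- mdy(sample_dates)
parsed_dates

# Once parsed, the values sort chronologically; as plain text,
# "10/4/2021" would sort before "2/13/2021"
sort(parsed_dates)
```

Parsing the dates before plotting matters because ggplot treats a character column as discrete categories, whereas a Date column gets a continuous time axis.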

Restructuring the Dataset

We note that the ratings values are split between two columns, which doesn’t allow us to easily plot both approval and disapproval ratings in a single, 2D graph. To do this, we need all the rating values to be in a single column with an additional column indicating the rating type, approval or disapproval.

approval_longer_biden <- approval_biden %>%
  pivot_longer(
    cols = c(approval, disapproval),
    names_to = "rating_type",
    values_to = "rating_value"
  )
approval_longer_biden
## # A tibble: 1,548 × 5
##    president subgroup date       rating_type rating_value
##    <chr>     <chr>    <date>     <chr>              <dbl>
##  1 Joe Biden Adults   2022-02-13 approval            41.5
##  2 Joe Biden Adults   2022-02-13 disapproval         52.9
##  3 Joe Biden Voters   2022-02-13 approval            42.3
##  4 Joe Biden Voters   2022-02-13 disapproval         52.4
##  5 Joe Biden Adults   2022-02-12 approval            40.7
##  6 Joe Biden Adults   2022-02-12 disapproval         52.0
##  7 Joe Biden Voters   2022-02-12 approval            42.3
##  8 Joe Biden Voters   2022-02-12 disapproval         52.4
##  9 Joe Biden Adults   2022-02-11 approval            40.7
## 10 Joe Biden Adults   2022-02-11 disapproval         52.0
## # … with 1,538 more rows

One should ask which dataset is properly tidy, the original dataset or this restructured dataset. In some sense, the approval/disapproval values are all ratings, but adding them up or averaging them makes little sense, even if we represent disapprovals as negative ratings. Also, it may be easier to add additional rating types to the new structure because the original requires adding more columns, but, again, it’s not clear that this modification makes the dataset more tidy (n.b., the original dataset had additional rating types in separate columns). Consequently, we acknowledge that for the purposes of this example, we’ve pivoted the data to present it, not to tidy it.
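
One way to see that the pivot changes only the presentation is to reverse it. The following is a minimal sketch on a two-row toy table whose values are copied from the output printed earlier; the names wide, long and wide_again are introduced here for illustration only.

```r
library(dplyr)
library(tidyr)

# Toy wide table with the same columns as approval_biden
# (values copied from the first rows printed earlier)
wide <- tibble(
  president   = "Joe Biden",
  subgroup    = c("Adults", "Voters"),
  date        = as.Date("2022-02-13"),
  approval    = c(41.5, 42.3),
  disapproval = c(52.9, 52.4)
)

# Wide -> long: one row per subgroup, date and rating type
long <- wide %>%
  pivot_longer(
    cols = c(approval, disapproval),
    names_to = "rating_type",
    values_to = "rating_value"
  )

# Long -> wide: pivot_wider() undoes the restructuring
wide_again <- long %>%
  pivot_wider(
    names_from = rating_type,
    values_from = rating_value
  )

# Compare the two wide tables as plain data frames
all.equal(as.data.frame(wide), as.data.frame(wide_again))
```

If the comparison succeeds, the long and wide layouts carry the same information, which supports reading the pivot as a presentation choice.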

Plotting the Dataset

We’re now ready to re-engineer an approximation of FiveThirtyEight’s original plot.

approval_longer_biden %>%
  ggplot() +
  aes(x = date, 
      y = rating_value, 
      color = rating_type,
      ) +
  geom_line() +
  facet_wrap(vars(subgroup)) +
  scale_color_manual(values = c("darkgreen", "orange")) +
  labs(
    x = "Date", y = "Rating",
    color = NULL,
    title = "How (un)popular is Joe Biden?",
    subtitle = "Estimates based on polls of all adults and polls of likely/registered voters",
    caption = "Source: FiveThirtyEight modeling estimates"
  ) +
  theme_minimal()
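
The unnamed color vector in the plot above is matched to the rating types in alphabetical order. A slightly more defensive variant, sketched here on hypothetical toy data, names each color so that the mapping does not depend on that ordering. The objects toy, rating_colors and p are assumptions made for this example.

```r
library(ggplot2)

# Hypothetical toy data in the long layout used above
toy <- data.frame(
  date = rep(as.Date(c("2022-02-12", "2022-02-13")), times = 2),
  rating_type = rep(c("approval", "disapproval"), each = 2),
  rating_value = c(40.7, 41.5, 52.0, 52.9)
)

# Naming the vector ties each color to a rating type explicitly,
# instead of relying on the alphabetical order of the values
rating_colors <- c(approval = "darkgreen", disapproval = "orange")

p <- ggplot(toy, aes(x = date, y = rating_value, color = rating_type)) +
  geom_line() +
  scale_color_manual(values = rating_colors) +
  theme_minimal()
p
```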

Comparing Presidential Approval Ratings

FiveThirtyEight also presents plots of approval data for some previous presidents, back through H. Truman (see: How Biden compares with past presidents). We’ve downloaded the available data, which only goes back through D. Trump, and processed it in a manner similar to what we did for J. Biden’s approval data.

approval_raw_trump <- read_csv("data/approval_topline_trump.csv")

approval_longer_trump <- approval_raw_trump %>% 
  select(president,
         subgroup, 
         date = modeldate, 
         approval = approve_estimate, 
         disapproval = disapprove_estimate) %>% 
  mutate(date = lubridate::mdy(date)) %>% 
  filter(subgroup != "All polls") %>%
  pivot_longer(
    cols = c(approval, disapproval),
    names_to = "rating_type",
    values_to = "rating_value"
  )

approval_longer_trump
## # A tibble: 5,836 × 5
##    president    subgroup date       rating_type rating_value
##    <chr>        <chr>    <date>     <chr>              <dbl>
##  1 Donald Trump Voters   2021-01-20 approval            39.4
##  2 Donald Trump Voters   2021-01-20 disapproval         56.7
##  3 Donald Trump Adults   2021-01-20 approval            37.0
##  4 Donald Trump Adults   2021-01-20 disapproval         59.6
##  5 Donald Trump Adults   2021-01-19 approval            38.1
##  6 Donald Trump Adults   2021-01-19 disapproval         59.1
##  7 Donald Trump Voters   2021-01-19 approval            40.2
##  8 Donald Trump Voters   2021-01-19 disapproval         55.8
##  9 Donald Trump Adults   2021-01-18 approval            36.1
## 10 Donald Trump Adults   2021-01-18 disapproval         60.6
## # … with 5,826 more rows

To compare the presidents, we combine these datasets into one using the bind_rows function, which stacks the records of the two datasets, matching columns by name. Because bind_rows fills any column that is missing from one dataset with NA rather than raising an error, it is important that the two datasets have exactly the same columns.

approval_trump_biden <- bind_rows(approval_longer_trump,
                                  approval_longer_biden)
approval_trump_biden
## # A tibble: 7,384 × 5
##    president    subgroup date       rating_type rating_value
##    <chr>        <chr>    <date>     <chr>              <dbl>
##  1 Donald Trump Voters   2021-01-20 approval            39.4
##  2 Donald Trump Voters   2021-01-20 disapproval         56.7
##  3 Donald Trump Adults   2021-01-20 approval            37.0
##  4 Donald Trump Adults   2021-01-20 disapproval         59.6
##  5 Donald Trump Adults   2021-01-19 approval            38.1
##  6 Donald Trump Adults   2021-01-19 disapproval         59.1
##  7 Donald Trump Voters   2021-01-19 approval            40.2
##  8 Donald Trump Voters   2021-01-19 disapproval         55.8
##  9 Donald Trump Adults   2021-01-18 approval            36.1
## 10 Donald Trump Adults   2021-01-18 disapproval         60.6
## # … with 7,374 more rows
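
The difference between stacking rows and taking a set union can be seen on a small example. This is a sketch with hypothetical tables a, b and c_misnamed, not the presidential datasets themselves.

```r
library(dplyr)

# Two hypothetical tables with identical columns and one shared record
a <- tibble(president = c("Donald Trump", "Joe Biden"),
            rating_value = c(39.4, 41.5))
b <- tibble(president = c("Joe Biden", "Joe Biden"),
            rating_value = c(41.5, 42.3))

# bind_rows() stacks every row, so the shared record appears twice
stacked <- bind_rows(a, b)
nrow(stacked)

# union() treats the tables as sets and drops the duplicate record
nrow(union(a, b))

# bind_rows() matches columns by name and fills gaps with NA,
# so a misnamed column yields NA values rather than an error
c_misnamed <- rename(b, rating = rating_value)
bind_rows(a, c_misnamed)
```

For the approval data the two functions would give the same number of rows, since no record appears in both datasets, but bind_rows keeps the rows in their original order and never removes duplicates.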

Now, we can reproduce the approval chart for the past two presidents as a time series.

approval_trump_biden %>%
  ggplot() +
  aes(x = date, 
      y = rating_value, 
      color = rating_type,
      ) +
  geom_line() +
  facet_grid(vars(subgroup)) +
  scale_color_manual(values = c("darkgreen", "orange")) +
  labs(
    x = "Date", y = "Rating",
    color = NULL,
    title = "How (un)popular are Trump (2017-2021) & Biden (2021-present)?",
    subtitle = "Estimates based on polls of all adults and polls of likely/registered voters",
    caption = "Source: FiveThirtyEight modeling estimates"
  ) +
  theme_minimal()

We see here that approval and disapproval swapped positions when Democrat J. Biden succeeded Republican D. Trump in January 2021, and that they had swapped again by early 2022, with disapproval once more exceeding approval.
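
A reversal of this kind can also be located numerically rather than read off the chart. The sketch below computes a net rating (approval minus disapproval) and keeps the dates on which its sign changes. The values and dates in toy are hypothetical, and the same steps could be applied per subgroup to a wide table such as approval_biden.

```r
library(dplyr)

# Hypothetical approval/disapproval estimates, for illustration only
toy <- tibble(
  date        = as.Date("2000-01-01") + 0:4,
  approval    = c(50.1, 49.0, 47.8, 47.0, 46.2),
  disapproval = c(45.0, 46.5, 47.9, 48.3, 49.0)
)

crossings <- toy %>%
  arrange(date) %>%
  mutate(
    # Positive when more respondents approve than disapprove
    net = approval - disapproval,
    # TRUE where the sign differs from the previous date (NA on the first row)
    flipped = sign(net) != sign(lag(net))
  ) %>%
  # filter() drops both FALSE and NA, leaving only the crossing dates
  filter(flipped)

crossings
```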