library(tidyverse)

This document reproduces an analysis of income distributed by religious group made by the Pew Religious Landscape Study based on data collected by the Pew Research Center and downloaded on April 14, 2020.

Loading the Dataset

The dataset is stored as an Excel file.

library(readxl)

income_religion <- read_excel("data/income-religion.xlsx")
income_religion
## # A tibble: 12 × 6
##    `Religious tradition`                Less t…¹ $30,0…² $50,0…³ $100,…⁴ Sampl…⁵
##    <chr>                                   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
##  1 "Buddhist"                               0.36    0.18    0.32    0.13     233
##  2 "Catholic"                               0.36    0.19    0.26    0.19    6137
##  3 "Evangelical Protestant"                 0.35    0.22    0.28    0.14    7462
##  4 "Hindu"                                  0.17    0.13    0.34    0.36     172
##  5 "Historically Black Protestant"          0.53    0.22    0.17    0.08    1704
##  6 "Jehovah's Witness"                      0.48    0.25    0.22    0.04     208
##  7 "Jewish"                                 0.16    0.15    0.24    0.44     708
##  8 "Mainline Protestant"                    0.29    0.2     0.28    0.23    5208
##  9 "Mormon"                                 0.27    0.2     0.33    0.2      594
## 10 "Muslim"                                 0.34    0.17    0.29    0.2      205
## 11 "Orthodox Christian"                     0.18    0.17    0.36    0.29     155
## 12 "Unaffiliated (religious \"nones\")"     0.33    0.2     0.26    0.21    6790
## # … with abbreviated variable names ¹​`Less than $30,000`, ²​`$30,000-$49,999`,
## #   ³​`$50,000-$99,999`, ⁴​`$100,000 or more`, ⁵​`Sample Size`

Wrangling & Tidying the Dataset

We now massage the dataset to: - Rename the columns to make them easier to work with. - Pivot the percentages into a single column for presentation. - Factorize the religious tradition name to set its order. - Add a frequency count

income_religion_long <- income_religion %>%
  rename(
    religion = `Religious tradition`,
    n = `Sample Size`
  ) %>%
  pivot_longer(
    cols = -c(religion, n), 
    names_to = "income", 
    values_to = "proportion"
  ) %>%
  mutate(
    religion = fct_rev(religion), 
    income = fct_relevel(income, 
                         "$100,000 or more",
                         "$50,000−$99,999", 
                         "$30,000−$49,999",
                         "Less than $30,000"),
    frequency = round(proportion * n)
  )
## Warning: 2 unknown levels in `f`: $50,000−$99,999 and $30,000−$49,999
income_religion_long
## # A tibble: 48 × 5
##    religion                   n income            proportion frequency
##    <fct>                  <dbl> <fct>                  <dbl>     <dbl>
##  1 Buddhist                 233 Less than $30,000       0.36        84
##  2 Buddhist                 233 $30,000-$49,999         0.18        42
##  3 Buddhist                 233 $50,000-$99,999         0.32        75
##  4 Buddhist                 233 $100,000 or more        0.13        30
##  5 Catholic                6137 Less than $30,000       0.36      2209
##  6 Catholic                6137 $30,000-$49,999         0.19      1166
##  7 Catholic                6137 $50,000-$99,999         0.26      1596
##  8 Catholic                6137 $100,000 or more        0.19      1166
##  9 Evangelical Protestant  7462 Less than $30,000       0.35      2612
## 10 Evangelical Protestant  7462 $30,000-$49,999         0.22      1642
## # … with 38 more rows

Reproducing the Plot

With the dataset in place, we can now build an approximation of Pew’s plot.

income_religion_long %>%
  ggplot() +
  aes(x = frequency,
      y = religion, 
      fill = income,
      ) +
  geom_col(
    position = "fill"
  ) +
  scale_fill_viridis_d() +
  theme_minimal() +
  theme(
   legend.position = "bottom",
   legend.key.size = unit(0.2, "cm"),
    ) +
  guides(
    fill = guide_legend(reverse = TRUE)
    ) +
  labs(
    x = "Frequency", y = "",
    title = "Income distribution by religious group",
    subtitle = "% of adults who have a household income of...",
    caption = "Source: Pew Research Center, Religious Landscape Study",
    fill = NULL,
    )