library(tidyverse)
This document reproduces an analysis of income distributed by religious group made by the Pew Religious Landscape Study based on data collected by the Pew Research Center and downloaded on April 14, 2020.
The dataset is stored as an Excel file.
library(readxl)
income_religion <- read_excel("data/income-religion.xlsx")
income_religion
## # A tibble: 12 × 6
## `Religious tradition` Less t…¹ $30,0…² $50,0…³ $100,…⁴ Sampl…⁵
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 "Buddhist" 0.36 0.18 0.32 0.13 233
## 2 "Catholic" 0.36 0.19 0.26 0.19 6137
## 3 "Evangelical Protestant" 0.35 0.22 0.28 0.14 7462
## 4 "Hindu" 0.17 0.13 0.34 0.36 172
## 5 "Historically Black Protestant" 0.53 0.22 0.17 0.08 1704
## 6 "Jehovah's Witness" 0.48 0.25 0.22 0.04 208
## 7 "Jewish" 0.16 0.15 0.24 0.44 708
## 8 "Mainline Protestant" 0.29 0.2 0.28 0.23 5208
## 9 "Mormon" 0.27 0.2 0.33 0.2 594
## 10 "Muslim" 0.34 0.17 0.29 0.2 205
## 11 "Orthodox Christian" 0.18 0.17 0.36 0.29 155
## 12 "Unaffiliated (religious \"nones\")" 0.33 0.2 0.26 0.21 6790
## # … with abbreviated variable names ¹`Less than $30,000`, ²`$30,000-$49,999`,
## # ³`$50,000-$99,999`, ⁴`$100,000 or more`, ⁵`Sample Size`
We now massage the dataset to: - Rename the columns to make them easier to work with. - Pivot the percentages into a single column for presentation. - Factorize the religious tradition name to set its order. - Add a frequency count
income_religion_long <- income_religion %>%
rename(
religion = `Religious tradition`,
n = `Sample Size`
) %>%
pivot_longer(
cols = -c(religion, n),
names_to = "income",
values_to = "proportion"
) %>%
mutate(
religion = fct_rev(religion),
income = fct_relevel(income,
"$100,000 or more",
"$50,000−$99,999",
"$30,000−$49,999",
"Less than $30,000"),
frequency = round(proportion * n)
)
## Warning: 2 unknown levels in `f`: $50,000−$99,999 and $30,000−$49,999
income_religion_long
## # A tibble: 48 × 5
## religion n income proportion frequency
## <fct> <dbl> <fct> <dbl> <dbl>
## 1 Buddhist 233 Less than $30,000 0.36 84
## 2 Buddhist 233 $30,000-$49,999 0.18 42
## 3 Buddhist 233 $50,000-$99,999 0.32 75
## 4 Buddhist 233 $100,000 or more 0.13 30
## 5 Catholic 6137 Less than $30,000 0.36 2209
## 6 Catholic 6137 $30,000-$49,999 0.19 1166
## 7 Catholic 6137 $50,000-$99,999 0.26 1596
## 8 Catholic 6137 $100,000 or more 0.19 1166
## 9 Evangelical Protestant 7462 Less than $30,000 0.35 2612
## 10 Evangelical Protestant 7462 $30,000-$49,999 0.22 1642
## # … with 38 more rows
With the dataset in place, we can now build an approximation of Pew’s plot.
income_religion_long %>%
ggplot() +
aes(x = frequency,
y = religion,
fill = income,
) +
geom_col(
position = "fill"
) +
scale_fill_viridis_d() +
theme_minimal() +
theme(
legend.position = "bottom",
legend.key.size = unit(0.2, "cm"),
) +
guides(
fill = guide_legend(reverse = TRUE)
) +
labs(
x = "Frequency", y = "",
title = "Income distribution by religious group",
subtitle = "% of adults who have a household income of...",
caption = "Source: Pew Research Center, Religious Landscape Study",
fill = NULL,
)