library(tidyverse) 

This analysis explores the data used by P. Aldhous in this Buzzfeed article. The article claims that one key factor in the US’s leadership in science and technology is immigration because while most living Nobel laureates in the sciences are based in the US, many of them were born in other countries.

The Dataset

🚧 You’ll need to get the nobel dataset, place it in a data folder, and load it.

🚧 Because there is no formal webpage for this dataset, study the dataset and give a short summary here of what it contains. How many observations and how many variables are in the dataset? What does each observation represent? Use inline code to answer this question. Please include this data dictionary.

In a few cases the name of the city or country changed after the prize was awarded (e.g. in 1975 Bosnia and Herzegovina was part of the Socialist Federative Republic of Yugoslavia). In these cases the variables with the suffix _original hold a different name than their counterparts without the suffix.
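As a minimal sketch of how the dataset summary could be computed, the chunk below counts rows and columns on a toy tibble. The name nobel_toy and its three rows are made up for illustration and stand in for the real data once it is loaded.

```r
library(tidyverse)

# Toy stand-in for the real dataset; these rows are invented for illustration.
nobel_toy <- tibble(
  id       = c(1, 2, 3),
  category = c("Physics", "Chemistry", "Peace"),
  country  = c("USA", "Germany", NA)
)

n_obs  <- nrow(nobel_toy)  # number of observations (rows)
n_vars <- ncol(nobel_toy)  # number of variables (columns)

glimpse(nobel_toy)         # compact overview of column names and types
```

In the prose of an R Markdown document, the same values can be reported with inline code such as `r nrow(nobel_toy)`, so the numbers stay in sync with the data.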

Cleansing the Data

🚧 Create a new data frame called nobel_living that includes only:

Confirm that once you have filtered for these characteristics you are left with a data frame with 228 observations.
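The filtering criteria are not listed in the text above, so the sketch below only illustrates the general pattern of combining several conditions in filter() and then checking the row count. The columns gender and died_date, the three conditions, and the toy rows are all assumptions made for the example.

```r
library(tidyverse)

# Toy data; every column except `country` is a hypothetical name.
nobel_toy <- tibble(
  firstname = c("A", "B", "C", "D"),
  country   = c("USA", NA, "France", "USA"),
  gender    = c("female", "male", "org", "male"),
  died_date = as.Date(c(NA, NA, NA, "2001-05-01"))
)

nobel_living_toy <- nobel_toy %>%
  filter(
    !is.na(country),   # assumed condition: a country is recorded
    gender != "org",   # assumed condition: a person, not an organization
    is.na(died_date)   # assumed condition: no recorded date of death
  )

# Confirm the number of observations left after filtering.
nrow(nobel_living_toy)
```

Conditions separated by commas inside filter() are combined with a logical AND, so a row is kept only if it passes all of them.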

Determining Where Nobel Laureates Lived

The Buzzfeed article claims that most living Nobel laureates were based in the US when they won their prizes. First, we’ll create a new variable to identify whether the laureate was in the US when they won their prize.

We use mutate() with if_else(), a functional variant of the classic “if” statement, to create this variable. The arguments to if_else(), to be covered in more detail later in the course, are the condition to test, the value to return where the condition is true, and the value to return where it is false:

  nobel_living %>%
    mutate(
      # label each laureate by where they were based when they won
      country_us = if_else(country == "USA", "USA", "Other")
    )
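One detail worth seeing in action is how if_else() treats missing values. The sketch below applies the same expression to a toy tibble (the rows are invented) that includes a missing country.

```r
library(tidyverse)

# Toy data with one missing country, invented for illustration.
toy <- tibble(country = c("USA", "Japan", "USA", NA))

toy <- toy %>%
  mutate(
    # Where the comparison itself is NA, if_else() returns NA
    # rather than falling through to "Other".
    country_us = if_else(country == "USA", "USA", "Other")
  )

toy$country_us
```

Because the comparison country == "USA" is NA for a missing country, the new variable is also NA for that row, which is one reason to deal with missing countries before this step.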

🚧 Add a code chunk that creates a data frame called nobel_living_science by combining the two transformations above into a pipeline: use mutate() with the if_else() discussed above to create the country_us variable, and use filter() to limit the results to categories with values %in% “Physics”, “Medicine”, “Chemistry”, “Economics”.
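The shape of such a pipeline can be sketched on toy data as follows. The tibble nobel_living_toy and its rows are invented, and the result is named nobel_living_science_toy to keep it apart from the real data frame.

```r
library(tidyverse)

# Toy stand-in for nobel_living; rows are invented for illustration.
nobel_living_toy <- tibble(
  category = c("Physics", "Peace", "Economics", "Literature", "Medicine"),
  country  = c("USA", "Norway", "USA", "France", "Sweden")
)

nobel_living_science_toy <- nobel_living_toy %>%
  # step 1: flag whether the laureate was based in the US
  mutate(country_us = if_else(country == "USA", "USA", "Other")) %>%
  # step 2: keep only the science categories
  filter(category %in% c("Physics", "Medicine", "Chemistry", "Economics"))

nobel_living_science_toy
```

The %in% operator checks each category against the whole vector of allowed values, which is shorter and less error-prone than chaining several == comparisons with |.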

🚧 Create a faceted bar plot, with horizontal bars, visualizing the relationship between the category of prize and whether the laureate was in the US when they won the Nobel prize. Interpret your visualization, and say a few words about whether the Buzzfeed headline is supported by the data.

Figure: prizes by current country
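One possible way to build a faceted bar plot with horizontal bars is sketched below on toy data (the rows are invented). It uses coord_flip() to turn the bars; mapping the variable to the y aesthetic is an alternative in recent versions of ggplot2.

```r
library(tidyverse)

# Toy data, invented for illustration.
toy <- tibble(
  category   = c("Physics", "Physics", "Physics", "Chemistry", "Chemistry"),
  country_us = c("USA", "USA", "Other", "USA", "Other")
)

p <- ggplot(toy, aes(x = country_us)) +
  geom_bar() +              # counts rows in each group
  facet_wrap(~category) +   # one panel per prize category
  coord_flip() +            # flip axes so the bars run horizontally
  labs(x = "Country when prize was won", y = "Number of laureates")

p
```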

Determining Where Nobel Laureates Were Born

🚧 Go back to the code chunk that created nobel_living_science and add a new variable called born_country_us that has the value "USA" if the laureate was born in the US, and "Other" otherwise. Do this by modifying your earlier code chunk; you won’t add anything new here.

🚧 Remake your visualization and add a second variable: whether the laureate was born in the US or not. Your final visualization should contain a facet for each category, within each facet a bar for whether they won the award in the US or not, and within each bar whether they were born in the US or not. (Don’t over-think this: you can do this by just adding another aesthetic mapping!) Based on your visualization, do the data appear to support Buzzfeed’s claim? Explain your reasoning in 1-2 sentences.

Figure: prizes by country of origin
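The hint about adding another aesthetic mapping can be illustrated with the fill aesthetic. This is a sketch on invented toy data, not the assignment's expected solution.

```r
library(tidyverse)

# Toy data, invented for illustration.
toy <- tibble(
  category        = c("Physics", "Physics", "Physics", "Chemistry", "Chemistry"),
  country_us      = c("USA", "USA", "Other", "USA", "Other"),
  born_country_us = c("USA", "Other", "Other", "Other", "Other")
)

p <- ggplot(toy, aes(x = country_us, fill = born_country_us)) +
  geom_bar() +              # bars are stacked by the fill variable
  facet_wrap(~category) +   # one panel per prize category
  coord_flip() +            # horizontal bars
  labs(
    x    = "Country when prize was won",
    y    = "Number of laureates",
    fill = "Country of birth"
  )

p
```

Each bar is now split into coloured segments, so the share of US-based winners who were born elsewhere can be read directly from the plot.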

The data show that very few Nobel prize winners who won in other countries emigrated there from the US. Conversely, the data show that many US prize winners were born in other countries, at least for fields other than Economics in which the majority of winners were not US-born.

Determining Where Immigrant Nobel Laureates Were Born

🚧 Make a table for where immigrant Nobelists were born, using a single pipeline:

Table: country of origin of immigrant winners
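A table of this kind can be produced in a single pipeline by filtering and then counting. The sketch below uses invented toy rows, and the column name born_country is an assumption about how the birth country is stored.

```r
library(tidyverse)

# Toy data, invented for illustration; `born_country` is an assumed column name.
toy <- tibble(
  country_us      = c("USA", "USA", "USA", "Other", "USA"),
  born_country_us = c("Other", "Other", "USA", "Other", "Other"),
  born_country    = c("Germany", "UK", "USA", "Japan", "Germany")
)

immigrants <- toy %>%
  # keep laureates who won in the US but were born elsewhere
  filter(country_us == "USA", born_country_us == "Other") %>%
  # tally by birth country, most frequent first
  count(born_country, sort = TRUE)

immigrants
```

Here count() with sort = TRUE replaces a separate group_by(), summarise(), and arrange() sequence.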

Recreating the Buzzfeed Visualizations [OPTIONAL]

The plots in the Buzzfeed article are called waffle plots. You can find the code used for making these plots in Buzzfeed’s GitHub repo (yes, they have one!) here. You’re not expected to recreate them as part of your assignment, but you’re welcome to do so for fun!

