For this final project, you will do the following.

  1. Data Collection
  2. Data Wrangling
  3. Data Exploration
  4. Data Conclusions

Note that the details of the process and the report discussed below are negotiable, and we’re happy to help at any time. Contact us if you have questions or are having issues.

The project will be done individually. The scope of the project should be something just beyond what you’ve done for the homework assignments.

The Process

Here are details on each of these steps.

1. Data Collection

Obtain some real-world data that is of interest to you. To get started, you could focus on:

  • an organization that you’re interested in (e.g., a company, NGO, government agency, sports team, hospital, school), looking for data that would help it make decisions.
  • an issue that’s important to you (e.g., political, social, or scientific issues), looking for news articles or opinion pieces that address that issue using data.

Your dataset should include more than one table and, if possible, include some data obtained from an original source rather than from a pre-built, pre-tidied dataset. The source should be reputable. Trace where the data came from and how it’s structured. If you use an aggregator site like OurWorldInData.org, try to track down where they got the data. Other data sources include: the Web (for all things textual); data.gov (for all things US demographic); data.un.org (for all things world demographic).

You can use a dataset that has already been collected, cleansed, wrangled, and analyzed, but you’ll need to augment that dataset with some raw data, and to extend the analysis in some interesting ways.

2. Data Wrangling

Wrangle that data into a useful form.

Again, you can use some pre-wrangled data, but you will need to do some data cleansing and wrangling.
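
As a rough illustration of what cleansing and wrangling can involve, here is a minimal sketch that tidies two small tables and joins them on a shared key. The table contents, the column names (name, team, salary_usd, wins), and the helper functions are all invented for this example. It uses only the Python standard library; your project may well use other tools.

```python
import csv
import io

# Hypothetical raw tables, invented for illustration. A real project would
# read these from files or from an original source.
ATHLETES_CSV = """name,team,salary_usd
 Ada ,Red,"1,200,000"
Ben,Blue,
Cy,Red,"800,000"
"""

RESULTS_CSV = """name,wins
Ada,10
Ben,14
Cy,12
"""


def read_rows(text):
    """Parse CSV text into a list of dicts with whitespace-trimmed values."""
    reader = csv.DictReader(io.StringIO(text))
    return [{key: value.strip() for key, value in row.items()} for row in reader]


def clean_salary(raw):
    """Convert '1,200,000' to 1200000; return None when the value is missing."""
    digits = raw.replace(",", "")
    return int(digits) if digits else None


def wrangle(athletes_text, results_text):
    """Join the two tables on 'name', skipping rows with no salary or no result."""
    wins_by_name = {row["name"]: int(row["wins"]) for row in read_rows(results_text)}
    tidy = []
    for row in read_rows(athletes_text):
        salary = clean_salary(row["salary_usd"])
        if salary is None or row["name"] not in wins_by_name:
            continue  # dropping incomplete rows is one choice among several
        tidy.append({
            "name": row["name"],
            "team": row["team"],
            "salary_usd": salary,
            "wins": wins_by_name[row["name"]],
        })
    return tidy


tidy_rows = wrangle(ATHLETES_CSV, RESULTS_CSV)
for row in tidy_rows:
    print(row)
```

Dropping rows with missing values is only one option. Whatever you decide for your own data, say so in your report, because it affects the conclusions you can draw.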

3. Data Exploration

Explore the data iteratively, searching for the “story” that it tells. Try to look beyond simple facts (e.g., “number of wins by each athlete”) to more interesting information about relationships (e.g., “the highest paid athletes don’t necessarily win more”).
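
To make the difference between a simple fact and a relationship concrete, here is a small sketch. It first reports a simple fact (who has the most wins), then computes a Pearson correlation between salary and wins. The athletes and numbers are invented for illustration, and a correlation coefficient is only one of many ways to probe a relationship.

```python
from math import sqrt

# Hypothetical (name, salary in USD, wins) records, invented for illustration.
ATHLETES = [
    ("Ada", 1_200_000, 10),
    ("Ben", 950_000, 14),
    ("Cy", 800_000, 12),
    ("Di", 400_000, 13),
]


def pearson(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


salaries = [salary for _, salary, _ in ATHLETES]
wins = [win_count for _, _, win_count in ATHLETES]

# Simple fact: which athlete has the most wins?
top_winner = max(ATHLETES, key=lambda athlete: athlete[2])[0]

# Relationship: do higher salaries go along with more wins?
r = pearson(salaries, wins)

print(f"Most wins: {top_winner}")
print(f"Salary/wins correlation: {r:.2f}")
```

In this made-up data the correlation is negative, which is the kind of observation that invites a follow-up question (why?) rather than a conclusion. With only a handful of records, treat any such number as a prompt for further exploration, not as evidence.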

4. Data Conclusions

Write a report that tells a compelling story based on that data. Your report should interleave text, tables, and visualizations as appropriate.

You can include visualizations created by others, but you should re-engineer those visualizations and extend them in interesting ways. For these existing visualizations, evaluate the choices and assumptions that were made in creating them and assess whether they faithfully represent the data (or not).

The Report

Your final report should be:

Please follow this general outline.

  1. Analysis: Include data tables and visualizations. At least one of your visuals should be of high quality, with effort spent getting the presentation details right.