For this final project, you will do the following.
Note that the details of the process and the report discussed below are negotiable, and we’re happy to help at any time. Contact us if you have questions or are having issues.
The project will be done individually. The scope of the project should be something just beyond what you’ve done for the homework assignments.
Here are details on each of these steps.
Obtain some real-world data that is of interest to you. To get started, you could focus on:
Your dataset should include more than one table and, if possible, include some data obtained from an original source rather than from a pre-built, pre-tidied dataset. The data should come from a reputable source. Trace where the data came from and how it’s structured. If you use an aggregator site like OurWorldInData.org, try to track down where they got the data. Other data sources include the Web (for all things textual), data.gov (for all things US demographic), and data.un.org (for all things world demographic).
You can use a dataset that has already been collected, cleansed, wrangled, and analyzed, but you’ll need to augment that dataset with some raw data and extend the analysis in some interesting ways.
Wrangle that data into a useful form.
Again, you can use some pre-wrangled data, but you will need to do some data cleansing and wrangling.
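As one illustration of the kind of cleansing and joining this step can involve, here is a minimal Python sketch using only the standard library. The two tables, the region names, and the helper functions (`clean_name`, `parse_number`, `load_population`, `load_gdp`, `join_tables`) are hypothetical, made up purely for illustration; your own data will need its own cleaning rules.

```python
import csv
import io

# Two small "raw" tables with hypothetical, invented data (not real statistics).
# Note the inconsistent capitalization, stray spaces, commas in numbers, and a blank cell.
POPULATION_CSV = """country,year,population
 northland,2020,"1,000,000"
SOUTHLAND,2020,"2,500,000"
Eastland,2020,
"""

GDP_CSV = """Country Name,Year,GDP (USD billions)
Northland,2020,50
Southland,2020,25
Eastland,2020,10
"""


def clean_name(name):
    """Trim stray whitespace and normalize capitalization of a region name."""
    return name.strip().title()


def parse_number(text):
    """Convert '1,000,000' to 1000000.0; return None for blank cells."""
    text = text.strip().replace(",", "")
    return float(text) if text else None


def load_population(raw):
    """Parse the population table into {(region, year): population}."""
    rows = {}
    for rec in csv.DictReader(io.StringIO(raw)):
        pop = parse_number(rec["population"])
        if pop is None:  # drop rows with a missing population value
            continue
        rows[(clean_name(rec["country"]), int(rec["year"]))] = pop
    return rows


def load_gdp(raw):
    """Parse the GDP table into {(region, year): GDP in billions of USD}."""
    rows = {}
    for rec in csv.DictReader(io.StringIO(raw)):
        key = (clean_name(rec["Country Name"]), int(rec["Year"]))
        rows[key] = parse_number(rec["GDP (USD billions)"])
    return rows


def join_tables(pop, gdp):
    """Inner join on (region, year) and derive GDP per capita in USD."""
    tidy = []
    for key in sorted(pop.keys() & gdp.keys()):
        region, year = key
        per_capita = gdp[key] * 1e9 / pop[key]
        tidy.append({"region": region, "year": year,
                     "population": pop[key],
                     "gdp_per_capita": round(per_capita)})
    return tidy


if __name__ == "__main__":
    for row in join_tables(load_population(POPULATION_CSV), load_gdp(GDP_CSV)):
        print(row)
```

The join keeps only regions present in both cleaned tables, so Eastland (missing population) drops out. Whatever rules you apply, explain them in your report so a reader can judge what was discarded and why.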
Explore the data iteratively, searching for the “story” that it tells. Try to look beyond simple facts (e.g., “number of wins by each athlete”) to more interesting information about relationships (e.g., “the highest paid athletes don’t necessarily win more”).
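One way to check a relationship claim like “the highest paid athletes don’t necessarily win more” is to measure how closely the two variables move together. The sketch below computes a Spearman rank correlation in plain Python. The salary and win figures are invented for illustration, and Spearman is only one reasonable choice; a plot of the two variables should usually accompany any single summary number.

```python
from statistics import mean

# Hypothetical, invented data: salaries (millions of USD) and wins for six athletes.
SALARY = [40, 35, 30, 12, 10, 8]
WINS = [5, 9, 3, 8, 4, 7]


def ranks(values):
    """Return 1-based ranks of values, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j over a run of tied values starting at position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


if __name__ == "__main__":
    rho = spearman(SALARY, WINS)
    print(f"Spearman rho between salary and wins: {rho:.2f}")
```

A value near 0 (about 0.09 for this toy data) suggests pay and wins are only weakly related, while a value near 1 or -1 would indicate a strong monotonic relationship. Rank correlation is used here because it does not assume the relationship is linear.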
Write a report that tells a compelling story based on that data. Your report should interleave text, tables, and visualizations as appropriate.
You can include visualizations created by others, but you should re-engineer those visualizations and extend them in interesting ways. For these existing visualizations, evaluate the choices and assumptions that were made in creating them and assess whether they faithfully represent the data.
Your final report should be:
Please follow this general outline.
Introduction: Introduce your dataset and the conclusions you’ve drawn from it. If your work is based on existing articles, cite them.
Data: Load your data and give the details on where you got it, what it contains, where it originally came from, and how it’s structured. Be sure to discuss the terms under which you’re obtaining and using the data.
Wrangling: Work the data into a usable form, explaining your work in detail.