Introduction 2 Reproducibility

class: left, top, title-slide

.title[
# Introduction 2<br>Reproducibility
]
.author[
### Keith VanderLinden<br>Calvin University
]

---

## Reproducible Analyses

Science is based on experiments and analyses that are *reproducible*.

.pull-left[
In data science, achieving reproducibility requires that we be clear about:

- the nature and source of our data.
- the process we used to analyze that data.
- the results of the analysis.
- the justifications for our conclusions.
]
.pull-right[
Our documented analyses must be clear enough that others can:

- access or reproduce the original data.
- understand/rerun the data processing code.
- rebuild the tables and the visualizations.
- assess the reasoning behind the conclusions.
]

Ultimately, we'd like to extend the work to other related datasets and analyses.

???
Reproducing an analysis is the best way to demonstrate the validity of conclusions. The most useful conclusions can be applied in other, similar contexts.

In data science, spreadsheets can make analyses clear, but they don't work well for reproducing the analyses on different (or updated) datasets.
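
As a minimal sketch of the contrast with spreadsheets, the script below wraps one analysis step in a function so the same code can be rerun on a different or updated dataset. The helper name `summarize_by_group` and its arguments are hypothetical, chosen only for illustration; `mtcars` is used because it ships with R, so anyone can access the same data.

```r
# Hypothetical helper (not from the course materials): compute the mean of
# one numeric column within each level of a grouping column.
summarize_by_group <- function(data, value, group) {
  means <- tapply(data[[value]], data[[group]], mean, na.rm = TRUE)
  data.frame(
    group = names(means),
    mean = as.numeric(means),
    row.names = NULL
  )
}

# The analysis is a single call; swapping in another data frame with the
# named columns reruns the identical steps on the new data.
result <- summarize_by_group(mtcars, value = "mpg", group = "cyl")
print(result)
```

Because every step is written down as code, a reader can inspect the process, rerun it, and check that the rebuilt table matches the reported one.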

References
- MDSR D

---
## Building Reproducible Analyses

To achieve the goal of reproducibility, data scientists commonly use the following toolkit:

- R programming
- R Markdown
- Git & GitHub

RStudio integrates support for all three.

We focus on the first two in this course.

.footnote[Cf. literate programming, https://en.wikipedia.org/wiki/Literate_programming]

???
Tools:
- The R programming language supports scripts, so an analysis can be rerun step by step.
- R Markdown supports *literate* programming: prose, code, and results live in one document.
- Git tracks versions of code; GitHub supports sharing it.

References:
- literate programming: