
Meet the Toolkit

K Arnold, based on IntroDS.org


Q&A

Will we use databases / SQL? Yes, in the later part of the class.

Will everything be on Moodle? The Moodle calendar will have all dates, plus direct links to anything outside of Moodle that you'll need.


Do you remember your take-away point from Monday?


Logistics

  • "Check off" all Moodle activities under "Introduction"
    • Prep 1
    • Lec 1.1
    • Discussion 1
    • Quiz 1
    • Lab 1.2
    • Lec 1.3
  • Start Prep 2 for Wednesday (no class Monday)
  • Homework 1 posted soon
  • Piazza: keep it up!

So far...

  • Monday: Overall objectives: projects, topics, dispositions
  • Wednesday:
    • Hands-on practice with R, RStudio, Git, GitHub
    • First look at summarizing data in R
  • Today:
    • Review Wednesday's activity
    • Overview of the toolkit we're using

Questions so far?


Reproducible data analysis


Reproducibility checklist

What does it mean for a data analysis to be "reproducible"?

Near-term goals:

  • Can you re-make all tables and figures easily?
  • Does the code actually do what you think it does?
  • In addition to what was done, is it clear why it was done?

Long-term goals:

  • Can the code be used for other data?
  • Can you extend the code to do other things?
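
One way to work toward the long-term goals is to wrap an analysis step in a function, so that the same code can be pointed at other data. The sketch below is only an illustration: summarize_column is a hypothetical helper name, and mtcars and iris are datasets that ship with R.

```r
# Hypothetical helper: summarize one numeric column of any data frame.
summarize_column <- function(data, column) {
  values <- data[[column]]
  c(
    n    = sum(!is.na(values)),        # number of non-missing values
    mean = mean(values, na.rm = TRUE), # average, ignoring missing values
    sd   = sd(values, na.rm = TRUE)    # standard deviation
  )
}

# The same code runs unchanged on two different built-in datasets.
summarize_column(mtcars, "mpg")
summarize_column(iris, "Sepal.Length")
```

Because the column name is an argument, extending the analysis to another variable means changing one string rather than copying code.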

Toolkit

  • Scriptability -> R
  • Literate programming (code, narrative, output in one place) -> R Markdown
  • Version control -> Git / GitHub

Tour: R and RStudio


A short list (for now) of R essentials

  • Functions are (most often) verbs, followed by what they will be applied to in parentheses:
do_this(to_this)
do_that(to_this, to_that, with_those)
  • Packages are loaded with the library function:
library(package_name)
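
As a concrete version of the patterns above, here is a small sketch using only functions that ship with R. The variable names (values, average, rounded, middle) are made up for illustration.

```r
# A function call: the verb, then what it applies to in parentheses.
values <- c(2, 4, 9)        # c() combines values into a vector
average <- mean(values)     # do_this(to_this)

# A call with more than one argument: do_that(to_this, with_those)
rounded <- round(3.14159, digits = 2)

# Load a package before using its functions. The stats package ships
# with R, so this call works without installing anything.
library(stats)
middle <- median(values)
```

Packages that do not ship with R need to be installed once before library() can load them.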

R essentials (continued)

  • Columns (variables) in data frames are accessed with $:
dataframe$var_name
  • Object documentation can be accessed with ?:
?mean
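
A short sketch of both ideas, assuming the built-in mtcars data frame as the example data (mpg_values is an illustrative name):

```r
# mtcars is a data frame that ships with R.
mpg_values <- mtcars$mpg      # dataframe$var_name pulls out one column
head(mpg_values)              # first few values of that column
mean(mtcars$mpg)              # use the column like any other vector

# ?mean is shorthand for help("mean"); it opens the documentation page.
# Guarded so that the script also runs non-interactively.
if (interactive()) help("mean")
```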

tidyverse

  • The tidyverse is an opinionated collection of R packages designed for data science
  • All packages share an underlying philosophy and a common grammar

rmarkdown

  • Write code and prose together in reproducible computational documents


R Markdown

  • Fully reproducible reports -- each time you knit, the analysis is run from the beginning
  • Simple Markdown syntax for text
  • Code goes in chunks delimited by three backticks; narrative goes outside the chunks
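
To make the chunk layout concrete, the sketch below uses R itself to write a tiny R Markdown file and print it back. The title and text are hypothetical; the point is only where the three-backtick lines fall relative to the narrative. The fence is built with strrep() so that the backticks are easy to see in isolation.

```r
# Three backticks open and close a chunk; {r} marks it as R code.
fence <- strrep("`", 3)

# Lines of a tiny R Markdown file (hypothetical content).
rmd_lines <- c(
  "---",
  "title: \"Example report\"",
  "output: html_document",
  "---",
  "",
  "This sentence is narrative. It sits outside any chunk.",
  "",
  paste0(fence, "{r}"),   # chunk opens
  "x <- 2",
  "x * 3",
  fence,                  # chunk closes
  "",
  "More narrative can follow the chunk."
)

# Write the file to a temporary location and print it back.
rmd_path <- tempfile(fileext = ".Rmd")
writeLines(rmd_lines, rmd_path)
cat(readLines(rmd_path), sep = "\n")
```

Opening the resulting file in RStudio and knitting it should produce a report containing both the narrative and the chunk's output, assuming the rmarkdown package is installed.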

Tour: R Markdown


Environments

The environment of your R Markdown document is separate from the Console!

Remember this, and expect it to bite you a few times as you're learning to work with R Markdown!


Environments

First, run the following in the console

x <- 2
x * 3

All looks good, eh?

Then, add the following in an R chunk in your R Markdown document

x * 3

What happens? Why the error?
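
The separation can be imitated inside a single R session. This is only an analogy, not how knitting is implemented: console_env and knit_env are hypothetical names standing in for the Console and for the fresh session used when the document is knit.

```r
# Stand-in for the Console: an environment where x has been defined.
console_env <- new.env()
assign("x", 2, envir = console_env)
eval(quote(x * 3), envir = console_env)   # works: x is found here

# Stand-in for the knitting session: a fresh environment that only
# sees base R, so it knows nothing about the Console's x.
knit_env <- new.env(parent = baseenv())
result <- tryCatch(
  eval(quote(x * 3), envir = knit_env),
  error = function(e) conditionMessage(e)
)
result   # an error message rather than a number
```

The fix in a real document is the same as in the sketch: define x inside the document (in a chunk) so that it exists where the code is run.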


R Markdown help

R Markdown Cheat Sheet
Help -> Cheatsheets

Markdown Quick Reference
Help -> Markdown Quick Reference


How will we use R Markdown?

  • Every assignment / report / project / etc. is an R Markdown document
  • You'll always have a template R Markdown document to start with
  • The amount of scaffolding in the template will decrease over the semester

Getting help in R


Version Control


Git and GitHub

  • Git is a version control system -- like the “Track Changes” feature in Microsoft Word, on steroids
  • It's not the only version control system, but it's a very popular one

  • GitHub is the home for your Git-based projects on the internet

  • We will use GitHub as a platform for web hosting and collaboration


Versioning

with human-readable messages


Why do we need version control?


How will we use Git and GitHub?

