class: left, top, title-slide

.title[
# Introduction 2
Basic Programming in R
]
.author[
### Keith VanderLinden
Calvin University
]

---

# Basic Programming in R

We will be using the [R programming language](https://www.r-project.org/) throughout this course. R is a general-purpose programming language that supports:

- Basic data types and data structures
- Basic control structures, including functions
- Packages
- File processing

*R* supports *scripting* for data science.

.footnote[See: https://www.r-project.org/]

???

R programming is most common in statistics; Python tends to be more common in computer science.

References:

- https://rbasics.netlify.app/index.html
- Objects (https://cs.calvin.edu/courses/data/202/fa20/slides/w02/w2d1-toolkit.html#12)
- Operators & Expressions
- Data frames (https://cs.calvin.edu/courses/data/202/fa20/slides/w02/w2d1-toolkit.html#12 & https://ids-s1-20.github.io/slides/week-01/w1-d05-toolkit-r/w1-d05-toolkit-r.html#19)
- Define the dataset terms: *observation*; *variable*
- (Calling) Functions (https://ids-s1-20.github.io/slides/week-01/w1-d05-toolkit-r/w1-d05-toolkit-r.html#18)
- Packages (https://ids-s1-20.github.io/slides/week-01/w1-d05-toolkit-r/w1-d05-toolkit-r.html#15)
- Getting help (https://cs.calvin.edu/courses/data/202/fa20/slides/w02/w2d1-toolkit.html#22)

---

# Functions and Packages

.pull-left[

*Functions* provide pre-packaged processing and are called by name, with arguments, e.g.,

```r
wday(birthdate, label = TRUE)
```

Sets of useful functions are collected in *packages*. We will be using packages from the [Tidyverse](https://www.tidyverse.org/), which describes itself as “an opinionated collection of R packages for data science”, including:

- `ggplot2`
- `dplyr`
- `tidyr`

The `library` function loads the package specified by the argument, e.g.,

```r
library(tidyverse)
```

]

.pull-right[

]

???

Point out the packages we'll use in each section of the course.

Notes

- Packages (install then load): `library(tidyverse)`
- Functions (calling with arguments): `sqrt(4)` (N.b., we'll define our own functions later.)
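
---

# Functions and Packages: A Sketch

The calling conventions described on the previous slide can be tried out with a few lines of R. This is a minimal sketch, not part of the original deck: the value assigned to `birthdate` is a made-up date for illustration, and the `wday()` call only runs if the `lubridate` package happens to be installed.

```r
# Positional argument: sqrt() takes the number as its first argument.
root <- sqrt(4)

# Named argument: `digits` is matched by name, just like `label = TRUE`.
rounded <- round(3.14159, digits = 2)

# Hypothetical value for illustration; the slide does not define `birthdate`.
birthdate <- as.Date("2000-01-01")

# library() attaches a package so its functions can be called by name.
# The guard skips this step when lubridate is not installed.
if (requireNamespace("lubridate", quietly = TRUE)) {
  library(lubridate)
  print(wday(birthdate, label = TRUE))
}

print(root)
print(rounded)
```

Arguments without a name are matched by position, while named arguments such as `digits` or `label` can appear in any order.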
---

# Basic Data Types

R supports the representation of and operations on the basic data types, including:

- Integer
- Double
- Character
- Logical

R also supports additional types, including:

- Vector & List
- Date
- Factor

The most common data structure for datasets is a *data frame* (more commonly a *tibble*, a useful derivative of the data frame), which represents:

- Rows (aka *observations*)
- Columns (aka *variables*)
- Cells (aka *values*)

We access column values using the `$` operator.

???

Demo these types using `r-examples.Rmd`.

Notes:

- We're introducing data frames/tibbles before the text does because of how often we use tibbles.
- The R execution environments of the console and a notebook are different, so loading a library in the console doesn't load it for the notebook code.

References:

- https://rstudio-education.github.io/datascience-box/course-materials/slides/u2-d11-data-classes/u2-d11-data-classes.html#1

---

# The Pipe Operator

We often find ourselves making so-called “nested” function calls, which can be hard to read.

```r
distinct(select(filter(mpg, manufacturer == "ford"), model))
```

The R pipe operator, `%>%`, provided by the [`magrittr` package](https://magrittr.tidyverse.org/), lets us write nested function calls in a more readable manner.

```r
mpg %>%
  filter(manufacturer == "ford") %>%
  select(model) %>%
  distinct()
```

Basic piping

- `x %>% f` is equivalent to `f(x)`
- `x %>% f(y)` is equivalent to `f(x, y)`
- `x %>% f %>% g %>% h` is equivalent to `h(g(f(x)))`

???

- pipe: `gapminder %>% filter(year==2007)` (cf. `filter(gapminder, year==2007)`)

The pipe is not in the text until chapter 4, but we'll be using it all over with ggplot, so we introduce it here.

Notes

- We're postponing the distinction between piping (`%>%`) and layering (`+`). We used layering in the GapMinder example (class 1) and will see it again in the visualization units.
- The package is named after R. Magritte because of his [iconic pipe image](https://en.wikipedia.org/wiki/The_Treachery_of_Images).

References

- https://cs.calvin.edu/courses/data/202/ka37/slides/w03/w3d1-wrangling.html#15 (15-22)
- https://ids-s1-20.github.io/slides/week-03/w3-d03-grammar-wrangle/w3-d03-grammar-wrangle.html#19 (19-31)

---

# File Processing

.pull-left[

R provides support for reading and writing files in various formats, including:

- Comma-Separated Values (CSV)
- Tab-Separated Values (TSV)
- R data files (RDS, RData)
- Excel (XLSX)

]

.pull-right[

```r
bikes <- read_csv("data/bikeshare-day.csv")
bikes
```

```
## # A tibble: 731 × 16
##    instant dteday     season    yr  mnth holiday weekday working…¹ weath…²  temp
##      <dbl> <date>      <dbl> <dbl> <dbl>   <dbl>   <dbl>     <dbl>   <dbl> <dbl>
##  1       1 2011-01-01      1     0     1       0       6         0       2 0.344
##  2       2 2011-01-02      1     0     1       0       0         0       2 0.363
##  3       3 2011-01-03      1     0     1       0       1         1       1 0.196
##  4       4 2011-01-04      1     0     1       0       2         1       1 0.2
##  5       5 2011-01-05      1     0     1       0       3         1       1 0.227
##  6       6 2011-01-06      1     0     1       0       4         1       1 0.204
##  7       7 2011-01-07      1     0     1       0       5         1       2 0.197
##  8       8 2011-01-08      1     0     1       0       6         0       2 0.165
##  9       9 2011-01-09      1     0     1       0       0         0       1 0.138
## 10      10 2011-01-10      1     0     1       0       1         1       1 0.151
## # … with 721 more rows, 6 more variables: atemp <dbl>, hum <dbl>,
## #   windspeed <dbl>, casual <dbl>, registered <dbl>, cnt <dbl>, and abbreviated
## #   variable names ¹workingday, ²weathersit
```

]

???

Notes

- Compare/contrast `read_csv` (tibble) vs `read.csv` (data frame)
- Discuss issues with CSV vs Excel regarding data types.

References

- https://cs.calvin.edu/courses/data/202/21fa/slides/w03/w03-ggplot.html#3

---

# Code Style

Coding style conventions are not required to make code run, but they help make the code more readable.

- Use consistent names:
    - Filenames: `data-wrangling.Rmd` (no spaces or capital letters)
    - Variable names: `hourly_rates` (use `_`, no capital letters)
- Put spaces around infix operators (` = `, ` + `, ` <- `, …)
- Always put spaces before and line breaks after `%>%` and `+`.

```r
mpg %>%
  filter(manufacturer == "ford") %>%
  ggplot(aes(x = displ, y = hwy)) +
  geom_point(mapping = aes(color = class)) +
  geom_smooth()
```

.footnote[See: *The tidyverse style guide*, https://style.tidyverse.org/]

???

References:

- <https://mdsr-book.github.io/mdsr2e/ch-dataII.html#sec:naming>
- <https://style.tidyverse.org/>
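
---

# Basic Data Types: A Sketch

The types listed on the Basic Data Types slide, and column access with `$`, can be sketched in base R. This is an illustrative sketch under stated assumptions: it uses a plain `data.frame` rather than a tibble so that no package is needed, the name `rides` is hypothetical, and the `season` labels are made up.

```r
# One value of each basic type named on the slide.
count <- 3L            # integer (the L suffix marks an integer literal)
temp <- 0.344          # double
maker <- "ford"        # character
is_holiday <- FALSE    # logical

types <- c(typeof(count), typeof(temp), typeof(maker), typeof(is_holiday))
print(types)

# A small data frame with illustrative values:
# each row is an observation, each column is a variable.
rides <- data.frame(
  day = as.Date(c("2011-01-01", "2011-01-02", "2011-01-03")),
  season = factor(c("winter", "winter", "winter")),  # made-up labels
  temp = c(0.344, 0.363, 0.196)
)

# `$` extracts one column (variable) as a vector.
print(rides$temp)
print(mean(rides$temp))
print(class(rides$day))
```

Each column is itself a vector of a single type, which is why `rides$temp` can be passed directly to functions such as `mean()`.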