class: left, top, title-slide

.title[
# Visualization 2
Using ggplot
]
.author[
### Keith VanderLinden
Calvin University
]

---

# Data Visualization using ggplot

.pull-left[
ggplot2 is the Tidyverse’s data visualization package. It supports EDA by allowing users to specify plots in a declarative manner and to build them iteratively.
]

.pull-right[

]

???

Being part of the Tidyverse means that ggplot works smoothly with tibbles.

---

# Grammar of Graphics

ggplot2 implements data visualizations using a layered *grammar of graphics*.

.pull-left[

See: *The Grammar of Graphics*, L. Wilkinson, https://www.springer.com/in/book/9780387245447
]

.pull-right[
```r
*<DATA> %>%
* ggplot() +
* aes( <MAPPINGS> ) +
* <GEOM_FUNCTION>(
    stat = <STAT> ,
    position = <POSITION>
  ) +
  <COORDINATE_FUNCTION> +
  <FACET_FUNCTION> +
  <SCALE_FUNCTION> +
  <THEME_FUNCTION>
```
<br><br>
See: *A Layered Grammar of Graphics*, H. Wickham, http://vita.had.co.nz/papers/layered-grammar.pdf
]

???

Notes:

- The data, geom, and mappings lines are required; the rest are optional.
- Distinguish `%>%` (data pipe - newer) vs `+` (add ggplot layer - older).
- Be careful of the three views of graphics here.
    - *Yau's taxonomy* (MDSR 2), covered in unit 2, is used only to analyze graphs.
    - *Wilkinson's Grammar of Graphics*, which inspired Wickham's ggplot, is mentioned here only as background and to introduce the idea of layering graphics.
    - *Wickham's `ggplot`* (R4DS 3, MDSR 3) is what data scientists actually use.
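
The template above could be filled in roughly as follows. This is only a sketch: it uses the `penguins` data from the palmerpenguins package, and the particular stat, position, coordinate, facet, scale, and theme choices are illustrative assumptions rather than part of the slide.

```r
library(palmerpenguins)  # provides the penguins tibble
library(ggplot2)
library(magrittr)        # provides the %>% pipe

# One concrete (assumed) choice for each slot in the template.
template_plot <- penguins %>%                       # <DATA>
  ggplot() +
  aes(x = bill_depth_mm, y = bill_length_mm) +      # <MAPPINGS>
  geom_point(                                       # <GEOM_FUNCTION>
    stat = "identity",                              # <STAT> (the default for points)
    position = "identity"                           # <POSITION> (the default for points)
  ) +
  coord_cartesian() +                               # <COORDINATE_FUNCTION>
  facet_wrap(~ island) +                            # <FACET_FUNCTION>
  scale_x_continuous() +                            # <SCALE_FUNCTION>
  theme_minimal()                                   # <THEME_FUNCTION>

# Printing the object renders the plot.
template_plot
```

Only the data, mappings, and geom lines are needed to get a plot; each of the remaining lines can be dropped, in which case ggplot falls back to its defaults.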
---

# ggplot Coding Example

```r
library(palmerpenguins)
penguins
```

```
## # A tibble: 344 × 8
##    species island    bill_length_mm bill_depth_mm flipper_…¹ body_…² sex    year
##    <fct>   <fct>              <dbl>         <dbl>      <int>   <int> <fct> <int>
##  1 Adelie  Torgersen           39.1          18.7        181    3750 male   2007
##  2 Adelie  Torgersen           39.5          17.4        186    3800 fema…  2007
##  3 Adelie  Torgersen           40.3          18          195    3250 fema…  2007
##  4 Adelie  Torgersen           NA            NA           NA      NA <NA>   2007
##  5 Adelie  Torgersen           36.7          19.3        193    3450 fema…  2007
##  6 Adelie  Torgersen           39.3          20.6        190    3650 male   2007
##  7 Adelie  Torgersen           38.9          17.8        181    3625 fema…  2007
##  8 Adelie  Torgersen           39.2          19.6        195    4675 male   2007
##  9 Adelie  Torgersen           34.1          18.1        193    3475 <NA>   2007
## 10 Adelie  Torgersen           42            20.2        190    4250 <NA>   2007
## # … with 334 more rows, and abbreviated variable names ¹flipper_length_mm,
## #   ²body_mass_g
```

.footnote[Based on [Data Science in a Box](https://datasciencebox.org)]

---

.pull-left[
```r
*penguins %>%
* ggplot()
```
]
.pull-right[
<img src="ggplot_files/figure-html/unnamed-chunk-3-1.png" width="100%" />
]

.midi[
> **Start with the `penguins` data frame and initialize the plot.**
]

---

.pull-left[
```r
penguins %>%
  ggplot() +
* aes(x = bill_depth_mm)
```
]
.pull-right[
<img src="ggplot_files/figure-html/unnamed-chunk-4-1.png" width="100%" />
]

.midi[
> Start with the `penguins` data frame
> and initialize the plot.
> **Map bill depth to the x-axis**
]

---

.pull-left[
```r
penguins %>%
  ggplot() +
  aes(x = bill_depth_mm,
*     y = bill_length_mm)
```
]
.pull-right[
<img src="ggplot_files/figure-html/unnamed-chunk-5-1.png" width="100%" />
]

.midi[
> Start with the `penguins` data frame
> and initialize the plot.
> Map bill depth to the x-axis
> **and map bill length to the y-axis.**
]

---

.pull-left[
```r
penguins %>%
  ggplot() +
  aes(x = bill_depth_mm,
      y = bill_length_mm) +
* geom_point()
```
]
.pull-right[
<img src="ggplot_files/figure-html/unnamed-chunk-6-1.png" width="100%" />
]

.midi[
> Start with the `penguins` data frame
> and initialize the plot.
> Map bill depth to the x-axis
> and map bill length to the y-axis.
> **Represent each observation with a point**
]

---

.pull-left[
```r
penguins %>%
  ggplot() +
  aes(x = bill_depth_mm,
      y = bill_length_mm,
*     color = species) +
  geom_point()
```
]
.pull-right[
<img src="ggplot_files/figure-html/unnamed-chunk-7-1.png" width="100%" />
]

.midi[
> Start with the `penguins` data frame
> and initialize the plot.
> Map bill depth to the x-axis
> and map bill length to the y-axis.
> Represent each observation with a point
> **and map species to the color of each point.**
]

---

.pull-left[
```r
penguins %>%
  ggplot() +
  aes(x = bill_depth_mm,
      y = bill_length_mm,
      color = species) +
  geom_point() +
* labs(
*   title = "Penguin bill depth and length"
* )
```
]
.pull-right[
<img src="ggplot_files/figure-html/unnamed-chunk-8-1.png" width="100%" />
]

.midi[
> Start with the `penguins` data frame
> and initialize the plot.
> Map bill depth to the x-axis
> and map bill length to the y-axis.
> Represent each observation with a point
> and map species to the color of each point.
> **Title the plot "Penguin bill depth and length"**
]

---

.pull-left[
```r
penguins %>%
  ggplot() +
  aes(x = bill_depth_mm,
      y = bill_length_mm,
      color = species) +
  geom_point() +
  labs(
    title = "Penguin bill depth and length",
*   subtitle = "Adelie, Chinstrap, Gentoo"
  )
```
]
.pull-right[
<img src="ggplot_files/figure-html/unnamed-chunk-9-1.png" width="100%" />
]

.midi[
> Start with the `penguins` data frame
> and initialize the plot.
> Map bill depth to the x-axis
> and map bill length to the y-axis.
> Represent each observation with a point
> and map species to the color of each point.
> Title the plot "Penguin bill depth and length",
> **add the subtitle "Adelie, Chinstrap, Gentoo"**
]

---

.pull-left[
```r
penguins %>%
  ggplot() +
  aes(x = bill_depth_mm,
      y = bill_length_mm,
      color = species) +
  geom_point() +
  labs(
    title = "Penguin bill depth and length",
    subtitle = "Adelie, Chinstrap, Gentoo",
*   x = "Bill depth (mm)",
*   y = "Bill length (mm)"
  )
```
]
.pull-right[
<img src="ggplot_files/figure-html/unnamed-chunk-10-1.png" width="100%" />
]

.midi[
> Start with the `penguins` data frame
> and initialize the plot.
> Map bill depth to the x-axis
> and map bill length to the y-axis.
> Represent each observation with a point
> and map species to the color of each point.
> Title the plot "Penguin bill depth and length",
> add the subtitle "Adelie, Chinstrap, Gentoo",
> **label the x and y axes as "Bill depth (mm)" and "Bill length (mm)", respectively**
]

---

.pull-left[
```r
penguins %>%
  ggplot() +
  aes(x = bill_depth_mm,
      y = bill_length_mm,
      color = species) +
  geom_point() +
  labs(
    title = "Penguin bill depth and length",
    subtitle = "Adelie, Chinstrap, Gentoo",
    x = "Bill depth (mm)",
    y = "Bill length (mm)",
*   color = "Species"
  )
```
]
.pull-right[
<img src="ggplot_files/figure-html/unnamed-chunk-11-1.png" width="100%" />
]

.midi[
> Start with the `penguins` data frame
> and initialize the plot.
> Map bill depth to the x-axis
> and map bill length to the y-axis.
> Represent each observation with a point
> and map species to the color of each point.
> Title the plot "Penguin bill depth and length",
> add the subtitle "Adelie, Chinstrap, Gentoo",
> label the x and y axes as "Bill depth (mm)" and "Bill length (mm)", respectively,
> **label the legend "Species"**
]

---

.pull-left[
```r
penguins %>%
  ggplot() +
  aes(x = bill_depth_mm,
      y = bill_length_mm,
      color = species) +
  geom_point() +
  labs(
    title = "Penguin bill depth and length",
    subtitle = "Adelie, Chinstrap, Gentoo",
    x = "Bill depth (mm)",
    y = "Bill length (mm)",
    color = "Species",
*   caption = "Source: Palmer Station LTER"
  )
```
]
.pull-right[
<img src="ggplot_files/figure-html/unnamed-chunk-12-1.png" width="100%" />
]

.midi[
> Start with the `penguins` data frame
> and initialize the plot.
> Map bill depth to the x-axis
> and map bill length to the y-axis.
> Represent each observation with a point
> and map species to the color of each point.
> Title the plot "Penguin bill depth and length",
> add the subtitle "Adelie, Chinstrap, Gentoo",
> label the x and y axes as "Bill depth (mm)" and "Bill length (mm)", respectively,
> label the legend "Species",
> **and add a caption for the data source.**
]

---

.pull-left[
```r
penguins %>%
  ggplot() +
  aes(x = bill_depth_mm,
      y = bill_length_mm,
      color = species) +
  geom_point() +
  labs(
    title = "Penguin bill depth and length",
    subtitle = "Adelie, Chinstrap, Gentoo",
    x = "Bill depth (mm)",
    y = "Bill length (mm)",
    color = "Species",
    caption = "Source: Palmer Station LTER"
  ) +
* scale_color_viridis_d()
```
]
.pull-right[
<img src="ggplot_files/figure-html/unnamed-chunk-13-1.png" width="100%" />
]

.midi[
> Start with the `penguins` data frame
> and initialize the plot.
> Map bill depth to the x-axis
> and map bill length to the y-axis.
> Represent each observation with a point
> and map species to the color of each point.
> Title the plot "Penguin bill depth and length",
> add the subtitle "Adelie, Chinstrap, Gentoo",
> label the x and y axes as "Bill depth (mm)" and "Bill length (mm)", respectively,
> label the legend "Species",
> and add a caption for the data source.
> **Finally, use a discrete color scale that is designed to be perceived by viewers with common forms of color blindness.**
]

---

# Aesthetic Options

.pull-left[
Some aesthetic options can be mapped to specific variables.

```r
penguins %>%
  ggplot() +
  aes(x = bill_depth_mm,
      y = bill_length_mm,
*     color = species,
*     shape = species,
*     size = body_mass_g,
*     alpha = flipper_length_mm) +
  geom_point() +
  scale_color_viridis_d()
```
]
.pull-right[
<img src="ggplot_files/figure-html/unnamed-chunk-14-1.png" width="100%" />
]

???

Point out the 4 aesthetics in the resulting plot. One variable (`species`) is mapped to two aesthetics (color and shape).

Distinguish between:

- *mapping*: `color = species` is currently in `aes()` and, thus, is based on the value of the `species` variable.
- *setting*: Setting `color = "blue"` in `geom_point()` makes every point blue, regardless of the data.

---

# Geometric Options

.pull-left[
A plot can combine several geometries, and some aesthetic options apply only to certain geometries.

```r
penguins %>%
  ggplot() +
  aes(x = bill_depth_mm,
      y = bill_length_mm,
      color = species,
*     linetype = species) +
  geom_point() +
* geom_smooth() +
  scale_color_viridis_d()
```

```
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'
```
]
.pull-right[
<img src="ggplot_files/figure-html/unnamed-chunk-15-1.png" width="100%" />
]

???

Notes:

- This plot has two geoms; only one is required.
- Not all aesthetic options work with all geometries, e.g., lines have `linetype`, not `shape`.

---

# Faceting (1D)

.pull-left[
Faceting creates separate, smaller plots that display subsets of the data.

```r
penguins %>%
  ggplot() +
  aes(x = bill_depth_mm,
      y = bill_length_mm) +
  geom_point() +
* facet_wrap(~ species)
```
]
.pull-right[
<img src="ggplot_files/figure-html/unnamed-chunk-16-1.png" width="100%" />
]

???
This is called *small multiples* in N. Yau's taxonomy (MDSR 2.2.1).

---

# Faceting (2D)

.pull-left[
Faceting is useful for exploring conditional relationships and large datasets.

```r
penguins %>%
  ggplot() +
  aes(x = bill_depth_mm,
      y = bill_length_mm) +
  geom_point() +
* facet_grid(sex ~ species)
```
]
.pull-right[
<img src="ggplot_files/figure-html/unnamed-chunk-17-1.png" width="100%" />
]

???

See Data Science in a Box <https://datasciencebox.org/exploring-data.html> for more faceting options.
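
The tibble printed earlier shows that `sex` contains missing values, and `facet_grid(sex ~ species)` would normally give those rows a facet row of their own labelled `NA`. One possible way to avoid that, sketched here under the assumption that dplyr is available (it is not loaded in the slides themselves), is to filter the missing values out before plotting:

```r
library(palmerpenguins)  # provides the penguins tibble
library(dplyr)           # provides filter() and the %>% pipe
library(ggplot2)

# Keep only penguins with a recorded sex, then facet as on the slide.
faceted <- penguins %>%
  filter(!is.na(sex)) %>%
  ggplot() +
  aes(x = bill_depth_mm,
      y = bill_length_mm) +
  geom_point() +
  facet_grid(sex ~ species)

# Printing the object renders the plot.
faceted
```

Whether to drop these rows is a judgment call: for EDA it can be more informative to leave the `NA` facet in place so the missing data stays visible.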