In this document, we explore the MPG dataset provided in the tidyverse package (specifically, in ggplot2). It follows the model of the lab 1.2 exploration of the Seattle Pets dataset.
🚧 Start with a code chunk that loads the tidyverse package and displays a summary of the structure and contents of the mpg dataset. Then include the following:
See the dataset documentation: https://ggplot2.tidyverse.org/reference/mpg.html.

The MPG dataset is pre-loaded in the tidyverse package, but we could save it to a file and reload it from there.
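For reference, the loading and summary step described above might look like the following minimal sketch. It assumes the tidyverse is installed; `glimpse()` and `summary()` are just two of several ways to inspect a dataset, not necessarily the ones the lab expects.

```r
# Load the tidyverse, which attaches ggplot2 (home of the mpg dataset).
library(tidyverse)

# Show the structure: one line per column, with its type and first few values.
glimpse(mpg)

# Show a per-column summary of the contents.
summary(mpg)
```

In an R Markdown document this would go inside a code chunk, and the startup messages from `library(tidyverse)` can be hidden with `message=FALSE` in the chunk header.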
🚧 Create a data sub-directory for this lab and save the MPG dataset in CSV format. To do this, include a code chunk that uses the pipe operator to pass the mpg dataset object into a call to the write_csv() function. You can find a specification of this function in RStudio’s Data import cheat sheet (in RStudio, choose Help→Cheat Sheets→Browse cheat sheets and search for “Data import with readr”). This code chunk will likely include some rather ugly message output, which you can suppress by adding message=FALSE to the code chunk header.
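A minimal sketch of the save step is shown below. The directory name `data` and the file name `mpg.csv` are assumptions chosen for illustration; the lab does not prescribe them.

```r
library(tidyverse)

# Hypothetical output location; adjust to match your lab directory.
csv_path <- "data/mpg.csv"

# Create the sub-directory if it does not exist yet.
dir.create(dirname(csv_path), showWarnings = FALSE, recursive = TRUE)

# Pipe the mpg tibble into write_csv(), which takes the data as its first argument.
mpg %>%
  write_csv(csv_path)
```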
We can now read the MPG dataset back in from the file.
🚧 Include a code chunk that reads the CSV file you just wrote and assigns the result to a new object with an appropriate name. Display the structure of this re-read dataset and list the differences, if any, you see between it and the original MPG dataset.
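One way to sketch the re-read and comparison is shown below. The path `data/mpg.csv` and the object name `mpg_reread` are illustrative assumptions, and `show_col_types = FALSE` assumes readr 2.0 or later. The chunk rewrites the file if it is missing so that it can run on its own.

```r
library(tidyverse)

# Hypothetical path, matching wherever the CSV file was written.
csv_path <- "data/mpg.csv"

# Make sure the file exists so this chunk is runnable by itself.
if (!file.exists(csv_path)) {
  dir.create(dirname(csv_path), showWarnings = FALSE, recursive = TRUE)
  write_csv(mpg, csv_path)
}

# Read the file back in; read_csv() guesses each column's type from its contents.
mpg_reread <- read_csv(csv_path, show_col_types = FALSE)
glimpse(mpg_reread)

# Put the column types of both versions side by side to spot differences.
type_comparison <- tibble(
  column   = names(mpg),
  original = sapply(mpg, function(col) class(col)[1]),
  reread   = sapply(mpg_reread, function(col) class(col)[1])
)
type_comparison
```

Since a CSV file stores only text, column types are not preserved and have to be guessed again on import, so the type columns of this table are where any differences should show up.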
RMarkdown allows us to include images.
🚧 Include RMarkdown code to display the RMarkdown logo found here: https://cs.calvin.edu/courses/info/601/resources/images/rmarkdown-logo.png.
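The usual Markdown image syntax, `![alt text](image-url)`, works directly in the document text. As an alternative sketch from inside a code chunk, `knitr::include_graphics()` can be given the image location; this assumes knitr is installed and that the output format is HTML, since other formats may not be able to embed a remote image.

```r
# URL of the logo, as given in the task above.
logo_url <- "https://cs.calvin.edu/courses/info/601/resources/images/rmarkdown-logo.png"

# Ask knitr to embed the image at this point in the rendered document.
knitr::include_graphics(logo_url)
```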
RMarkdown also allows us to include equations and inline code computations. For example, Einstein’s famous equation, \(E = mc^2\), tells us that given the speed of light (29979245800 cm/second), the energy stored in a mass of 1 gram is \(8.9875518 \times 10^{20}\) ergs. (Yep, that’s a pretty big number.)
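The computation behind that number can be sketched in a few lines of R. The variable names are illustrative; with mass in grams and speed in cm/second, the result is in ergs (g·cm²/s²).

```r
# Speed of light in cm/second, as quoted in the text.
speed_of_light <- 29979245800

# Mass in grams.
mass <- 1

# E = m * c^2, in ergs.
energy <- mass * speed_of_light^2
energy
```

In the document text itself, the same value can be produced by an inline code expression, written as a backtick, the letter r, the expression, and a closing backtick.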
🚧 Include a statement here that computes the mean city and highway MPG values. You can compute the mean using the mean() function and you can access the city and highway MPG value vectors using mpg$cty and mpg$hwy.
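A minimal sketch of the two computations, using only the `mean()` function and the `mpg$cty` and `mpg$hwy` vectors named above (the result names are illustrative):

```r
library(tidyverse)

# Mean city and highway fuel economy, in miles per gallon.
mean_cty <- mean(mpg$cty)
mean_hwy <- mean(mpg$hwy)

# Round for display.
round(c(city = mean_cty, highway = mean_hwy), 2)
```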
This document has already used Tibbles (e.g., mpg), a particularly useful version of the standard data frame used to store datasets. The other data types we’ll use in the course include Dates and Factors.
A date is a special type used to represent date-times. For example, it is currently 2022-03-10 13:53:57. Dates can be manipulated using the lubridate package.
🚧 Use the lubridate::make_date(year = ??) function to convert the MPG year variable into a date value. This will assume that the month and day are January 1.
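As a hedged sketch, the conversion could be done with `mutate()`, adding a new column next to the original one. The names `mpg_dated` and `year_date` are assumptions made for illustration.

```r
library(tidyverse)

# The current date-time, as used in the sentence about dates above.
current_time <- lubridate::now()

# Build a Date from the year column; month and day default to January 1.
mpg_dated <- mpg %>%
  mutate(year_date = lubridate::make_date(year = year))

# Show each distinct year alongside its date value.
mpg_dated %>%
  distinct(year, year_date)
```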
A factor is a special type of vector used to represent categorical data values. For example, though the drive variable (drv) in the MPG dataset is represented as a character, it’s probably best seen as a value from a short list of possible categories: ‘f’, ‘4’, …
🚧 Convert the MPG drive vector (mpg$drv) into a factor using the as.factor() function. What are the values for this variable in the dataset and what do they stand for?
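A minimal sketch of the conversion follows. In the dataset the drive variable is the column `drv`; the name `drv_factor` is illustrative. The dataset documentation linked earlier explains what each code stands for.

```r
library(tidyverse)

# Convert the character vector of drive-train codes into a factor.
drv_factor <- as.factor(mpg$drv)

# The distinct categories the factor can take.
levels(drv_factor)

# How many cars fall into each category.
table(drv_factor)
```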
🚧 Include one query of the dataset here that shows something interesting. Include the R code, its output, and your interpretation of the results.
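As one possible illustration (an assumed example, not the query the lab expects), the sketch below compares average highway mileage across vehicle classes. The object name `hwy_by_class` is hypothetical.

```r
library(tidyverse)

# Average highway MPG and number of cars per vehicle class, highest average first.
hwy_by_class <- mpg %>%
  group_by(class) %>%
  summarize(mean_hwy = mean(hwy), n = n()) %>%
  arrange(desc(mean_hwy))

hwy_by_class
```

An interpretation would then comment on which classes sit at the top and bottom of the table and whether the group sizes are large enough to trust the averages.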
*Exercise based on Data Science in a Box