This exercise walks through an exploratory data analysis of a dataset produced by Capital Bikeshare in Washington, D.C.
Start by creating a new homework sub-directory (i.e.,
/homework1) and, in it, a new RMarkdown document named
hw1-bikeshare.Rmd. Include the standard assignment header
with your name and the date (e.g., Spring 2022). The document should
produce HTML output.
Let’s imagine we’re hired by the administrators of the Capital Bikeshare program to help them understand and predict the hourly demand for rental bikes. This understanding will help them plan the number of bikes that need to be available at different parts of the system at different times so that they can avoid cases in which:
Describe this purpose at the beginning of your document.
The data for this problem were collected from the Capital Bikeshare program over the course of two years (2011 and 2012). Researchers at the University of Porto processed the data and augmented it with extra information, as described on this UCI ML Repository webpage.
We’ll use this simplified
version of the dataset that we’ve derived from the original data. It
is in CSV format. Download a copy into a data sub-directory
of your homework solution directory.
Include in your document a description of the source of this dataset, and a code chunk that loads it and prints out the first few rows.
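As a rough illustration of the load-and-preview step, here is a minimal sketch in base R. It does not use the real dataset: the file name `example.csv` and the columns `day` and `rides` are made up for illustration, and the file is written to a temporary directory so the chunk runs on its own.

```r
# Hypothetical stand-in for the downloaded file: write a tiny made-up CSV
# into a temporary "data" directory so this sketch is self-contained.
data_dir <- file.path(tempdir(), "data")
dir.create(data_dir, showWarnings = FALSE)
csv_path <- file.path(data_dir, "example.csv")  # hypothetical file name
writeLines(
  c("day,rides",          # made-up column names
    "2011-01-01,10",      # made-up values
    "2011-01-02,12",
    "2011-01-03,9"),
  csv_path
)

# Load the CSV into a data frame and print the first few rows.
rides <- read.csv(csv_path)
head(rides)
```

In your own document you would point the reader function at the file in your data sub-directory instead of a temporary file. If you prefer the tidyverse, `readr::read_csv()` plays the same role as `read.csv()`; functions such as `nrow()` and `names()` are handy for the exploration questions that follow.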
Do the following data exploration exercises and include descriptions of your work in the document:
Name and describe the fields of the dataset.
Say how many rows the dataset contains and what each row represents.
Create a scatter plot showing the total number of rides each day. Sample code is provided below, but you will need to fill in the blanks.
____ %>%
  ggplot() +
  aes(x = ___, y = ___) +
  geom_point() +
  geom_smooth() +
  labs(
    x = "___",
    y = "___"
  )
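Notes on this code:
The pipe operator (%>%) passes the dataset to ggplot(), which builds our plot in layers.
Try adding geom_point() and geom_smooth() one at a time to make sure you understand what each one does.
The result should look like this: [Figure: expected scatter plot of total rides per day]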
Map workingday to the color aesthetic.
Your result should look like: [Figure: expected scatter plot, colored by workingday]
You might start this section by copying and pasting from your previous code chunk.
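To see how layers and the color aesthetic behave without giving away the exercise, here is a small sketch on a different dataset. It assumes the ggplot2 package is installed and uses `mtcars`, which ships with R, as a stand-in; none of the variable names below come from the bikeshare data.

```r
library(ggplot2)

# mtcars is a built-in dataset used here only as a stand-in.
p <- ggplot(mtcars) +
  aes(x = wt, y = mpg, color = factor(am)) +  # color maps a grouping variable
  geom_point() +                              # layer 1: one point per row
  geom_smooth(method = "lm", se = FALSE) +    # layer 2: one fitted line per color group
  labs(
    x = "Weight (1000 lbs)",
    y = "Miles per gallon",
    color = "Transmission (0 = automatic, 1 = manual)"
  )

p  # printing the object draws the plot
```

Commenting out one `geom_` line at a time shows what each layer contributes. Because `color` is set inside `aes()`, both the points and the smoothing lines are split by group; removing it gives a single color and a single trend line.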
Submit a ZIP file containing your final Rmd, HTML, and CSV files, along with any other files and sub-directories. For instructions on how to do this, see the first lab specification. This will be the workflow for all future homework assignments.
*Exercise based on Data Science in a Box