This assignment continues our work with the Capital Bikeshare dataset. Our goal is still to understand ridership patterns so that we can evaluate the current system and suggest potential improvements. To that end, we will construct more visualizations, this time using finer-grained ridership data.

Data

This homework uses an updated version of the Capital Bikeshare dataset used before: data/bikeshare-day.csv. Download it into the shared data sub-directory as usual, and then read it as follows. (N.b., we’ll discuss the mutate() function later in the course; for now, just note that this code converts the listed fields into factors.)

library(tidyverse)

# Read the daily ride counts and convert the categorical fields to factors.
daily_rides <- read_csv("data/bikeshare-day.csv") %>%
  mutate(
    across(
      c(season, year, holiday, workingday, day_of_week, weather_type, rider_type),
      as.factor
    )
  )
daily_rides
## # A tibble: 1,462 × 13
##    date       rider_t…¹ rides season year  holiday worki…² day_o…³ weath…⁴  temp
##    <date>     <fct>     <dbl> <fct>  <fct> <fct>   <fct>   <fct>   <fct>   <dbl>
##  1 2011-01-01 casual      331 W      2011  N       weekend 6       2        8.18
##  2 2011-01-01 register…   654 W      2011  N       weekend 6       2        8.18
##  3 2011-01-02 casual      131 W      2011  N       weekend 0       2        9.08
##  4 2011-01-02 register…   670 W      2011  N       weekend 0       2        9.08
##  5 2011-01-03 casual      120 W      2011  N       workday 1       1        1.23
##  6 2011-01-03 register…  1229 W      2011  N       workday 1       1        1.23
##  7 2011-01-04 casual      108 W      2011  N       workday 2       1        1.4 
##  8 2011-01-04 register…  1454 W      2011  N       workday 2       1        1.4 
##  9 2011-01-05 casual       82 W      2011  N       workday 3       1        2.67
## 10 2011-01-05 register…  1518 W      2011  N       workday 3       1        2.67
## # … with 1,452 more rows, 3 more variables: feels_like <dbl>, humidity <dbl>,
## #   wind_speed <dbl>, and abbreviated variable names ¹​rider_type, ²​workingday,
## #   ³​day_of_week, ⁴​weather_type
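To make the factor conversion concrete, here is a minimal sketch on a hand-built tibble rather than the real file. The four rows are copied from the printed output above; the full level name "registered" is an assumption, since the output abbreviates it, and toy_rides and toy_factored are names made up for this illustration.

```r
library(dplyr)

# Four rows copied from the printed output above. Categorical columns start
# out as plain character vectors, much as read_csv() would deliver them.
# Assumption: the abbreviated rider type in the output is "registered".
toy_rides <- tibble(
  date = as.Date(c("2011-01-01", "2011-01-01", "2011-01-03", "2011-01-03")),
  rider_type = c("casual", "registered", "casual", "registered"),
  rides = c(331, 654, 120, 1229),
  workingday = c("weekend", "weekend", "workday", "workday")
)

# across() applies as.factor to each listed column and leaves the rest alone.
toy_factored <- toy_rides %>%
  mutate(across(c(rider_type, workingday), as.factor))

# Inspect the column types and the levels of one converted column.
sapply(toy_factored, class)
levels(toy_factored$workingday)
```

Only the listed columns change type; date and rides are untouched.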

Observe that there are some extra columns in the dataset now.

The id columns are the columns that uniquely identify an observation (sometimes called a “case” instead of an “observation”). In Homework 1, we only had one id column, date, because we had one observation for each date. The dataset for this homework has two id columns: date and rider_type.

The additional id column means that we’ve now broken down the data by rider type (rider_type). Some riders have registered for a Capital Bikeshare membership to get better rates. Other riders just bought a single trip or short-term pass, so we call them casual riders. (N.b., according to the source data, “casual” riders include Single Trip, 24-Hour Pass, 3-Day Pass or 5-Day Pass.)
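One way to check that a set of columns really identifies each observation is to count rows per combination and look for duplicates. The sketch below does this on a few rows copied from the printed output above; the level name "registered" and the helper names toy_rides, duplicated_ids, and duplicated_dates are assumptions made for illustration.

```r
library(dplyr)

# Rows copied from the printed output above.
# Assumption: the abbreviated rider type in the output is "registered".
toy_rides <- tibble(
  date = as.Date(c("2011-01-01", "2011-01-01", "2011-01-02", "2011-01-02")),
  rider_type = c("casual", "registered", "casual", "registered"),
  rides = c(331, 654, 131, 670)
)

# Each (date, rider_type) pair should occur exactly once.
duplicated_ids <- toy_rides %>%
  count(date, rider_type) %>%
  filter(n > 1)

# date alone is no longer enough: every date now occurs twice.
duplicated_dates <- toy_rides %>%
  count(date) %>%
  filter(n > 1)

nrow(duplicated_ids)    # zero rows means the pair identifies each observation
nrow(duplicated_dates)  # non-zero means date by itself does not
```

The same two pipelines can be run on daily_rides to confirm the claim for the full dataset.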

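So each row is the count of how many rides were completed on a given day by a given type of rider. For each row, we have the following observed variables, as listed in the printed output above: rides, season, year, holiday, workingday, day_of_week, weather_type, temp, feels_like, humidity, and wind_speed.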

For a description of the original fields, perhaps with different names, see the source data.

Analysis

Do the following data exploration exercises and include descriptions of your work in the document:

1. Label the days of the week.

The dataset uses the integers 0 through 6 to label days of the week. It does not document, however, what 0 means or what 6 means. If we want to make understandable plots, we should label these days of the week. To do this, first figure out which codes map to which days of the week (see the glimpse of the dataset given above for evidence), and then do the following.

daily_rides <- daily_rides %>%
  mutate(day_of_week = factor(day_of_week, levels = c(0, 1, 2, 3, 4, 5, 6), labels = c(_____)))
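If the levels and labels arguments of factor() are unfamiliar, the sketch below shows how they pair up, using a hypothetical coded variable that has nothing to do with the bikeshare data (size_code and its labels are invented for this illustration). The i-th label is attached to the i-th level, so the order of the labels must follow the order of the codes.

```r
# Hypothetical coded variable, unrelated to the bikeshare data.
size_code <- c(2, 0, 1, 2, 0)

# levels lists the codes as they appear in the data;
# labels gives the human-readable name for each code, in the same order.
size <- factor(
  size_code,
  levels = c(0, 1, 2),
  labels = c("small", "medium", "large")
)

size          # values are now shown by label
levels(size)  # labels appear in the order given, not alphabetically
```

Because the level order is preserved, plots that use the relabeled factor on an axis will show the categories in that order.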

2. Describe a row.

Describe, in one or two sentences, the information conveyed by the first row in the data frame. Focus your description on only the following fields: date; rider_type; rides; holiday; workingday; and day_of_week.

3. Visualize rides by date and by rider type.

Make a scatterplot of the number of rides by date, broken down by type of rider. Tips:

Here is one possibility.

Write a brief interpretation of this plot.
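As a starting point, a scatterplot of this kind could be sketched as follows. This is one possible approach, not necessarily the plot intended here; it runs on the ten rows printed above rather than the full file, and the level name "registered", the alpha value, and the axis labels are assumptions.

```r
library(dplyr)
library(ggplot2)

# The ten rows shown in the printed output above.
# Assumption: the abbreviated rider type in the output is "registered".
toy_rides <- tibble(
  date = rep(as.Date("2011-01-01") + 0:4, each = 2),
  rider_type = rep(c("casual", "registered"), times = 5),
  rides = c(331, 654, 131, 670, 120, 1229, 108, 1454, 82, 1518)
)

# Map date to x, rides to y, and rider type to color so that the two
# groups can be told apart within a single panel.
rides_by_date <- ggplot(toy_rides, aes(x = date, y = rides, color = rider_type)) +
  geom_point(alpha = 0.6) +
  labs(x = "Date", y = "Number of rides", color = "Rider type")

rides_by_date
```

Replacing toy_rides with daily_rides applies the same mappings to the full dataset; the partial transparency helps when many points overlap.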

4. Experiment with mapping vs faceting.

Now let’s look at workdays vs weekends (and holidays), which is encoded in the workingday variable, as we did in Homework 1. Try the following.

Once you’re done, pick two plots to leave in this section and remove all the others. Describe the structure of each plot, and compare and contrast the value of the plots. For which purpose is each one better? What about the design of the plot makes it fit that purpose?
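To illustrate the difference between the two techniques, the sketch below shows workingday once as a color mapping and once as a facet. It is only one pair among the variations you might try, built on the ten rows printed earlier; the level name "registered" and the object names are assumptions.

```r
library(dplyr)
library(ggplot2)

# The ten rows shown in the printed output earlier.
# Assumption: the abbreviated rider type in the output is "registered".
toy_rides <- tibble(
  date = rep(as.Date("2011-01-01") + 0:4, each = 2),
  rider_type = rep(c("casual", "registered"), times = 5),
  rides = c(331, 654, 131, 670, 120, 1229, 108, 1454, 82, 1518),
  workingday = rep(c("weekend", "workday"), times = c(4, 6))
)

# Mapping: workingday becomes an aesthetic (color) inside a single panel.
mapped_plot <- ggplot(toy_rides, aes(x = date, y = rides, color = workingday)) +
  geom_point()

# Faceting: workingday splits the data into one panel per level.
faceted_plot <- ggplot(toy_rides, aes(x = date, y = rides)) +
  geom_point() +
  facet_wrap(~ workingday)

mapped_plot
faceted_plot
```

Mapping keeps both groups on shared axes, which makes direct comparison easy; faceting separates them, which reduces overplotting and frees color for another variable such as rider_type.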

5. Explore how ridership varies over a typical week.

We want to find out how ridership varies over a typical week. Before moving on, consider what question is being asked, and about the relationship between which variables. It can help to sketch a visualization on scrap paper. Make a plot that helps us answer the question.

Here’s one possible plot.

Once you have that plot, try a few variations: faceting, using different plot types, etc.

Finally, write a one-or-two-sentence description of what the plot tells you about the data.
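One way to approach the weekly question is to summarize rides per day of the week and rider type before plotting. The sketch below is an assumption about a reasonable design, not the plot intended here. It uses the ten rows printed earlier, so each group contains a single row and the mean equals that row's value; on the full data each group would average many days. The day codes are left unlabeled, and the level name "registered" is assumed.

```r
library(dplyr)
library(ggplot2)

# The ten rows shown in the printed output earlier, with raw day codes.
# Assumption: the abbreviated rider type in the output is "registered".
toy_rides <- tibble(
  day_of_week = factor(c(6, 6, 0, 0, 1, 1, 2, 2, 3, 3)),
  rider_type = rep(c("casual", "registered"), times = 5),
  rides = c(331, 654, 131, 670, 120, 1229, 108, 1454, 82, 1518)
)

# Average the ride counts within each (day of week, rider type) group.
weekly_summary <- toy_rides %>%
  group_by(day_of_week, rider_type) %>%
  summarise(mean_rides = mean(rides), .groups = "drop")

# Side-by-side bars compare the two rider types on each day.
weekly_plot <- ggplot(
  weekly_summary,
  aes(x = day_of_week, y = mean_rides, fill = rider_type)
) +
  geom_col(position = "dodge") +
  labs(x = "Day of week (code)", y = "Mean rides per day", fill = "Rider type")

weekly_plot
```

A boxplot of rides by day_of_week on the unsummarized data is a natural variation, since it shows the spread within each day as well as the center.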

6. Create a new plot of your own design.

Pick another variable or two from the list of variables above. Make a plot of their relationship and write a one-sentence description of what the plot suggests about ridership based on the data.

Conclusion

This homework has meandered around a bit, but it has created a few useful visualizations of the bike-share data. Make a general recommendation for Capital Bikeshare, based on your analysis, on how to plan the number of bikes.


*Exercise based on Data Science in a Box