This exercise continues what we started in Lab 5 by looking more
closely at the actual distances between Denny’s restaurants and La
Quinta motels. Start by creating a new RMarkdown document named
hw3-laquinta.Rmd using the standard homework format.
Our goal is to continue examining the validity of the claim that La Quinta hotels are often next to Denny’s restaurants.
Upload copies of the datasets and load them as you did in the lab.
library(tidyverse)
dennys <- read_rds("data/dennys.rds")
laquinta <- read_rds("data/laquinta.rds")
states <- read_csv("data/states.csv", col_types = cols(
  name = col_character(),
  abbreviation = col_character(),
  area = col_double()
))
Note here that some of the dataset files use the RDS format, which differs from the CSV format in that it is binary (not readable using a text editor) and represents R data structures more directly and more efficiently.
1. Filtering the Dataframes
Filter the dataframes to include only the Denny’s and La Quinta locations in
Alaska (AK), and save the results as dn_ak and lq_ak respectively. How many
Denny’s and La Quinta locations are there in Alaska?
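Filtering by a column value follows the usual dplyr pattern. The sketch below uses a small made-up tibble rather than the real data, and it assumes the state abbreviation lives in a column named state (the same column used for the join later in this exercise); the addresses are invented.

```r
library(dplyr)

# Made-up stand-in for the real dennys data; only the `state` column matters here.
toy_dennys <- tibble(
  address = c("1 Example St", "2 Example Ave", "3 Example Blvd"),
  state   = c("AK", "MI", "AK")
)

# Keep only the rows whose state abbreviation is "AK".
toy_ak <- toy_dennys %>%
  filter(state == "AK")

nrow(toy_ak)  # number of toy locations in Alaska
```

The same pattern should carry over to the real dennys and laquinta data frames, provided their state column is named as assumed here.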
Next we’ll calculate the distance between all Denny’s and all La Quinta locations in Alaska. Let’s take this step by step:
1.1. There are 3 Denny’s and 2 La Quinta locations in Alaska. (If you answered differently above, you might want to recheck your answers.)
1.2. Let’s focus on the first Denny’s location. We’ll need to calculate two distances for it: (1) the distance between Denny’s 1 and La Quinta 1 and (2) the distance between Denny’s 1 and La Quinta 2.
1.3. Now let’s consider all Denny’s locations.
2. Finding Pairings
How many pairings are there between all Denny’s and all La Quinta locations in Alaska, i.e. how many distances do we need to calculate between the locations of these establishments in Alaska?
In order to calculate these distances we need to first restructure
our data to pair the Denny’s and La Quinta locations. To do so, join the
two data frames by state using the form of join that keeps all rows and
columns from both the dn_ak and lq_ak data frames. Name the result dn_lq_ak.
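One way to see the pairing behavior is on toy data. The sketch below assumes both data frames have a state column and uses invented addresses; full_join() is dplyr's join that keeps all rows and columns from both inputs. Because every row on the left matches every row on the right, recent versions of dplyr may warn about a many-to-many relationship, which is exactly the behavior we want here.

```r
library(dplyr)

# Invented stand-ins: 3 rows on the left, 2 on the right, all in one state.
toy_dn <- tibble(address = c("dn A", "dn B", "dn C"), state = "AK")
toy_lq <- tibble(address = c("lq A", "lq B"), state = "AK")

# Every left row matches every right row on `state`,
# so each possible pairing appears exactly once.
toy_pairs <- full_join(toy_dn, toy_lq, by = "state")

nrow(toy_pairs)  # equals nrow(toy_dn) * nrow(toy_lq)
```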
3. Analyzing the Pairings
How many observations are in the joined dn_lq_ak data frame? What are the
names of the variables in this data frame?
Notice the suffixes .x and .y on the variable names. The two data frames
both had the same variables, so how could we tell which data came from which
data frame? The .x comes from the first data frame, .y from the second. But
that’s not very informative, so: add suffix = c("_dn", "_lq") as an argument
to the join function so that the variables have informative names.
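To see what the suffix argument changes, here is a minimal sketch on invented data. The column names address and state are assumptions standing in for the real variables.

```r
library(dplyr)

# Invented data: both tibbles share the column names `address` and `state`.
toy_dn <- tibble(address = c("dn A", "dn B"), state = "AK")
toy_lq <- tibble(address = "lq A", state = "AK")

# Default suffixes: the clashing `address` column is renamed with .x and .y.
names(full_join(toy_dn, toy_lq, by = "state"))

# Informative suffixes instead.
toy_named <- full_join(toy_dn, toy_lq, by = "state", suffix = c("_dn", "_lq"))
names(toy_named)
```

Only columns that appear in both inputs and are not join keys receive a suffix, so state keeps its plain name.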
Now that we have the data in the format we wanted, all that is left is to calculate the distances between the pairs.
4. Computing Distances
Add a new variable called distance to the dn_lq_ak data frame that contains
the distance between each pair of Denny’s and La Quinta locations. Make sure
to save the result back to dn_lq_ak so that you can use it later.
Because the Earth is spherical rather than flat, we use the great-circle distance, which can be calculated using the Haversine distance formula. Because this is not a built-in function, we’ve provided an implementation for you; just paste this code chunk (including attribution) into your Rmd file:
# Great-circle distance, Implementation from dsbox
haversine <- function(long1, lat1, long2, lat2) {
  # convert to radians
  long1 = long1 * pi / 180
  lat1 = lat1 * pi / 180
  long2 = long2 * pi / 180
  lat2 = lat2 * pi / 180
  # Earth mean radius in km (WGS84 ellipsoid)
  R = 6371.009
  # Compute the distance in km
  a = sin((lat2 - lat1)/2)^2 + cos(lat1) * cos(lat2) * sin((long2 - long1)/2)^2
  d = R * 2 * asin(sqrt(a))
  return(d)
}
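This function takes four arguments, all in decimal degrees: the longitude and latitude of the first location (long1, lat1), followed by the longitude and latitude of the second location (long2, lat2). It returns the distance in kilometers.
Note: Longitude comes first!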
Here is an example call to this function, to measure the distance between North Hall and the Ecosystem Preserve.
longitude_northhall <- -85.58896068095257
latitude_northhall <- 42.93194236658063
longitude_preserve <- -85.58222583860328
latitude_preserve <- 42.931890017996466
haversine(longitude_northhall, latitude_northhall, longitude_preserve, latitude_preserve)
## [1] 0.5483346
When I measure this distance on Google Maps, I get about 520 meters (about 1700 feet) – well within the margin of error of my clicking.
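Because haversine() is built from vectorized arithmetic, it can be applied to whole columns inside mutate(). The following is a sketch only: the coordinate column names (longitude_dn, latitude_dn, longitude_lq, latitude_lq) are assumptions about how the joined data is named, the coordinates are made up, and the function body is repeated in compact form so the chunk runs on its own.

```r
library(dplyr)

# Same formula as the haversine() given above, written compactly.
haversine <- function(long1, lat1, long2, lat2) {
  rad <- pi / 180  # degrees-to-radians factor
  a <- sin((lat2 - lat1) * rad / 2)^2 +
    cos(lat1 * rad) * cos(lat2 * rad) * sin((long2 - long1) * rad / 2)^2
  6371.009 * 2 * asin(sqrt(a))  # distance in km
}

# Made-up pairs: the first is exactly one degree of latitude apart,
# the second is the same point twice.
toy_pairs <- tibble(
  longitude_dn = c(-150, -150),
  latitude_dn  = c(61, 61),
  longitude_lq = c(-150, -150),
  latitude_lq  = c(62, 61)
)

# One call computes the distance for every row at once.
toy_pairs <- toy_pairs %>%
  mutate(distance = haversine(longitude_dn, latitude_dn, longitude_lq, latitude_lq))

toy_pairs$distance  # roughly 111.2 km, then 0 km
```

One degree of latitude is about 111 km anywhere on the sphere, which makes the first row a convenient sanity check that the arguments are in the right order.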
5. Computing Nearest Distances
For each La Quinta location, calculate the distance to the nearest
Denny’s. To do so, group by La Quinta locations (consider the address as a
unique identifier) and compute the min() of all of the corresponding
distance values. Assign this to a new data frame.
Hint: this data frame will have one row for each La Quinta location in Alaska.
Here’s what we get (when we put latitude and longitude in the right order):
# A tibble: 2 × 2
  address_lq         dist_to_nearest
  <chr>                        <dbl>
1 3501 Minnesota Dr.            2.03
2 4920 Dale Rd                  5.20
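The group-and-minimize step described above can be sketched on made-up distances. The column names address_lq, distance, and dist_to_nearest follow the output shown; the addresses and values are invented.

```r
library(dplyr)

# Invented distances (km) for 2 hotels x 3 restaurants.
toy_pairs <- tibble(
  address_lq = rep(c("lq A", "lq B"), each = 3),
  address_dn = rep(c("dn A", "dn B", "dn C"), times = 2),
  distance   = c(4.0, 1.5, 9.0, 7.0, 8.0, 6.5)
)

# One row per hotel, holding the smallest distance in its group.
toy_nearest <- toy_pairs %>%
  group_by(address_lq) %>%
  summarize(dist_to_nearest = min(distance))

toy_nearest
```

Note that summarize() keeps only the grouping column and the summary, so this result does not say which Denny’s was the nearest one.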
Alternative: You can try this approach if it makes more sense to you: group by La Quinta location, arrange the rows by distance, then use slice_head() to pick out the first row in each group. This results in:
# A tibble: 2 × 5
# Groups:   address_lq, city_lq [2]
  address_lq         city_lq       address_dn       city_dn   distance
  <chr>              <chr>         <chr>            <chr>        <dbl>
1 3501 Minnesota Dr. "\nAnchorage" 2900 Denali      Anchorage     2.03
2 4920 Dale Rd       "\nFairbanks" 1929 Airport Way Fairbanks     5.20
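A minimal sketch of this sort-then-slice alternative, again on invented data (the addresses and distances are made up, and grouping by address_lq alone is an assumption made for brevity):

```r
library(dplyr)

# Invented distances (km) for 2 hotels x 3 restaurants.
toy_pairs <- tibble(
  address_lq = rep(c("lq A", "lq B"), each = 3),
  address_dn = rep(c("dn A", "dn B", "dn C"), times = 2),
  distance   = c(4.0, 1.5, 9.0, 7.0, 8.0, 6.5)
)

toy_nearest_rows <- toy_pairs %>%
  group_by(address_lq) %>%
  arrange(distance) %>%   # sorts all rows, so each group's rows end up in distance order
  slice_head(n = 1)       # keeps the first (closest) row within each group

toy_nearest_rows
```

Unlike the summarize() approach, this keeps the whole row, so the nearest restaurant's address stays alongside the distance.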
6. Assessing Collocation
Are there any La Quinta locations in Alaska that are “next to Denny’s”? Refer to the result of the previous exercise.
7. Consider Another State
Pick another state and perform the same analysis as you did above. What are the results and are they any different from what you found for Alaska?
Discuss what you discovered about the alleged collocation of Denny’s and La Quinta.