Homework 3 - Wrangling Spatial Data

This exercise continues what we started in Lab 5 by looking more closely at the actual distances between Denny’s restaurants and La Quinta motels. Start by creating a new RMarkdown document named hw3-laquinta.Rmd using the standard homework format.

Our goal is to keep examining the validity of the claim that La Quinta hotels are often next to Denny’s restaurants.

Data

Upload copies of the datasets and load them as you did in the lab.

library(tidyverse)

dennys <- read_rds("data/dennys.rds")
laquinta <- read_rds("data/laquinta.rds")
states <- read_csv("data/states.csv", col_types = cols(
  name = col_character(),
  abbreviation = col_character(),
  area = col_double()
))

Note that some of the dataset files use the RDS format. Unlike CSV, RDS is a binary format (not readable in a text editor) that stores R data structures more directly and more efficiently.
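As a small illustration of that difference, the sketch below writes the same toy table to both formats and reads it back. The table and its column names are made up for this example and are not part of the homework data.

```r
library(tidyverse)

# Hypothetical toy data: a factor column is an R-specific structure.
toy <- tibble(
  state = factor(c("AK", "NC"), levels = c("NC", "AK")),
  x     = c(1.5, 2.5)
)

# Write the same table in both formats to temporary files.
rds_path <- tempfile(fileext = ".rds")
csv_path <- tempfile(fileext = ".csv")
write_rds(toy, rds_path)
write_csv(toy, csv_path)

from_rds <- read_rds(rds_path)
from_csv <- read_csv(csv_path, col_types = cols())

class(from_rds$state)  # the factor (and its level order) survives RDS
class(from_csv$state)  # CSV stores only text, so this comes back as character
```

This is also why the call to read_csv() above spells out col_types, while the read_rds() calls need no such hints.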

Analysis

1. Filtering the Dataframes

Filter the dataframes to include Denny’s and La Quinta in Alaska (AK) only, and save the results as dn_ak and lq_ak respectively. How many Denny’s and La Quinta locations are there in Alaska?
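If you need a reminder of how filtering works, here is a minimal sketch on made-up data. The toy table, its addresses, and the assumption that the state abbreviation lives in a column named state are all illustrative; check the real column names with names(dennys).

```r
library(tidyverse)

# Hypothetical stand-in for a locations table (not the real data).
toy_locations <- tribble(
  ~address,           ~state,
  "1 Example St",     "AK",
  "2 Sample Ave",     "TX",
  "3 Placeholder Rd", "AK"
)

# Keep only the rows whose state is Alaska.
toy_ak <- toy_locations %>%
  filter(state == "AK")

nrow(toy_ak)  # number of toy locations in Alaska
```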

Next we’ll calculate the distance between all Denny’s and all La Quinta locations in Alaska. Let’s take this step by step:

1.1. There are 3 Denny’s and 2 La Quinta locations in Alaska. (If you answered differently above, you might want to recheck your answers.)
1.2. Let’s focus on the first Denny’s location. We’ll need to calculate two distances for it: (1) the distance between Denny’s 1 and La Quinta 1 and (2) the distance between Denny’s 1 and La Quinta 2.
1.3. Now let’s consider all Denny’s locations.

2. Finding Pairings

How many pairings are there between all Denny’s and all La Quinta locations in Alaska, i.e. how many distances do we need to calculate between the locations of these establishments in Alaska?

In order to calculate these distances we need to first restructure our data to pair the Denny’s and La Quinta locations. To do so, join the two data frames by state using the form of join that keeps all rows and columns from both dn_ak and lq_ak data frames. Name the result dn_lq_ak.
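To see what this kind of join does, here is a sketch on two tiny made-up tables (deliberately not the Alaska counts). It assumes the shared key column is named state, as in the homework data; every other name is hypothetical.

```r
library(tidyverse)

# Hypothetical toy tables: 2 rows and 4 rows, all in the same state.
toy_dn <- tibble(address = c("dn A", "dn B"), state = "AK")
toy_lq <- tibble(address = c("lq A", "lq B", "lq C", "lq D"), state = "AK")

# Every row of toy_dn matches every row of toy_lq on `state`.
# Recent dplyr versions may warn about a many-to-many relationship;
# here that pairing of every row with every row is exactly what we want.
toy_pairs <- full_join(toy_dn, toy_lq, by = "state")

nrow(toy_pairs)  # 2 * 4 = 8 pairings
```

Because every row shares the same key value, the number of rows in the result is the product of the two row counts.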

3. Analyzing the Pairings

How many observations are in the joined dn_lq_ak data frame? What are the names of the variables in this data frame?

Notice the suffixes .x and .y on the variable names. The two data frames both had the same variables, so how could we tell which data came from which data frame? The .x comes from the first dataframe, .y from the second. But that’s not very informative, so: add suffix = c("_dn", "_lq") as an argument to the join function so that the variables have informative names.
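The effect of the suffix argument can be seen on a small made-up example. The toy tables below are hypothetical; only the suffix values come from the instructions above.

```r
library(tidyverse)

# Hypothetical toy tables that share the column names `address` and `state`.
toy_dn <- tibble(address = c("dn A", "dn B"), state = "AK")
toy_lq <- tibble(address = c("lq A", "lq B"), state = "AK")

# Default suffixes: the clashing column becomes address.x / address.y.
default_names <- names(full_join(toy_dn, toy_lq, by = "state"))

# Custom suffixes: the clashing column becomes address_dn / address_lq.
custom_names <- names(
  full_join(toy_dn, toy_lq, by = "state", suffix = c("_dn", "_lq"))
)

default_names
custom_names
```

Note that the join key (state) appears only once and gets no suffix; only columns whose names clash are renamed.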

Now that we have the data in the format we wanted, all that is left is to calculate the distances between the pairs.

4. Computing Distances

Add a new variable called distance to the dn_lq_ak data frame, containing the distance between each pair of Denny’s and La Quinta locations. Make sure to save the result back to dn_lq_ak so that you can use it later.

Because the Earth is spherical rather than flat, we use the great-circle distance, which can be calculated using the Haversine distance formula. Because this is not a built-in function, we’ve provided an implementation for you; just paste this code chunk (including attribution) into your Rmd file:

# Great-circle distance, Implementation from dsbox
haversine <- function(long1, lat1, long2, lat2) {
  # convert to radians
  long1 = long1 * pi / 180
  lat1  = lat1  * pi / 180
  long2 = long2 * pi / 180
  lat2  = lat2  * pi / 180
  
  # Earth mean radius in km (WGS84 ellipsoid)
  R = 6371.009
  
  # Compute the distance in km
  a = sin((lat2 - lat1)/2)^2 + cos(lat1) * cos(lat2) * sin((long2 - long1)/2)^2
  d = R * 2 * asin(sqrt(a))
  return(d)
}
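For reference, the code above corresponds to the haversine formula written below. The symbols are introduced here for readability and do not appear in the code: φ stands for latitude (lat1, lat2) and λ for longitude (long1, long2), both in radians, and R is the Earth's mean radius in km.

```latex
a = \sin^2\!\left(\frac{\varphi_2 - \varphi_1}{2}\right)
    + \cos\varphi_1 \,\cos\varphi_2 \,
      \sin^2\!\left(\frac{\lambda_2 - \lambda_1}{2}\right),
\qquad
d = 2R \arcsin\!\left(\sqrt{a}\right)
```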

This function takes four arguments:

Longitude and latitude of the first location
Longitude and latitude of the second location

Note: Longitude comes first!

Here is an example call to this function, to measure the distance between North Hall and the Ecosystem Preserve.

longitude_northhall <- -85.58896068095257
latitude_northhall  <-  42.93194236658063
longitude_preserve  <- -85.58222583860328
latitude_preserve   <-  42.931890017996466
haversine(longitude_northhall, latitude_northhall, longitude_preserve, latitude_preserve)

## [1] 0.5483346

When I measure this distance on Google Maps, I get about 520 meters (about 1700 feet) – well within the margin of error of my clicking.
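Because haversine() is built only from vectorized arithmetic (sin, cos, sqrt, asin), it also accepts whole columns, so it can be used inside mutate(). The sketch below repeats the function so that it runs on its own. The toy table and its column names (longitude_a, latitude_a, and so on) are hypothetical; the first row reuses the North Hall and Ecosystem Preserve coordinates from the example above.

```r
library(tidyverse)

# Great-circle distance, Implementation from dsbox (repeated from above)
haversine <- function(long1, lat1, long2, lat2) {
  long1 = long1 * pi / 180
  lat1  = lat1  * pi / 180
  long2 = long2 * pi / 180
  lat2  = lat2  * pi / 180
  R = 6371.009
  a = sin((lat2 - lat1)/2)^2 + cos(lat1) * cos(lat2) * sin((long2 - long1)/2)^2
  d = R * 2 * asin(sqrt(a))
  return(d)
}

# Hypothetical table of point pairs; column names are assumptions.
# Row 1: North Hall to the Ecosystem Preserve. Row 2: North Hall to itself.
toy_pairs <- tibble(
  longitude_a = c(-85.58896068095257, -85.58896068095257),
  latitude_a  = c( 42.93194236658063,  42.93194236658063),
  longitude_b = c(-85.58222583860328, -85.58896068095257),
  latitude_b  = c( 42.931890017996466, 42.93194236658063)
)

# One call computes the distance (in km) for every row at once.
toy_pairs <- toy_pairs %>%
  mutate(distance = haversine(longitude_a, latitude_a, longitude_b, latitude_b))

toy_pairs$distance
```

The first distance matches the single call shown earlier, and the second is zero because both points are the same.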

5. Computing Nearest Distances

For each La Quinta location, calculate the distance to the nearest Denny’s. To do so, group by La Quinta locations (consider the address as a unique identifier) and compute the min() of all of the corresponding distance values. Assign this to a new data frame.

Hint: this data frame will have one row for each La Quinta location in Alaska.

Here’s what we get (when we put latitude and longitude in the right order):

# A tibble: 2 × 2
  address_lq         dist_to_nearest
  <chr>                        <dbl>
1 3501 Minnesota Dr.            2.03
2 4920 Dale Rd                  5.20
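The group-then-summarise pattern can be sketched on made-up numbers as follows. The addresses and distances below are hypothetical; only the column names address_lq and dist_to_nearest mirror the output shown above.

```r
library(tidyverse)

# Hypothetical paired data: 2 La Quinta locations x 3 Denny's locations.
toy_pairs <- tribble(
  ~address_lq, ~address_dn, ~distance,
  "lq A",      "dn A",      4.0,
  "lq A",      "dn B",      1.5,
  "lq A",      "dn C",      9.0,
  "lq B",      "dn A",      7.0,
  "lq B",      "dn B",      3.0,
  "lq B",      "dn C",      2.5
)

# One row per La Quinta, holding the smallest distance in its group.
toy_nearest <- toy_pairs %>%
  group_by(address_lq) %>%
  summarise(dist_to_nearest = min(distance))

toy_nearest
```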

Alternative: You can try this approach instead if it makes more sense to you:

Group by La Quinta location, then
Sort by distance, then
slice_head() to pick out the first row in each group.

This results in:

# A tibble: 2 × 5
# Groups:   address_lq, city_lq [2]
 address_lq         city_lq       address_dn       city_dn   distance
 <chr>              <chr>         <chr>            <chr>        <dbl>
1 3501 Minnesota Dr. "\nAnchorage" 2900 Denali      Anchorage     2.03
2 4920 Dale Rd       "\nFairbanks" 1929 Airport Way Fairbanks     5.20
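A sketch of this alternative on made-up data is shown below; the addresses and distances are hypothetical. Unlike summarise(), this approach keeps the whole row, so you can also see which Denny’s is the nearest one.

```r
library(tidyverse)

# Hypothetical paired data: 2 La Quinta locations x 3 Denny's locations.
toy_pairs <- tribble(
  ~address_lq, ~address_dn, ~distance,
  "lq A",      "dn A",      4.0,
  "lq A",      "dn B",      1.5,
  "lq A",      "dn C",      9.0,
  "lq B",      "dn A",      7.0,
  "lq B",      "dn B",      3.0,
  "lq B",      "dn C",      2.5
)

# Sort so the smallest distances come first, then keep the first row
# of each La Quinta group.
toy_nearest_rows <- toy_pairs %>%
  group_by(address_lq) %>%
  arrange(distance) %>%
  slice_head(n = 1)

toy_nearest_rows
```

Note that arrange() sorts the whole table rather than sorting within groups, but that is enough here: within each group, the first remaining row is still the one with the smallest distance.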

6. Assessing Collocation

Are there any La Quinta locations in Alaska that are “next to Denny’s”? Refer to the result of the previous exercise.

7. Consider Another State

Pick another state and perform the same analysis as you did above. What are the results and are they any different from what you found for Alaska?

Homework 3 - Wrangling Spatial Data

Data

Analysis

Conclusion