Homework 10: Mapping Earthquake Data

For this project, your code will read earthquake data from the USGS website and plot it on a world map, using a plotting library with built-in support for maps. It will place small markers at the locations where earthquakes occurred during the last day (or, in the optional final step, the last week). When you hover the mouse over a marker, a small bubble will pop up with information about that earthquake.

This assignment will have us practice:

As a bonus, we’ll play with making animated maps!

Note: we’re using plotly instead of matplotlib because we’re considering moving towards plotly in future courses, so I’d like to hear about your experience using it here.

Strategy note

Step 1: Getting the data

Create earthquakes.py. Lay it out as usual, with sections for imports, constants, functions, and main code. We won’t actually need to write any functions for this exercise, though, so you can omit that section.

Use the following imports:

import streamlit as st
import pandas as pd
import plotly.express as px

Next, we’ll ask pandas to read a CSV file of all of the earthquakes that have happened in the last day. The file format is documented on the USGS website. Do this as follows:

  1. Use your web browser to download https://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/all_day.csv and save it in the same folder as earthquakes.py.
  2. Call pd.read_csv with that file name (all_day.csv).
  3. Assign the result to quakes.
  4. Run the code. In the Shell (not the program code), look at quakes.info() and quakes.head() to see what data it contains. Notice that two columns are timestamps, but they are stored as “object” (which usually means “string” in pandas).
  5. Tell pandas to interpret those strings as date-times by revising your read_csv call: add parse_dates=['time', 'updated'] as an argument, so that it reads quakes = pd.read_csv('all_day.csv', parse_dates=['time', 'updated']). Check in the .info() output that the time column now has a datetime64 type. (This step should not involve adding any new statements. If it were necessary, you could instead use pd.to_datetime to convert the columns after the fact.)
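To see what parse_dates changes, here is a minimal sketch that reads a tiny in-memory CSV instead of all_day.csv. The rows in SAMPLE_CSV are made-up placeholders (not real USGS data) and include only a few of the real file’s columns; pd.api.types.is_datetime64_any_dtype is used to check the column types.

```python
import io

import pandas as pd

# Made-up sample rows standing in for all_day.csv (NOT real USGS data).
# Only a handful of the real file's columns are included.
SAMPLE_CSV = """time,latitude,longitude,depth,mag,place,updated
2024-01-01T03:15:00.000Z,38.8,-122.8,2.1,0.9,Somewhere in California,2024-01-01T03:20:00.000Z
2024-01-01T14:40:00.000Z,-20.5,-70.1,35.0,4.6,Somewhere in Chile,2024-01-01T15:02:00.000Z
"""

# Without parse_dates, the timestamp columns are read as plain strings.
raw = pd.read_csv(io.StringIO(SAMPLE_CSV))
print(pd.api.types.is_datetime64_any_dtype(raw["time"]))

# With parse_dates, pandas converts the listed columns to datetime values.
quakes = pd.read_csv(io.StringIO(SAMPLE_CSV), parse_dates=["time", "updated"])
print(pd.api.types.is_datetime64_any_dtype(quakes["time"]))
print(quakes.dtypes)
```

The first check should print False and the second True; the exact dtype name shown by .dtypes can vary between pandas versions.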

This sort of thing is common: rather than always adding new things to the end of your program, it’s often easiest to fix a problem by revising earlier code. We’ll continue this practice for the rest of this assignment: add new elements to your code in the place that they belong, not necessarily the end.

Step 2: Making a map

We’ll use the Plotly library to make a map. It knows about DataFrames and can work with them directly. You can copy and paste this code chunk to get started:

fig = px.scatter_geo(
    quakes,
    lat = "latitude", lon = "longitude",
    hover_name = "place",
    color = "mag",
    projection = "natural earth",
    scope = "world"
)
st.plotly_chart(fig, use_container_width=True)

Run your code and check that you get a map in your web browser!

Disable each of the arguments in turn (by commenting them out, i.e., adding a # before the corresponding line) and rerun your code each time. Write comments for each line based on what you observe it to do.

Note: documentation for the scatter_geo function is available here. Details about parameters like scope and projection are available here. For example, “The available scopes are: ‘world’, ‘usa’, ‘europe’, ‘asia’, ‘africa’, ‘north america’, ‘south america’.” Plotly also has lots of other plotting and mapping functions that you can explore on its website.

Try changing some arguments to see what they do.

Step 3: Counting quakes

Are there more or fewer earthquakes at different times of day? Count the number of quakes occurring at each hour.

First we’ll need to make an hour column. It will have values of 0, 1, 2, …, 23. To do this:
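As a sketch of one way to do this, assuming the time column was parsed as a datetime in Step 1, the pandas .dt.hour accessor pulls the hour of day out of each timestamp. The small data frame below is made-up example data standing in for quakes.

```python
import pandas as pd

# Made-up timestamps standing in for the real quakes['time'] column.
quakes = pd.DataFrame({
    "time": pd.to_datetime([
        "2024-01-01T03:15:00Z",
        "2024-01-01T03:50:00Z",
        "2024-01-01T14:40:00Z",
    ]),
    "mag": [0.9, 1.7, 4.6],
})

# .dt.hour extracts the hour of day (0-23) from each datetime value.
quakes["hour"] = quakes["time"].dt.hour
print(quakes[["time", "hour"]])
```

Since the USGS timestamps end in Z (UTC), these are UTC hours, not local time at each quake’s location; keep that in mind when interpreting the counts.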

Now we can make some counts. Let’s try two different ways of doing this:
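Two common pandas idioms for this are Series.value_counts and groupby(...).size(); they are sketched below on made-up hour values, and are not necessarily the two approaches the course has in mind. The reindex call is an optional extra that fills in hours with no quakes as 0.

```python
import pandas as pd

# Made-up hour values standing in for the quakes['hour'] column.
quakes = pd.DataFrame({"hour": [3, 3, 14, 22, 3, 14]})

# Way 1: tally each distinct hour, then order the result by hour.
counts_vc = quakes["hour"].value_counts().sort_index()

# Way 2: group the rows by hour and count the rows in each group.
counts_gb = quakes.groupby("hour").size()

# Optional: include every hour 0-23, using 0 for hours with no quakes.
all_hours = counts_gb.reindex(range(24), fill_value=0)

print(counts_vc)
print(counts_gb)
```

Either result could then be passed to a Streamlit chart function such as st.bar_chart to get a plot of quakes per hour.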

Does your plot provide evidence that time of day affects the number of earthquakes? Why or why not? (You don’t need to include an answer to this, but think like a scientist: what can you learn from this data?)

Step 4: Filtering out small quakes

There will probably be some very small quakes. Let’s avoid showing those on the map.

  1. Make a st.slider for the minimum magnitude. You can let it range from, say, -1.0 to 10.0.
  2. Make a new column big_enough (i.e., quakes['big_enough'] = ...) by comparing the mag column to the number from the slider.
  3. Make a filtered_quakes variable by using the big_enough column as an index into quakes. (The previous sentence should be all you need, but you can refer to the reading or the class slides for guidance on this.)
  4. Edit the map code you already have to make your map use filtered_quakes instead of just quakes. Make sure that it shows fewer quakes; try changing the slider until you’re confident that it’s working.
  5. Have your program display the total number of quakes and the number remaining after filtering (for example, with st.write). Use the len() function to get the number of rows in a data frame. For example: “In the last day, 150 earthquakes were detected, of which 100 had magnitude bigger than 1.5.”
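The filtering steps can be sketched as follows, using made-up magnitudes. The hardcoded min_mag = 1.5 is a stand-in for the value your st.slider returns, and using >= (so the slider value itself counts as big enough) is a choice, not a requirement.

```python
import pandas as pd

# Made-up magnitudes standing in for the real quakes data frame.
quakes = pd.DataFrame({"mag": [-0.5, 0.8, 1.6, 2.3, 4.9]})

# Stand-in for the value returned by the st.slider in earthquakes.py.
min_mag = 1.5

# Boolean column: True where the quake is at least min_mag.
quakes["big_enough"] = quakes["mag"] >= min_mag

# Indexing with a boolean Series keeps only the rows where it is True.
filtered_quakes = quakes[quakes["big_enough"]]

message = (
    f"In the last day, {len(quakes)} earthquakes were detected, "
    f"of which {len(filtered_quakes)} had magnitude at least {min_mag}."
)
print(message)
```

Any row whose mag is missing (NaN) compares as False, so it is dropped by this filter as well.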

Step 5: Which hemisphere has stronger quakes?

  1. Make a boolean column called northern_hemisphere that tells whether the quake occurred above the equator or not. Do this by comparing latitude with 0. Put the new column in the original quakes data frame, before you filter it.
  2. Compute the mean mag for each value of northern_hemisphere. Do this by grouping by the northern_hemisphere column, then computing the mean of the mag column. This can be done in one line of pandas code; see the class slides for examples.
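Here is a sketch of the boolean column plus the one-line groupby, on made-up latitudes and magnitudes. Note that with latitude > 0, a quake exactly on the equator is counted as not northern.

```python
import pandas as pd

# Made-up data standing in for the real quakes data frame.
quakes = pd.DataFrame({
    "latitude": [38.8, -20.5, 61.2, -33.0],
    "mag": [1.0, 4.5, 2.0, 3.5],
})

# True for quakes north of the equator.
quakes["northern_hemisphere"] = quakes["latitude"] > 0

# Group by the boolean column, then average mag within each group.
mean_mag = quakes.groupby("northern_hemisphere")["mag"].mean()
print(mean_mag)
```

The result has one row per group (False and True), so you can read off each hemisphere’s average with mean_mag.loc[True] and mean_mag.loc[False].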

Step 6 (optional): Getting multiple days

So far we’ve only been working with one day of earthquake data. Let’s expand it to a week!

  1. Notice that the URL above contains all_day. Change that to all_week, download the resulting file (all_week.csv), and change the file name being loaded accordingly.
  2. Add a date column using quakes['date'] = quakes['time'].dt.strftime("%Y-%m-%d"). (strftime, short for “string format time”, is a classic function for formatting a time as a string; this format string gives the year, month, and day.) Consider where in your code it would make the most sense to do this operation. (Is there already some other line that does a related operation?)
  3. Add animation_frame = "date" to the mapping function call to make the plot animate over days. Press the Play button on the plot (at the bottom) to see the earthquakes each day.
  4. Notice how the color range changes day to day. Use the range_color parameter to scatter_geo to make it consistent. It’s expecting a list with two elements: [smallest_mag, biggest_mag]. Try to compute biggest_mag from the data itself.
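The date column and a fixed color range might be computed as in this sketch, on made-up data. Here smallest_mag reuses a stand-in for the slider’s minimum; that choice, and the names smallest_mag and biggest_mag, are assumptions for illustration. The scatter_geo call is shown only as a comment so the sketch runs with pandas alone.

```python
import pandas as pd

# Made-up data standing in for the week of quakes.
quakes = pd.DataFrame({
    "time": pd.to_datetime([
        "2024-01-01T03:15:00Z",
        "2024-01-01T23:59:00Z",
        "2024-01-02T08:00:00Z",
    ]),
    "mag": [0.9, 5.2, 2.4],
})

# Format each timestamp as a "YYYY-MM-DD" string for animation_frame.
quakes["date"] = quakes["time"].dt.strftime("%Y-%m-%d")

# Stand-in for the slider's minimum magnitude (assumption).
smallest_mag = 1.5
# Largest magnitude in the data, so every day shares one color scale.
biggest_mag = quakes["mag"].max()
color_range = [smallest_mag, biggest_mag]

# In earthquakes.py, something like:
# px.scatter_geo(..., animation_frame="date", range_color=color_range)
print(quakes["date"].tolist(), color_range)
```

Computing biggest_mag from the unfiltered quakes keeps the color scale the same even as the slider changes.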

Grading Rubric
