For this project, your code will read earthquake data from the USGS web site and plot it on a world map, using a plotting library that has built-in support for maps. It will place little markers over the places where earthquakes occurred during the past day. When you mouse over a marker, a little bubble will pop up to display information about the earthquake.
This assignment will have us practice:
- Accessing, manipulating, and filtering tabular data
- Plotting with `plotly`
- Using the split-apply-combine pattern to summarize data
As a bonus, we’ll play with making animated maps!
Note: we’re using `plotly` instead of `matplotlib`. This is because we’re considering moving towards plotly in future courses, so I’d like to hear about your experience with using it here.
Strategy note
- For each exercise, try out the expressions and operations in the Thonny Shell (or an `st.write` call) first. Once you’re confident that you have something working and understand why it works, then think about where in your program it should logically go (not necessarily at the end!) and put it into your program there.
- If you need a refresher on the pandas operations, walk through the Week 10 slides.
Step 1: Getting the data
Create `earthquakes.py`. Lay it out as usual, with sections for imports, constants, functions, and main code. We won’t actually need to write any functions for this exercise, though, so you can omit that section.
Use the following imports:
import streamlit as st
import pandas as pd
import plotly.express as px
Next, we’ll ask pandas to read a CSV file of all of the earthquakes that have happened in the past day. Documentation is given on the USGS website. Do this as follows:
- Use your web browser to download https://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/all_day.csv to the same place as `earthquakes.py`.
- Call `pd.read_csv` with that file name (`all_day.csv`).
- Assign the result to `quakes`.
- Run the code. In the Shell (not the program code), look at `quakes.info()` and `quakes.head()` to see what data it contains. Notice that two columns are timestamps, but they are stored as “object” (which usually means “string” in pandas).
- Tell pandas to interpret those strings as date-times by revising your `read_csv` call: add `parse_dates=['time', 'updated']` as an argument, so that it looks like `quakes = pd.read_csv('all_day.csv', parse_dates=['time', 'updated'])`. Check in the `.info()` output that the `time` column has a `datetime64` type now. (This step should not involve adding additional statements. Note that you could use `pd.to_datetime` to do this after the fact, if that were necessary.)
This sort of thing is common: rather than always adding new things to the end of your program, it’s often easiest to fix a problem by revising earlier code. We’ll continue this practice for the rest of this assignment: add new elements to your code in the place that they belong, not necessarily the end.
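To compare the two routes mentioned in the last bullet above, here is a minimal sketch on a tiny made-up CSV held in memory. The three rows and the `SAMPLE_CSV` name are invented for illustration, and the real file has many more columns. It parses the timestamps once at read time with `parse_dates` and once after the fact with `pd.to_datetime`.

```python
import io

import pandas as pd

# Made-up sample rows that mimic a few columns of the USGS feed.
SAMPLE_CSV = """time,updated,latitude,longitude,mag,place
2024-01-01T03:15:00.000Z,2024-01-01T03:20:00.000Z,38.8,-122.8,1.2,Sample place A
2024-01-01T14:40:30.000Z,2024-01-01T15:00:00.000Z,-33.4,-70.6,4.6,Sample place B
2024-01-01T14:55:10.000Z,2024-01-01T15:10:00.000Z,61.2,-149.9,2.3,Sample place C
"""

# Route 1: parse the timestamp columns while reading.
parsed = pd.read_csv(io.StringIO(SAMPLE_CSV), parse_dates=["time", "updated"])

# Route 2: read them as strings, then convert afterwards.
raw = pd.read_csv(io.StringIO(SAMPLE_CSV))
converted = raw.copy()
converted["time"] = pd.to_datetime(converted["time"])
converted["updated"] = pd.to_datetime(converted["updated"])

print(raw.dtypes["time"])     # a string-like dtype before conversion
print(parsed.dtypes["time"])  # a datetime64 dtype
```

Both routes should end up with the same values; the first simply avoids the extra statements.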
Step 2: Making a map
We’ll use the Plotly library to make a map. It knows about `DataFrame`s and can work with them directly. You can copy and paste this code chunk to get started:
fig = px.scatter_geo(
    quakes,
    lat = "latitude", lon = "longitude",
    hover_name = "place",
    color = "mag",
    projection = "natural earth",
    scope = "world"
)
st.plotly_chart(fig, use_container_width=True)
Run your code and check that you get a map in your web browser!
Disable each of the arguments in turn (by commenting them out, i.e., adding a `#` before the corresponding line) and rerun your code each time. Write comments for each line based on what you observe it to do.
Note: documentation for the `scatter_geo` function is available here. Details about parameters like `scope` and `projection` are available here. For example, “The available scopes are: ‘world’, ‘usa’, ‘europe’, ‘asia’, ‘africa’, ‘north america’, ‘south america’.” Plotly also has lots of other plotting and mapping functions that you can explore on its website.
Try changing some arguments to see what they do.
Step 3: Counting quakes
Are there more or fewer earthquakes at different times of day? Count the number of quakes occurring at each hour.
First we’ll need to make an `hour` column. It will have values of 0, 1, 2, …, 23. To do this:
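One way to do it, sketched on made-up timestamps and assuming the `time` column was parsed as date-times in Step 1, is the `.dt.hour` accessor. The `demo` data frame is a hypothetical stand-in for `quakes`.

```python
import pandas as pd

# Hypothetical stand-in for quakes, with an already-parsed time column.
demo = pd.DataFrame({
    "time": pd.to_datetime([
        "2024-01-01T00:05:00Z",
        "2024-01-01T14:40:30Z",
        "2024-01-01T23:59:59Z",
    ]),
    "mag": [1.2, 4.6, 2.3],
})

# .dt.hour extracts the hour of day (0 through 23) from each timestamp.
demo["hour"] = demo["time"].dt.hour
print(demo[["time", "hour"]])
```

The feed's timestamps are in UTC, so these are UTC hours rather than local times at each quake's location.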
Now we can make some counts. Let’s try two different ways of doing this:
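Two approaches that would fit here, shown on a hypothetical `hour` column rather than the real data, are `value_counts` and the split-apply-combine form `groupby(...).size()`. This is a sketch of the general technique, not necessarily the two ways intended for the assignment.

```python
import pandas as pd

# Made-up hours standing in for the hour column of quakes.
demo = pd.DataFrame({"hour": [0, 14, 14, 23, 14, 0]})

# Way 1: count how often each value occurs, then order by hour.
counts_a = demo["hour"].value_counts().sort_index()

# Way 2: split into groups by hour, then count the rows in each group.
counts_b = demo.groupby("hour").size()

print(counts_a)
print(counts_b)
```

Hours with no quakes are simply absent from both results.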
Does your plot provide evidence that time of day affects the number of earthquakes? Why or why not? (You don’t need to include an answer to this, but think like a scientist: what can you learn from this data?)
Step 3: Filtering out small quakes
There will probably be some very small quakes. Let’s avoid showing those on the map.
- Make an `st.slider` for the minimum magnitude. You can let it range from, say, -1.0 to 10.0.
- Make a new column `big_enough` (i.e., `quakes['big_enough'] = ...`) by comparing the `mag` column to the number from the slider.
- Make a `filtered_quakes` variable by using the `big_enough` column as an index into `quakes`. (The previous sentence should be all you need, but you can refer to the reading or the class slides for guidance on this.)
- Edit the map code you already have to make your map use `filtered_quakes` instead of just `quakes`. Make sure that it shows fewer quakes; try changing the slider until you’re confident that it’s working.
- Have your program print out the number of quakes in total and after filtering. Use the `len()` function to get the number of rows in a data frame (e.g., “In the last day, 150 earthquakes were detected, of which 100 had magnitude bigger than 1.5.”).
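The filtering steps above follow the usual boolean-mask pattern. Here is a minimal sketch on made-up magnitudes, with a fixed `min_mag` standing in for the value that `st.slider` would return. The `demo` data and the threshold are assumptions for illustration.

```python
import pandas as pd

# Made-up data standing in for quakes.
demo = pd.DataFrame({
    "place": ["A", "B", "C", "D"],
    "mag": [0.4, 1.8, 2.5, 1.1],
})
min_mag = 1.5  # stand-in for the slider value

# Comparing a column to a number gives a column of True/False values.
demo["big_enough"] = demo["mag"] >= min_mag

# Indexing with that boolean column keeps only the True rows.
filtered = demo[demo["big_enough"]]

print(f"{len(demo)} quakes in total, of which {len(filtered)} "
      f"had magnitude of at least {min_mag}.")
```

Because the comparison runs on the whole column at once, no loop is needed.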
Step 4: Which hemisphere has stronger quakes?
- Make a boolean column called `northern_hemisphere` that tells whether the quake occurred above the equator or not. Do this by comparing `latitude` with 0. Put the new column in the original `quakes` data frame, before you filter it.
- Compute the mean `mag` for each value of `northern_hemisphere`. Do this by grouping by the `northern_hemisphere` column, then computing the mean of the `mag` column. This can be done in one line of pandas code; see the class slides for examples.
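The group-then-summarize pattern in the second bullet looks like this on made-up data. The `demo` values are invented; the column names follow the steps above.

```python
import pandas as pd

# Made-up data standing in for quakes.
demo = pd.DataFrame({
    "latitude": [38.8, -33.4, 61.2, -17.5],
    "mag": [1.0, 4.0, 3.0, 2.0],
})

# True for rows north of the equator, False otherwise.
demo["northern_hemisphere"] = demo["latitude"] > 0

# Split by hemisphere, take the mag column, and average within each group.
mean_mag = demo.groupby("northern_hemisphere")["mag"].mean()
print(mean_mag)
```

The result is a Series indexed by `False` and `True`, with one mean per group.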
Step 5 (optional): Getting multiple days
So far we’ve only been working with one day of earthquake data. Let’s expand it to a week!
- Notice that the URL above contains `all_day`. Change that to `all_week`, and change the file name being loaded accordingly.
- Add a `date` column using `quakes['date'] = quakes['time'].dt.strftime("%Y-%m-%d")`. (This uses a classic function to make the `str`ing `f`ormat of a `time`, with `Y`ear, `m`onth, and `d`ay.) Consider where in your code it would make the most sense to do this operation. (Is there already some other line that does a related operation?)
- Add `animation_frame = "date"` to the mapping function call to make the plot animate over days. Press the Play button on the plot (at the bottom) to see the earthquakes each day.
- Notice how the color range changes from day to day. Use the `range_color` parameter to `scatter_geo` to make it consistent. It’s expecting a list with two elements: `[smallest_mag, biggest_mag]`. Try to compute `biggest_mag` from the data itself.
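As a sketch of the `date` and `range_color` bullets on made-up data: the `date` column comes from `strftime`, and the color range is computed once over all days so that it stays fixed across animation frames. The `demo` values and the `color_range` name are assumptions for illustration.

```python
import pandas as pd

# Made-up data standing in for a week of quakes.
demo = pd.DataFrame({
    "time": pd.to_datetime([
        "2024-01-01T03:15:00Z",
        "2024-01-01T22:10:00Z",
        "2024-01-02T08:45:00Z",
    ]),
    "mag": [1.2, 5.1, 2.7],
})

# One string per row such as "2024-01-01"; rows on the same day share a value.
demo["date"] = demo["time"].dt.strftime("%Y-%m-%d")

# Compute the range from the whole data set, not from a single day.
color_range = [demo["mag"].min(), demo["mag"].max()]
print(demo["date"].unique(), color_range)

# This list could then be passed to the existing call, for example:
# px.scatter_geo(..., animation_frame="date", range_color=color_range)
```

Computing the range from the data means it adapts automatically when a new week of quakes is downloaded.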
Grading Rubric
- 1 point for each non-optional step
- 2 points for good variable names, constant names, formatting, and comments.