For this project, your code will read earthquake data from the USGS web site and plot it on a world map, using a plotting library that has built-in support for maps. It will place small markers over the places where earthquakes occurred during the last week. When you mouse over a marker, a small bubble will pop up to display information about that earthquake.
This assignment will have us practice:
- Accessing, manipulating, and filtering tabular data
- Plotting with `pandas` and `matplotlib`
- Using the split-apply-combine pattern to summarize data
As a bonus, we’ll play with making animated maps!
Strategy note
For each exercise, try out the expressions and operations in the Thonny Shell first. Once you’re confident that you have something working and understand why, then think about where in your program it should go (not necessarily at the end!) and copy it into your program there.
Step 1: Getting the data
Create earthquakes.py. Lay it out as usual, with sections for imports, constants, functions, and main code. We won’t actually need to write any functions for this exercise, though, so you can omit that section.
Use the following imports:
import pandas as pd
import matplotlib.pyplot as plt
import plotly.express as px
Run this code. It may fail because plotly is not available. If so, add the package to Thonny by selecting “Manage Packages” from the “Tools” menu.
Next, we’ll ask pandas to read a CSV file of all of the earthquakes that have happened in the last day. Documentation is given on the USGS website. Do this as follows:
- Use your web browser to download https://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/all_day.csv to the same place as `earthquakes.py`.
- Call `pd.read_csv` with that file name (`all_day.csv`).
- Assign the result to `quakes`.
- Run the code. In the Shell (not the program code), look at `quakes.info()` and `quakes.head()` to see what data it contains. Notice that two columns are timestamps, but they are stored as “object” (which usually means “string” in pandas).
- Tell pandas to interpret those strings as date-times by revising your `read_csv` call: add `parse_dates=['time', 'updated']` as an argument, so it looks like `quakes = pd.read_csv('all_day.csv', parse_dates=['time', 'updated'])`. Check in the `.info()` output that the `time` column has a `datetime64` type now.
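To see the effect of `parse_dates` in isolation, here is a minimal sketch that reads the same text twice, once with the argument and once without. The two rows are made up for illustration and only mimic the shape of the USGS feed; `io.StringIO` stands in for a file on disk.

```python
import io
import pandas as pd

# Made-up sample rows (not real USGS data), shaped like the feed.
csv_text = (
    "time,latitude,longitude,mag,place\n"
    "2024-01-01T03:15:00.000Z,35.0,-118.0,1.2,Somewhere A\n"
    "2024-01-01T17:40:00.000Z,-20.5,-70.1,4.6,Somewhere B\n"
)

# Without parse_dates, the time column is read as plain text.
as_text = pd.read_csv(io.StringIO(csv_text))

# With parse_dates, pandas converts the column to date-times while reading.
as_dates = pd.read_csv(io.StringIO(csv_text), parse_dates=["time"])

print(pd.api.types.is_datetime64_any_dtype(as_text["time"]))   # False
print(pd.api.types.is_datetime64_any_dtype(as_dates["time"]))  # True
```

Only a date-time column supports the `.dt` accessor, which is why this conversion matters for the later steps.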
This sort of thing is common: rather than always adding new things to the end of your program, it’s often easiest to fix a problem by revising earlier code. We’ll continue this practice for the rest of this assignment: add new elements to your code in the place that they belong, not necessarily the end.
Step 2: Making a map
We’ll use the Plotly library to make a map. It knows about DataFrames and can work with them directly. You can copy and paste this code chunk to get started:
fig = px.scatter_geo(
    quakes,
    lat = "latitude", lon = "longitude",
    hover_name = "place",
    color = "mag",
    projection = "natural earth",
    scope = "world",
)
fig.show()
Run your code and check that you get a map in your web browser!
If you see a map, ignore this note. Otherwise: if a web browser tab pops up but shows an error, something on your computer (like an antivirus) may be blocking connections between your browser and Thonny. As a workaround, instead of `fig.show()`, write `fig.write_html('map.html')`. Then, after running your code, open `map.html` in your browser or file explorer. You’ll need to reload the web page by hand any time you update your plot.
Disable each of the arguments in turn (by commenting them out, i.e., adding a # before the corresponding line) and rerun your code each time. Write comments for each line based on what you observe it to do.
Note: documentation for the `scatter_geo` function, and details about parameters like `scope` and `projection`, are available in the Plotly documentation. For example, “The available scopes are: ‘world’, ‘usa’, ‘europe’, ‘asia’, ‘africa’, ‘north america’, ‘south america’.” Plotly also has lots of other plotting and mapping functions that you can explore on its website.
Try changing some arguments to see what they do.
Step 3: Counting quakes
Are there more or fewer earthquakes at different times of day? Count the number of quakes occurring at each hour. We’ll do this in a few steps:
- Make an `hour` column. In the Shell, try running `quakes['time'].dt.hour`. Notice that it gives us a `Series` of integers: the hour in which each quake occurred. Assign that to a new column: `quakes['hour'] = quakes['time'].dt.hour`.
- Count the number of times each value occurs in `hour`. Do this by grouping the data frame on `hour` and calling the `.size()` method on the grouped data frame: `.groupby('hour').size()`.
- Save the result in a variable `quakes_by_hour`. The result is a pandas `Series`. The hours are stored in `quakes_by_hour.index` (since pandas thinks of them as the “names” of each row).
- Plot the quake counts. Either use `quakes_by_hour.plot.bar()` (or similar), or call `plt.bar(...)`. Remember that you’ll need to run `plt.show()` to actually see the plot.
As an alternative to grouping, you could also use a handy method called `.value_counts()` on the `hour` series. (Look in the documentation for `value_counts` to see how to tell it not to sort by count. Depending on your pandas version, you may also need `.sort_index()` to get the values ordered by hour.)
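The two counting approaches can be compared on a tiny example. This is a sketch on made-up timestamps, not the assignment data; the names `toy`, `by_group`, and `by_counts` are hypothetical. Using `.sort_index()` to order the `value_counts` result is one option among several.

```python
import pandas as pd

# Made-up timestamps, for illustration only.
toy = pd.DataFrame({
    "time": pd.to_datetime([
        "2024-01-01 03:15", "2024-01-01 03:50",
        "2024-01-01 17:40", "2024-01-02 03:05",
    ])
})
toy["hour"] = toy["time"].dt.hour  # integer hour of each timestamp

# Split-apply-combine: group rows by hour, then count rows per group.
by_group = toy.groupby("hour").size()

# Alternative: count occurrences directly, then order by hour.
by_counts = toy["hour"].value_counts().sort_index()

print(by_group.index.tolist())  # [3, 17]  (the hours)
print(by_group.tolist())        # [3, 1]   (the count for each hour)
print(by_counts.tolist())       # [3, 1]
```

Either result is a `Series` whose index holds the hours, so both can be plotted the same way.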
Does your plot provide evidence that time of day affects the number of earthquakes? Why or why not? (You don’t need to include an answer to this, but think like a scientist: what can you learn from this data?)
Step 3: Filtering out small quakes
There will probably be some very small quakes. Let’s avoid showing those on the map.
- Define a constant `MIN_MAGNITUDE` and set it to 1.0 for now. (Later, try different values.)
- Make a new column `big_enough` (i.e., `quakes['big_enough'] = ...`) by comparing the `mag` column to `MIN_MAGNITUDE`.
- Make a `filtered_quakes` variable by using the `big_enough` column as an index into `quakes`.
- Edit the map code you already have to make your map use `filtered_quakes` instead. Make sure that it shows fewer quakes; try changing `MIN_MAGNITUDE` until you’re confident that it’s working.
- Have your program print out the number of quakes total and after filtering. Use the `len()` function to get the number of rows in a data frame. (E.g., “In the last day, 150 earthquakes were detected, of which 100 had magnitude bigger than 1.5.”)
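The general pattern behind these steps (compare a column to a constant, then index with the resulting True/False column) might look like the sketch below. The data, the `THRESHOLD` constant, and the `above` column are made up for illustration and are not the names the assignment asks for.

```python
import pandas as pd

THRESHOLD = 2.0  # hypothetical cutoff for this toy example

toy = pd.DataFrame({"mag": [0.4, 2.5, 1.1, 3.0]})  # made-up magnitudes

# Comparing a column to a number gives one True/False value per row.
toy["above"] = toy["mag"] > THRESHOLD

# Indexing with a boolean column keeps only the rows where it is True.
kept = toy[toy["above"]]

print(f"{len(toy)} rows in total, {len(kept)} above {THRESHOLD}")
```

Because the cutoff lives in one constant, changing it in a single place changes both the filter and the printed message.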
Step 4: Which hemisphere has stronger quakes?
- Make a boolean column called `northern_hemisphere` that tells whether the quake occurred above the equator or not. Do this by comparing `latitude` with 0. Put the new column in the original `quakes` data frame, before you filter it.
- Compute the mean `mag` for each value of `northern_hemisphere`. Do this by grouping by the `northern_hemisphere` column, then computing the mean of the `mag` column. This can be done in one line of pandas code; see the class slides for examples.
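Grouping by a boolean column works like grouping by any other column: there is one group for `False` and one for `True`. As a sketch of the pattern on a different question, the example below groups made-up rows by a hypothetical `shallow` column (depth under 10) rather than by hemisphere.

```python
import pandas as pd

# Made-up depths and magnitudes, for illustration only.
toy = pd.DataFrame({
    "depth": [5.0, 12.0, 40.0, 8.0],
    "mag": [1.0, 2.0, 4.0, 3.0],
})
toy["shallow"] = toy["depth"] < 10  # hypothetical boolean column

# Split by the boolean column, select one column, then apply mean().
mean_mag = toy.groupby("shallow")["mag"].mean()

print(mean_mag)
```

The result is a `Series` with two entries, indexed by `False` and `True`.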
Step 5: Getting multiple days
So far we’ve only been working with one day of earthquake data. Let’s expand it to a week!
- Notice that the URL above contains `all_day`. Change that to `all_week`, download the new file, and change the file name being loaded accordingly.
- Add a `date` column using `quakes['date'] = quakes['time'].dt.strftime("%Y-%m-%d")`. (`strftime` is a classic function that makes a string format of a time, here with the Year, month, and day.) Consider where in your code it would make most sense to do this operation. (Is there already some other line that does a related operation?)
- Add `animation_frame = "date"` to the mapping function call to make the plot animate over days. Press the Play button on the plot (at the bottom) to see the earthquakes each day.
- Notice how the color range changes day to day. Use the `range_color` parameter of `scatter_geo` to make it consistent. It expects a list with two elements: `[smallest_mag, biggest_mag]`. Try to compute `biggest_mag` from the data itself.
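The two pandas pieces of this step can be tried on their own before touching the map. This is a sketch on made-up rows; `toy` and `mag_range` are hypothetical names, and taking both ends of the range from the data is one possible choice (a fixed lower bound would also work).

```python
import pandas as pd

# Made-up timestamps and magnitudes, for illustration only.
toy = pd.DataFrame({
    "time": pd.to_datetime([
        "2024-01-01 03:15", "2024-01-02 17:40", "2024-01-02 23:59",
    ]),
    "mag": [1.2, 4.6, 2.9],
})

# strftime turns each date-time into a string; rows from the same
# calendar day get the same string, so they land in the same frame.
toy["date"] = toy["time"].dt.strftime("%Y-%m-%d")

# A two-element list spanning the whole dataset, not just one day.
mag_range = [toy["mag"].min(), toy["mag"].max()]

print(toy["date"].tolist())  # ['2024-01-01', '2024-01-02', '2024-01-02']
print(mag_range)             # [1.2, 4.6]
```

A list like `mag_range` has the two-element shape that `range_color` expects, so the colors mean the same thing in every animation frame.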
Step 6: Filter by region
You’re currently filtering quakes by magnitude. Add a second criterion to the filter: only include quakes in the northern hemisphere. Do this by computing the `&` (element-by-element “and”) of the `big_enough` column and the `northern_hemisphere` column.
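As a sketch of how `&` combines two boolean columns row by row, the example below uses made-up data and two hypothetical conditions (`strong` and `shallow`) in place of the assignment's columns.

```python
import pandas as pd

# Made-up magnitudes and depths, for illustration only.
toy = pd.DataFrame({
    "mag": [0.4, 2.5, 1.1, 3.0],
    "depth": [5.0, 12.0, 40.0, 8.0],
})

strong = toy["mag"] > 2.0    # True where the magnitude is large
shallow = toy["depth"] < 10  # True where the depth is small

# & is True only in rows where both conditions are True.
both = strong & shallow
selected = toy[both]

print(both.tolist())  # [False, False, False, True]
print(len(selected))  # 1
```

If the comparisons are written inline instead of stored in variables, each one needs its own parentheses, as in `(toy["mag"] > 2.0) & (toy["depth"] < 10)`, because `&` binds more tightly than `>` and `<`.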
Grading Rubric
8 points total:
- 1 point for each step
- 2 points for good variable names, constant names, formatting, and comments.