Homework 9: Mapping Earthquake Data

For this project, your code will read earthquake data from the USGS website and plot it on a world map, using a plotting library that has built-in support for maps. It will place small markers over the places where earthquakes occurred during the last week. When you mouse over a marker, a little bubble will pop up to display information about that earthquake.

This assignment will have us practice:

Accessing, manipulating, and filtering tabular data
Plotting with pandas and matplotlib
Using the split-apply-combine pattern to summarize data

As a bonus, we’ll play with making animated maps!

Strategy note

For each exercise, try out the expressions and operations in the Thonny Shell first. Once you’re confident that you have something working and understand why, then think about where in your program it should go (not necessarily at the end!) and copy it into your program there.

Step 1: Getting the data

Create earthquakes.py. Lay it out as usual, with sections for imports, constants, functions, and main code. We won’t actually need to write any functions for this exercise, though, so you can omit that section.

Use the following imports:

import pandas as pd
import matplotlib.pyplot as plt
import plotly.express as px

Run this code. It may fail because plotly is not available. If so, add the package to Thonny by selecting “Manage Packages” from the “Tools” menu.

Next, we’ll ask pandas to read a CSV file of all of the earthquakes that have happened in the last day. Documentation for the feed is given on the USGS website. Do this as follows:

Use your web browser to download https://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/all_day.csv to the same place as earthquakes.py.
Call pd.read_csv with that file name (all_day.csv).
Assign the result to quakes.
Run the code. In the Shell (not the program code), look at quakes.info() and quakes.head() to see what data it contains. Notice that two columns are timestamps, but they are stored as “object” (which usually means “string” in pandas).
Tell pandas to interpret those strings as date-times by revising your read_csv call: add parse_dates=['time', 'updated'] as an argument to your read_csv, so it should look like quakes = pd.read_csv('all_day.csv', parse_dates=['time', 'updated']). Check in the .info() that the time column has a datetime64 type now.
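To see what parse_dates changes, here is a small sketch you could try in the Shell. It does not read all_day.csv; instead it uses a few made-up rows (SAMPLE_CSV is a hypothetical stand-in with only some of the feed's columns) so the difference in column types is easy to spot.

```python
import io
import pandas as pd

# Made-up rows shaped like the USGS feed (only a few of its columns).
SAMPLE_CSV = """time,latitude,longitude,mag,place,updated
2024-01-01T03:15:00.000Z,35.1,-117.6,1.2,Sample Place A,2024-01-01T03:20:00.000Z
2024-01-01T17:42:30.000Z,-20.5,-70.1,4.6,Sample Place B,2024-01-01T18:00:00.000Z
"""

# Without parse_dates, the timestamp columns are read as plain strings.
raw = pd.read_csv(io.StringIO(SAMPLE_CSV))
print(raw["time"].dtype)

# With parse_dates, pandas converts them to date-time values.
parsed = pd.read_csv(io.StringIO(SAMPLE_CSV), parse_dates=["time", "updated"])
print(parsed["time"].dtype)
```

In your own program, the file name 'all_day.csv' takes the place of io.StringIO(SAMPLE_CSV).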

This sort of thing is common: rather than always adding new things to the end of your program, it’s often easiest to fix a problem by revising earlier code. We’ll continue this practice for the rest of this assignment: add new elements to your code in the place that they belong, not necessarily the end.

Step 2: Making a map

We’ll use the Plotly library to make a map. It knows about DataFrames and can work with them directly. You can copy and paste this code chunk to get started:

fig = px.scatter_geo(
    quakes,
    lat = "latitude", lon = "longitude",
    hover_name = "place",
    color = "mag",
    projection = "natural earth",
    scope = "world")
fig.show()

Run your code and check that you get a map in your web browser!

If you see a map, ignore this note. Otherwise: if a web browser tab pops up but it has an error, something on your computer (like an antivirus) may be blocking connections between your browser and Thonny. So here’s a workaround you can use if you’re having that problem: instead of fig.show(), write fig.write_html('map.html'). Then, after running your code, open map.html in your browser or file explorer. You’ll need to reload the web page by hand any time you update your plot.

Disable each of the arguments in turn (by commenting them out, i.e., adding a # before the corresponding line) and rerun your code each time. Write comments for each line based on what you observe it to do.

Note: documentation for the scatter_geo function is available here. Details about parameters like scope and projection are available here. For example, “The available scopes are: ‘world’, ‘usa’, ‘europe’, ‘asia’, ‘africa’, ‘north america’, ‘south america’.” Plotly also has lots of other plotting and mapping functions that you can explore on its website.

Try changing some arguments to see what they do.

Step 3: Counting quakes

Are there more or fewer earthquakes at different times of day? Count the number of quakes occurring at each hour. We’ll do this in two steps:

Make an hour column. In the Shell, try running quakes['time'].dt.hour. Notice that it gives us a Series of integers: the hour in which each quake occurred. Assign that to a new column: quakes['hour'] = quakes['time'].dt.hour.
Count the number of times each value occurs in hour. Do this by grouping the data frame on hour and calling the .size() method on the grouped data frame: .groupby('hour').size().
Save the result in a variable quakes_by_hour. The result is a pandas Series. The hours are stored in quakes_by_hour.index (since pandas thinks of them as the “names” of each row.)
Plot the quake counts. Either use quakes_by_hour.plot.bar() (or similar), or call plt.bar(...). Remember that you’ll need to run plt.show() to actually see the plot.
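The grouping steps above can be tried on a tiny made-up data frame first, so you can check the counts by eye. The three rows below are invented for illustration; only the column names match the real data.

```python
import pandas as pd

# Three invented quakes: two in hour 3, one in hour 17.
quakes = pd.DataFrame({
    "time": pd.to_datetime([
        "2024-01-01 03:15:00",
        "2024-01-01 03:50:00",
        "2024-01-01 17:42:30",
    ]),
    "mag": [1.2, 0.8, 4.6],
})

# Pull the hour out of each timestamp.
quakes["hour"] = quakes["time"].dt.hour

# Split by hour, then count the rows in each group.
quakes_by_hour = quakes.groupby("hour").size()

print(quakes_by_hour)
print(list(quakes_by_hour.index))  # the hours that appear in the data
```

Hours with no quakes simply do not appear in the result, which is worth keeping in mind when you read your bar chart.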

As an alternative to grouping, you could also use a handy method called .value_counts() on the hour series. (Look in the documentation for value_counts to see how to tell it not to sort by count, so you get the values sorted by hour instead.)
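Here is a small sketch of the value_counts alternative on a made-up Series of hours. Because the row order you get with sort=False can depend on the pandas version, this sketch also calls .sort_index(), which is an extra step not mentioned above, to guarantee that the result is ordered by hour.

```python
import pandas as pd

# Invented hour values standing in for quakes['hour'].
hours = pd.Series([17, 3, 3, 22, 17, 3], name="hour")

# Default behaviour: the most frequent hour comes first.
by_count = hours.value_counts()

# Skip the sort by count, then order the rows by hour explicitly.
by_hour = hours.value_counts(sort=False).sort_index()

print(by_count)
print(by_hour)
```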

Does your plot provide evidence that time of day affects the number of earthquakes? Why or why not? (You don’t need to include an answer to this, but think like a scientist: what can you learn from this data?)

Step 3: Filtering out small quakes

There will probably be some very small quakes. Let’s avoid showing those on the map.

Define a constant MIN_MAGNITUDE and set it to 1.0 for now. (Later, try different values.)
Make a new column big_enough (i.e., quakes['big_enough'] = ...) by comparing the mag column to MIN_MAGNITUDE.
Make a filtered_quakes variable by using the big_enough column as an index into quakes.
Edit the map code you already have to make your map use filtered_quakes instead. Make sure that it shows fewer quakes; try changing MIN_MAGNITUDE until you’re confident that it’s working.
Have your program print out the total number of quakes and the number remaining after filtering. Use the len() function to get the number of rows in a data frame. (e.g., “In the last day, 150 earthquakes were detected, of which 100 had magnitude bigger than 1.5.”)
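A minimal sketch of the filtering steps, using four invented quakes so the counts are easy to verify. Using > rather than >= for the comparison is an assumption here, based on the "bigger than" wording of the example message.

```python
import pandas as pd

MIN_MAGNITUDE = 1.0

# Invented data; only the column names match the real feed.
quakes = pd.DataFrame({
    "place": ["Sample A", "Sample B", "Sample C", "Sample D"],
    "mag": [0.4, 1.0, 2.7, 5.1],
})

# A boolean column: True where the quake is big enough to show.
quakes["big_enough"] = quakes["mag"] > MIN_MAGNITUDE

# Indexing with a boolean column keeps only the True rows.
filtered_quakes = quakes[quakes["big_enough"]]

print(f"In the last day, {len(quakes)} earthquakes were detected, "
      f"of which {len(filtered_quakes)} had magnitude bigger than {MIN_MAGNITUDE}.")
```

Notice that the quake with magnitude exactly 1.0 is dropped; decide for yourself whether that is the behavior you want.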

Step 4: Which hemisphere has stronger quakes?

Make a boolean column called northern_hemisphere that tells whether the quake occurred above the equator or not. Do this by comparing latitude with 0. Put the new column in the original quakes data frame, before you filter it.
Compute the mean mag for each value of northern_hemisphere. Do this by grouping by the northern_hemisphere column, then computing the mean of the mag column. This can be done in one line of pandas code; see the class slides for examples.
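The same split-apply-combine idea can be sketched on made-up numbers where the means are easy to work out by hand. The latitudes and magnitudes below are invented; the class slides may show a slightly different style.

```python
import pandas as pd

# Two invented northern quakes (mags 1.0 and 3.0) and two southern ones (4.0 and 5.0).
quakes = pd.DataFrame({
    "latitude": [35.1, 61.0, -20.5, -33.4],
    "mag": [1.0, 3.0, 4.0, 5.0],
})

# True for quakes above the equator, False otherwise.
quakes["northern_hemisphere"] = quakes["latitude"] > 0

# Split by hemisphere, then average the mag column within each group.
mean_mag = quakes.groupby("northern_hemisphere")["mag"].mean()
print(mean_mag)
```

The result is a Series with one row for False (southern) and one for True (northern).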

Step 5: Getting multiple days

So far we’ve only been working with one day of earthquake data. Let’s expand it to a week!

Notice that the URL above contains all_day. Change that to all_week, and change the file name being loaded accordingly.
Add a date column using quakes['date'] = quakes['time'].dt.strftime("%Y-%m-%d"). (strftime formats a date-time as a string; here %Y, %m, and %d stand for the four-digit year, the month, and the day.) Consider where in your code it would make the most sense to do this operation. (Is there already some other line that does a related operation?)
Add animation_frame = "date" to the mapping function call to make the plot animate over days. Press the Play button on the plot (at the bottom) to see the earthquakes each day.
Notice how the color range changes day to day. Use the range_color parameter to scatter_geo to make it consistent. It’s expecting a list with two elements: [smallest_mag, biggest_mag]. Try to compute biggest_mag from the data itself.
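Here is a rough sketch of the two data-preparation pieces from this step, on invented rows: building the date strings and computing a fixed color range. The variable name color_range is hypothetical; the list it holds is what you would pass as range_color to scatter_geo.

```python
import pandas as pd

# Invented quakes spread over two days.
quakes = pd.DataFrame({
    "time": pd.to_datetime([
        "2024-01-01 03:15:00",
        "2024-01-01 17:42:30",
        "2024-01-02 08:05:10",
    ]),
    "mag": [1.2, 4.6, 0.8],
})

# One string per row, such as "2024-01-01"; rows with the same string share a frame.
quakes["date"] = quakes["time"].dt.strftime("%Y-%m-%d")

# Smallest and biggest magnitude over the whole data set, not per day.
color_range = [quakes["mag"].min(), quakes["mag"].max()]

print(quakes["date"].tolist())
print(color_range)
```

Computing the range from the whole data frame is what keeps the colors comparable from one day to the next.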

Step 6: Filter by region

You’re currently filtering quakes by magnitude. Add a second criterion to the filter: only include quakes in the northern hemisphere. Do this by computing the & (element-by-element and) of the big_enough column and the northern_hemisphere column.
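Combining two boolean columns with & might look like the sketch below. The rows are invented, and the name keep is a hypothetical helper variable, not something the assignment requires.

```python
import pandas as pd

MIN_MAGNITUDE = 1.0

# Invented quakes: a mix of hemispheres and sizes.
quakes = pd.DataFrame({
    "latitude": [35.1, 61.0, -20.5, -33.4],
    "mag": [0.5, 3.0, 4.0, 0.7],
})

quakes["big_enough"] = quakes["mag"] > MIN_MAGNITUDE
quakes["northern_hemisphere"] = quakes["latitude"] > 0

# & compares the two columns row by row: True only where both are True.
keep = quakes["big_enough"] & quakes["northern_hemisphere"]
filtered_quakes = quakes[keep]

print(filtered_quakes)
```

Only the second row passes both tests, so the filtered data frame has a single quake.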

Grading Rubric

8 points total:

1 point for each step
2 points for good variable names, constant names, formatting and comments.