Lab 9: Experimenting with data plotting

Goals:

Part A: Plotting climate data

To begin, create a new folder called lab09 and create a new file in that folder called temps.py, starting from template.py as usual.

We’ll start by working with some simple real-world data that’s meaningful to many people: temperature. Of course temperature varies a lot day-to-day and place-to-place, but to keep things simple we’ll look at global average temperatures over time. Specifically, we’ll look at a measure called the “temperature anomaly”, which is defined as:

The term temperature anomaly means a departure from a reference value or long-term average. A positive anomaly indicates that the observed temperature was warmer than the reference value, while a negative anomaly indicates that the observed temperature was cooler than the reference value.

One of the boring but essential jobs of government agencies is to collect, validate, and publish data like this over long periods of time. In this case, the National Oceanic and Atmospheric Administration (NOAA) has collected global temperature data going back to 1880. They provide a dataset, called “Climate at a Glance”, that shows how much global temperatures differed from their 20th-century average, for each year from 1880 till the present. This data can be downloaded in CSV (Comma Separated Values) format from the following link:

https://www.ncdc.noaa.gov/cag/global/time-series/globe/land_ocean/1/6/1880-2025.csv?trend=true&trend_base=10&firsttrendyear=1880&lasttrendyear=2025

Note: other related datasets are also available; see https://www.ncdc.noaa.gov/cag/global/time-series. More background information about the data is available at https://www.ncei.noaa.gov/access/monitoring/global-temperature-anomalies/.

  1. Download the file at the link above and save it in your lab09 folder as 1880-2025.csv. (You can save the file by right-clicking the link and selecting “Save as...”.)
  2. Next, open the file in a plain text editor, a spreadsheet program, or Thonny, and observe the structure of its contents. (Note the several lines of header content before the data; this is not standard for a CSV file, but it is not uncommon. The data table itself begins with the line Year,Anomaly, which holds the column names; the values follow on the lines below it.)
  3. Write a Python program to read in the data from this file. We’ll walk you through the steps below.

We’ll use the csv library to read the data this week. (In a future week we’ll use Pandas to make this really easy, but for now we’ll walk through it step by step.)

The slides showed the following example code for reading a CSV file using csv.reader:

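import csv

# Read every row of the file into a list (each row is a list of strings)
with open("data.csv", newline="") as f:
    csv_data = list(csv.reader(f))

names = csv_data[0]          # the first row holds the column names
print("Column names:", names)
for row in csv_data[1:]:     # the remaining rows hold the data
    print(row)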

Observe what is printed each time through the loop. Each iteration gives you one row of the data, organized as a list in the variable row. The first data item in the row is stored in row[0], the second in row[1], and so on.

The loop continues until there are no more rows left in the data file.

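Note that csv.reader reads every value as a string, but the years should really be represented as integers and the temperature anomalies as floating-point numbers. Skip the header lines, then store the years in a list called years and the anomalies in a list called temp_anomalies, converting each value with int() or float() as you go.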
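The reading and conversion steps can be sketched as follows. This is a minimal sketch, not the required solution. The names parse_anomalies and read_anomalies are hypothetical helpers. The code also assumes that the data table starts right after a header row whose first cell is Year (such as Year,Anomaly), and it skips every line above that row.

```python
import csv

def parse_anomalies(lines):
    """Turn NOAA CSV lines into (years, temp_anomalies) lists.

    Assumption: the data table begins right after a header row whose first
    cell is "Year" (such as "Year,Anomaly"); every line above it is skipped.
    """
    rows = list(csv.reader(lines))

    # Locate the header row that starts the data table
    for i, row in enumerate(rows):
        if row and row[0] == "Year":
            data_rows = rows[i + 1:]
            break
    else:
        raise ValueError("no header row starting with 'Year' was found")

    years = []
    temp_anomalies = []
    for row in data_rows:
        if len(row) < 2:  # skip blank or incomplete lines
            continue
        years.append(int(row[0]))             # "1880"  -> 1880
        temp_anomalies.append(float(row[1]))  # "-0.12" -> -0.12
    return years, temp_anomalies


def read_anomalies(filename):
    """Open the CSV file and parse it with parse_anomalies() (hypothetical helper)."""
    with open(filename, newline="") as f:
        return parse_anomalies(f)
```

In temps.py, years, temp_anomalies = read_anomalies("1880-2025.csv") would then give you the two lists. Keeping the parsing separate from the file reading makes it easy to try the parser on a few hand-typed lines first.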

Check-in: look at your years and temp_anomalies variables. Do they contain the data you expect? Are the data types what you expect?

Next, use the years and temp_anomalies lists to make a plot of the data using the matplotlib library. Begin by importing the matplotlib library as follows:

import matplotlib.pyplot as plt

Note: keep your imports together at the beginning of your program, as usual.

Try running your program now. If it fails because matplotlib cannot be imported, you may need to install the package in Thonny: choose Tools → Manage packages..., search for matplotlib, and click Install.

After this, you can use the plt object to create new plots. For example, we saw in the POGIL that we could make a line plot as follows:

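import numpy as np
import matplotlib.pyplot as plt

t = np.arange(0.0, 2.0, 0.01)   # x values: 0.00, 0.01, ..., 1.99 seconds
volts = np.sin(2 * np.pi * t)   # y values: one full sine wave per second
plt.plot(t, volts)              # draw a line through the (t, volts) points
plt.xlabel('time (s)')
plt.ylabel('volts (mV)')
plt.show()                      # open the plot window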

The first argument to plt.plot is the sequence of x values and the second is the sequence of y values; both must have the same length.
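Applied to the climate data, the same pattern might look like the sketch below. plot_anomalies is a hypothetical helper. The °C axis label assumes the anomalies are in degrees Celsius, so check the units line in your file's header.

```python
import matplotlib.pyplot as plt

def plot_anomalies(years, temp_anomalies):
    """Draw a line plot of temperature anomaly (y) against year (x).

    Hypothetical helper; assumes both lists have the same length and that
    the anomalies are in degrees Celsius (check your file's header).
    """
    plt.figure()                          # start a fresh figure
    plt.plot(years, temp_anomalies)       # x values first, then y values
    plt.xlabel("Year")
    plt.ylabel("Temperature anomaly (°C)")
    plt.title("Global temperature anomaly by year")
```

In temps.py you might call plot_anomalies(years, temp_anomalies) and then plt.show() to open the window.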

Check-in: does your plot look reasonable? Can you read the axes? Does the trend (roughly) match what you see on the NOAA site? If not, check your input data! Specifically, make sure you’re using the correct data types for the values in the years and temp_anomalies lists.

Optionally, explore some of the options you can set to adjust how the plot looks. To find out how to set line styles, change colors, add labels, add a grid, and so on, consult the matplotlib documentation. The pyplot documentation lists additional plot commands, and the matplotlib gallery shows the many nifty kinds of plots you can make.
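The sketch below shows a few of these options on placeholder data (the squares of 0 to 4), not on the climate data. The keyword arguments color, linestyle, marker, and label are standard plt.plot options. plt.grid and plt.legend turn on the grid and the legend box.

```python
import matplotlib.pyplot as plt

# Placeholder data (the squares of 0..4), just to show the styling options
xs = [0, 1, 2, 3, 4]
ys = [0, 1, 4, 9, 16]

plt.figure()
plt.plot(xs, ys, color="red", linestyle="--", marker="o", label="y = x squared")
plt.grid(True)    # draw grid lines behind the data
plt.legend()      # show the text given by label= in a legend box
plt.title("A styled line plot")
```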

Notice that you can zoom in on different sections of the plot using the controls in the plot window. Save an image of your plot as temps.png, either with the save icon in the plot window or with plt.savefig('temps.png'). If you use plt.savefig, call it before plt.show(): once the plot window is closed, pyplot discards the figure, so saving afterwards produces a blank image. Submit your temps.py and temps.png files using Moodle.
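The order of the save and show calls is the important part. Here is a minimal sketch with placeholder data; in temps.py you would plot years and temp_anomalies instead.

```python
import matplotlib.pyplot as plt

# Placeholder data; in temps.py you would plot years and temp_anomalies instead
plt.plot([0, 1, 2, 3], [0, 1, 4, 9])
plt.xlabel("x")
plt.ylabel("y")

plt.savefig("temps.png")   # save first, while the figure still has its contents
plt.show()                 # then display it; closing the window discards the figure
```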

Part A Checklist

Part B: The Collatz Sequence

The Collatz conjecture (also known as the 3n + 1 conjecture) is a conjecture in mathematics named after Lothar Collatz. It concerns a sequence that starts with any positive integer n. Each term is derived from the previous one: if the previous term is even, the next term is half of it; if the previous term is odd, the next term is 3 times it plus 1.

In mathematical notation, the sequence is defined as follows:

$f(n) = \begin{cases} \frac{n}{2} & \text{if } n \text{ is even} \\ 3n + 1 & \text{if } n \text{ is odd} \end{cases}$

The conjecture is that no matter which positive integer n you start with, the sequence always eventually reaches 1.

(Do you remember this from Lab 5? You can start with that code! When dividing by 2, use floor division (//) so you don’t end up with floating-point numbers.)
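The rule f(n) translates almost directly into Python. The sketch below uses a hypothetical function name, collatz_next. It shows the floor division that keeps every term an int.

```python
def collatz_next(n):
    """Return the term that follows n in the Collatz sequence (hypothetical helper)."""
    if n % 2 == 0:
        return n // 2      # even: halve it (floor division keeps an int)
    else:
        return 3 * n + 1   # odd: triple it and add 1
```

For example, collatz_next(3) gives 10 and collatz_next(10) gives 5. Using n / 2 instead would give 5.0, a float.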

Step 1: Making a function that prints values

Test your function by calling it with various numbers and checking that the output sequence is correct. For instance, calling collatz_sequence(3) should print:

3
10
5
16
8
4
2
1

(It’s OK if your output is missing the first or last number.)
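One possible shape for the Step 1 function is the minimal sketch below. It prints the starting value and every term through 1, one per line. The ValueError guard is an addition of this sketch: without it, 0 or a negative start would loop forever.

```python
def collatz_sequence(n):
    """Print each term of the Collatz sequence starting at n, ending with 1."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    print(n)
    while n != 1:
        if n % 2 == 0:
            n = n // 2       # even: halve
        else:
            n = 3 * n + 1    # odd: triple and add 1
        print(n)

collatz_sequence(3)  # prints 3, 10, 5, 16, 8, 4, 2, 1 (one per line)
```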

Step 2: Making the function return a count instead
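Here is a sketch of a counting version. The name collatz_count is hypothetical. The counting convention is also an assumption: each application of the rule counts as one step, so starting at 1 takes 0 steps and starting at 3 takes 7. If your lab counts terms instead, the result is one larger.

```python
def collatz_count(n):
    """Return how many steps the Collatz sequence from n takes to reach 1.

    Assumed convention: each application of the rule is one step,
    so collatz_count(1) == 0 and collatz_count(3) == 7.
    """
    if n < 1:
        raise ValueError("n must be a positive integer")
    steps = 0
    while n != 1:
        n = n // 2 if n % 2 == 0 else 3 * n + 1
        steps += 1
    return steps
```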

Step 3: Collect sequence lengths

First, let’s run the function for a range of numbers and see what the function returns.
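A loop over a range of starting values might look like the sketch below, printing one "start count" pair per line. The sketch makes several assumptions. It uses a DEBUG flag like the one your template.py presumably defines, and any extra tracing happens only when that flag is on. The range 1 to 10 is arbitrary. The compact collatz_count helper (a hypothetical name) is repeated here so the sketch runs on its own.

```python
DEBUG = False   # assumption: template.py provides a DEBUG flag like this

def collatz_count(n):
    """Return the number of steps for the Collatz sequence from n to reach 1."""
    steps = 0
    while n != 1:
        if DEBUG:
            print("  debug: current term", n)   # extra output only when debugging
        n = n // 2 if n % 2 == 0 else 3 * n + 1
        steps += 1
    return steps

for start in range(1, 11):
    print(start, collatz_count(start))   # one "start count" pair per line
```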

Check-in: Do you see pairs of numbers in your program output: a start value and a count value? If your program prints anything else, make that output appear only when DEBUG is on.

Now we’ll collect the data in two lists so we can plot it.
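Collecting the results can be sketched with two parallel lists. The name counts_till_1 comes from the check-in below. The name start_values and the range 1 to 100 are assumptions for illustration. The helper is repeated so the sketch is self-contained.

```python
def collatz_count(n):
    """Return the number of steps for the Collatz sequence from n to reach 1."""
    steps = 0
    while n != 1:
        n = n // 2 if n % 2 == 0 else 3 * n + 1
        steps += 1
    return steps

start_values = []    # hypothetical name: the starting values (x data)
counts_till_1 = []   # the step counts (y data)
for start in range(1, 101):
    start_values.append(start)
    counts_till_1.append(collatz_count(start))
```

Appending to both lists in the same loop iteration keeps them the same length and lined up, which plt.plot and plt.scatter require.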

Check-in: do your lists look right? You should see alternating small and large numbers in counts_till_1.

Step 4: Plotting results
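One reasonable way to plot the results is a scatter plot of each starting value against its count. Because neighbouring counts jump between small and large values, a line plot would mostly show zigzags. The range 1 to 1000, the marker size, and the hypothetical helper name are choices made for this sketch.

```python
import matplotlib.pyplot as plt

def collatz_count(n):
    """Return the number of steps for the Collatz sequence from n to reach 1."""
    steps = 0
    while n != 1:
        n = n // 2 if n % 2 == 0 else 3 * n + 1
        steps += 1
    return steps

start_values = list(range(1, 1001))
counts_till_1 = [collatz_count(n) for n in start_values]

plt.figure()
plt.scatter(start_values, counts_till_1, s=4)   # s sets the marker size
plt.xlabel("Starting value n")
plt.ylabel("Steps to reach 1")
plt.title("Collatz sequence lengths")
```

Call plt.show() afterwards to open the window, or plt.savefig(...) first if you also want an image file.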

Part B Checklist