Lab 8: Experimenting with data plotting

Goals:

Part A: Plotting climate data

To begin, create a new folder called lab08 and create a new file in that folder called temps.py, starting from template.py as usual.

There are many data sets freely available on the web. One of them is the NOAA Climate at a Glance data set. It comes in CSV (Comma Separated Values) format and lists each year together with the variation of that year's temperature from the 20th-century average temperature. The available data can be viewed here:

https://www.ncdc.noaa.gov/cag/global/time-series

A CSV file of global temperature variations from 1880 till the present can be downloaded here:

https://www.ncdc.noaa.gov/cag/global/time-series/globe/land_ocean/1/6/1880-2021.csv?trend=true&trend_base=10&firsttrendyear=1880&lasttrendyear=2021

Download the file from the link above and save it in your lab08 folder using the name 1880-2021.csv. (Note that you can save this file by opening it in your browser, right-clicking within the webpage, and selecting "Save as...")

Next, open the file using a plain text editor, spreadsheet program, or Thonny, and observe the structure of its contents (note the 5 lines of header content before the data).

Write a Python program to read in the data from this file. To parse the CSV data, use the csv library. To use the library and open the file, include the following code:

import csv
# Open and read the climate data file
f = open('1880-2021.csv','r')
reader = csv.reader(f)

To skip the header lines in the file, call the following function:

next(reader)

This needs to be done once for each header line you want to skip. Next, you can read the comma-separated data using a for loop as follows:

for row in reader:
    print(row)

Observe what is printed each time through the loop. Note that each iteration produces one row of the data, split at the commas and stored as a list of strings in the variable row. The first data item in the row is stored in row[0] and the second data item in row[1].

The loop continues until there are no more rows left in the data file.
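The header-skipping and row-reading steps above can be sketched on a small stand-in for the real file. This is only an illustration: the sample text, its values, and the name HEADER_LINES are made up here, and the real file has its own header content and its own number of header lines.

```python
import csv
import io

# Hypothetical stand-in for the downloaded file: two header lines, then data.
# The values are made up for illustration, not real measurements.
sample = io.StringIO(
    "Example title line\n"
    "Year,Value\n"
    "1880,-0.15\n"
    "1881,-0.10\n"
)

HEADER_LINES = 2  # assumption for this sample; count the lines in your own file

reader = csv.reader(sample)
for _ in range(HEADER_LINES):
    next(reader)  # each call discards one header line

rows = []
for row in reader:
    rows.append(row)  # each row is a list of strings

print(rows)  # [['1880', '-0.15'], ['1881', '-0.10']]
```

Using a loop around next(reader) avoids repeating the call by hand when there are several header lines.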

Modify the code in the loop body by removing the print statement and building two new lists, xdata and ydata, which should contain the x-coordinates (years) and y-coordinates (temperature variations) respectively. Create both lists as empty lists before the loop. Each time through the loop, add one new data item to xdata and one to ydata.

Note that while the data is read as strings, the years are really integers and the temperature variations are floating point numbers, so convert each item before adding it to its list.
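As a minimal sketch of that conversion, here is a single row handled by hand. The row values are made up for illustration, and the names year and variation are not required by the lab.

```python
# One row as csv.reader would produce it (made-up values for illustration)
row = ["1880", "-0.15"]

year = int(row[0])         # the string "1880" becomes the integer 1880
variation = float(row[1])  # the string "-0.15" becomes the float -0.15

xdata = []
ydata = []
xdata.append(year)
ydata.append(variation)

print(xdata, ydata)  # [1880] [-0.15]
```

In your program the conversion and the two append calls belong inside the for loop, so that they run once per row.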

Check-in: look at your xdata and ydata variables. Do they contain the data you expect? Are the data types what you expect?

Next, use the xdata and ydata lists to make a pretty plot of the data using the matplotlib library. Begin by importing the matplotlib library as follows:

import matplotlib.pyplot as plt

Note: keep your imports together at the beginning of your program, as usual.

Try running your program now: if it fails because matplotlib cannot be imported, you may have to add the package to Thonny by going to the menu item Tools → Manage packages ...

After this, you can use the functions in plt to create new plots. For example, a line plot can be made as follows:

plt.plot([0,1,2,3,4,5], [0,10,5,15,10,20])

plt.show()

The first argument to the plot function is a list of x values, and the second argument is a list of y values. Next, plot the temperature data using the xdata and ydata lists read from the CSV file.

Check-in: does your plot look reasonable? Can you read the axes? Does the trend (roughly) match what you see on the NOAA site? If not, check your input data!

Now, label the x and y axes and give the graph a suitable title. Do what you can to make the plot look pretty! To figure out how to set line styles, change colors, add labels, or add a grid, consult the matplotlib docs at:

https://matplotlib.org/users/pyplot_tutorial.html

Additional information on plot commands is listed here:

https://matplotlib.org/api/pyplot_summary.html

A gallery of the many nifty kinds of plots you can make with matplotlib is given here:

https://matplotlib.org/gallery.html
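As a starting point, here is a small sketch of a styled line plot with axis labels, a title, and a grid. The data values, label text, and output filename are made up for illustration; substitute your own lists and wording.

```python
import matplotlib.pyplot as plt

# Made-up sample data for illustration, not real measurements
xdata = [2000, 2001, 2002, 2003]
ydata = [0.1, 0.3, 0.2, 0.4]

# Keyword arguments control the color, line style, and point marker
plt.plot(xdata, ydata, color="tab:red", linestyle="--", marker="o")

plt.xlabel("Year")
plt.ylabel("Temperature variation")
plt.title("Sample line plot")
plt.grid(True)  # draw grid lines behind the data

plt.savefig("sample_plot.png")  # hypothetical filename for this sketch
```

This sketch writes the image with plt.savefig so that it runs without opening a window; calling plt.show() instead opens the interactive plot window.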

Notice that you can zoom in on different sections of the plot using the controls on the plot window. Save an image of your plot and call it temps.png (using the save icon in the plot window controls). Submit your temps.py and temps.png files using Moodle.

Grading Rubric: 5 points total:

Part B: The Collatz Sequence

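The Collatz conjecture (also known as the 3n + 1 conjecture) is a conjecture in mathematics named after Lothar Collatz. It deals with a sequence defined by starting with any positive integer n. Each term is derived from the previous term in the sequence as follows: if the previous term is even, the next term is half of it; if the previous term is odd, the next term is three times the previous term plus one.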

In mathematical notation, the sequence is defined as follows:

$f(n) = \begin{cases} \frac{n}{2} & \text{if } n \text{ is even} \\ 3n + 1 & \text{if } n \text{ is odd} \end{cases}$

The conjecture is that no matter which positive integer n you start with, the sequence will always eventually reach 1.
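The rule for a single term can be sketched as a small function. The name collatz_step is hypothetical and not required by the lab; it only shows how the even and odd cases translate into code.

```python
def collatz_step(n):
    """Return the term that follows n in the Collatz sequence."""
    if n % 2 == 0:
        return n // 2   # even: halve (// keeps the result an integer)
    return 3 * n + 1    # odd: triple and add one

print(collatz_step(3))   # 10
print(collatz_step(10))  # 5
```

Integer division (//) is used rather than / so that the terms stay integers instead of turning into floats such as 5.0.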

(Do you remember this from Lab 5? You can start with that code!)

To begin, create a new file called collatz.py using Thonny.

Step 1: Making a function

Write a function named collatz_sequence which takes an argument n and loops through the Collatz sequence, printing each number in the sequence until it reaches 1. Test your function by calling it with various numbers and checking that the output sequence is correct. For instance, calling collatz_sequence(3) should print:

3
10
5
16
8
4
2
1

Step 2: Modifying the function

Create a local variable to keep track of the number of loop iterations in the function. It would make sense to call it count. It will need to be initialized to 0 before you begin the loop. Inside the loop, increment count by 1 to keep track of the number of iterations. When the loop terminates (we get to 1), have the function return the value of count. Test your code.
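The counting pattern described above (initialize before the loop, increment inside it, return after it) can be sketched on a simpler loop. The function halvings is a hypothetical helper used only for illustration; it is not part of the lab and does not apply the Collatz rule.

```python
def halvings(n):
    """Count how many times n can be halved (integer division) before reaching 1."""
    count = 0            # initialize before the loop
    while n > 1:
        n = n // 2
        count += 1       # one more trip through the loop
    return count         # report the total once the loop has ended

print(halvings(16))  # 4
print(halvings(1))   # 0
```

Note that a start value that already satisfies the stopping condition never enters the loop, so the count stays at 0.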

Step 3: Collect sequence lengths

In your main program, create a simple for loop using a loop variable called start that provides values from 1 up to 50. Call the collatz_sequence function once for each value of start and print the values for start and the return value from the function call (which corresponds to the number of iterations in the Collatz sequence for that number).

Check-in: Do you see pairs of numbers in your program output: a start value and a count value? If you have other output, make it happen only if DEBUGging.
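One detail worth checking in Step 3 is the loop bounds: range stops just before its second argument, so covering 1 through 50 takes range(1, 51). A quick sketch to confirm this (the list named starts exists only for this check):

```python
# range(1, 51) produces 1, 2, ..., 50; the stop value 51 is excluded
starts = list(range(1, 51))

print(starts[0], starts[-1], len(starts))  # 1 50 50
```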

Step 4: Plotting results

Store the values from your loop in two lists: xdata for the start values and ydata for the corresponding iteration counts.

Check-in: do your lists look right? You should see alternating small and large numbers in ydata.

Using the matplotlib library, create a scatter plot of xdata and ydata. Import the matplotlib library as follows:

import matplotlib.pyplot as plt

After this, you can use the functions in plt to create new plots. For example, scatter plots can be made as follows:

plt.scatter(xdata, ydata)
plt.show()

Include x and y axis labels and a title for the plot. Turn on a grid in the plot as well and set the dot size (s) to a small number.
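A minimal sketch of such a scatter plot is shown here for start values 1 through 6 only, with the iteration counts written out by hand. The label text, the dot size of 5, and the output filename are arbitrary choices for illustration.

```python
import matplotlib.pyplot as plt

# Start values 1..6 and the number of Collatz iterations each needs to reach 1
xdata = [1, 2, 3, 4, 5, 6]
ydata = [0, 1, 7, 2, 5, 8]

plt.scatter(xdata, ydata, s=5)  # s sets the marker size; small values give small dots
plt.xlabel("Start value")
plt.ylabel("Number of iterations")
plt.title("Sample Collatz scatter plot")
plt.grid(True)

plt.savefig("sample_scatter.png")  # hypothetical filename for this sketch
```

Small dots matter more as the number of points grows, since large markers overlap and hide the pattern. Calling plt.show() instead of plt.savefig opens the interactive plot window.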

Change the loop so that it iterates from 1 to 5000. The scatter plot provides an interesting visual that allows you to see the relative number of iterations for each value.

Save an image of your plot and call it collatz.png. Submit your collatz.py and collatz.png files using Moodle.

Grading Rubric: 7 points total: