Goals:
- To practice using lists and loops in Python
- To practice reading CSV files and converting data types
- To practice plotting data using matplotlib
Part A: Plotting climate data
To begin, create a new folder called lab08 and create a new file in that
folder called temps.py, starting from template.py as usual.
There are many data sets freely available on the web. One of them is the NOAA Climate at a Glance dataset. It comes in CSV (Comma Separated Values) format and lists, for each year, how far the temperature deviated from the 20th-century average. The available data can be viewed here:
https://www.ncdc.noaa.gov/cag/global/time-series
A CSV file of global temperature variations from 1880 to the present can be downloaded here:
Download the file from the link above and save it in your lab08 folder
using the name 1880-2021.csv. (Note that you can save this file by
opening it in your browser, right-clicking within the webpage, and
selecting “Save as...”.) Next, open the file using a plain text editor,
spreadsheet program, or Thonny, and observe the structure of its
contents (note the 5 lines of header content before the data). Write a
Python program to read in the data from this file. To parse the CSV
data, use the csv library. To use the library and open the file,
include the following code:
import csv
# Open and read the climate data file
f = open('1880-2021.csv','r')
reader = csv.reader(f)
To skip a header line in the file, call the built-in next function on the reader:
next(reader)
(This needs to be done once for each header line you want to skip.) Next,
you can read the comma separated data using a for loop as follows:
for row in reader:
    print(row)
Observe what is printed each time through the loop. Note that each
iteration returns one row of the data with the commas removed and
organized as a list in the variable row. The first data item in the row is
stored in row[0] and the second data item as row[1].
The loop continues until there are no more rows left in the data file.
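The steps above can be tried out on a small in-memory sample before touching the real file. This is only a sketch: the sample text, its placeholder header lines, and its values are made up for illustration (they are not NOAA data), and HEADER_LINES is a hypothetical constant name.

```python
import csv
import io

# Made-up sample shaped like the lab file: 5 header lines, then year,value rows.
# The values are placeholders for illustration, not real NOAA data.
sample_text = """header line 1
header line 2
header line 3
header line 4
header line 5
1880,-0.10
1881,-0.05
"""

HEADER_LINES = 5  # hypothetical constant; check the header count in your own file

# io.StringIO lets csv.reader treat the string like an open file
reader = csv.reader(io.StringIO(sample_text))

# One call to next() per header line to skip
for _ in range(HEADER_LINES):
    next(reader)

rows = []
for row in reader:
    rows.append(row)  # each row is a list of strings

print(rows)
```

With the real file, replace io.StringIO(sample_text) with the file object f opened earlier.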
Modify the code in the loop body by removing the print statement and building two new lists: xdata and ydata which should contain the list of x-coordinates (years) and y-coordinates (temperature variation) respectively. Each time through the loop, a new data item should be added to the xdata and ydata lists.
Note that while the data is read as strings, the dates are really integers and the temperature data are floating point numbers.
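As a small illustration of that conversion, the sketch below handles a single row outside of any loop. The row values are made up, and the names year and variation are assumptions chosen for this example.

```python
# One row as csv.reader would hand it over: a list of strings.
# The values are made up for illustration.
row = ['1880', '-0.10']

xdata = []
ydata = []

year = int(row[0])           # '1880'  becomes the int 1880
variation = float(row[1])    # '-0.10' becomes the float -0.1

xdata.append(year)
ydata.append(variation)

print(xdata, ydata)
print(type(xdata[0]), type(ydata[0]))
```

In your program the same conversion and append steps would go inside the for loop, so that they run once per row.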
Check-in: look at your xdata and ydata variables. Do they contain the data you expect? Are the data types what you expect?
Next, use the xdata and ydata lists to make a pretty plot of the data using the matplotlib library. Begin by importing the matplotlib library as follows:
import matplotlib.pyplot as plt
Note: keep your imports together at the beginning of your program, as usual.
Try running your program now: if it fails because matplotlib cannot be imported, you may have to add the package to Thonny by going to the menu item Tools → Manage packages ...
After this, you can use the plt object to create new plots. For example, a line plot can be made as follows:
plt.plot([0,1,2,3,4,5], [0,10,5,15,10,20])
plt.show()
The first argument to the plot function is a list of x values, and the second argument is a list of y values. Next, try plotting the climate data using the xdata and ydata lists read from the CSV file.
Check-in: does your plot look reasonable? Can you read the axes? Does the trend (roughly) match what you see on the NOAA site? If not, check your input data!
Now, label the x and y axes and give the graph a suitable title. Do what you can to make the plot look pretty! To figure out how to set line styles, change colors, add labels, or add a grid, consult the matplotlib docs at:
https://matplotlib.org/users/pyplot_tutorial.html
Additional information on plot commands is listed here:
https://matplotlib.org/api/pyplot_summary.html
A gallery of the many nifty kinds of plots you can make with matplotlib is given here:
https://matplotlib.org/gallery.html
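As a starting point for the styling step, here is a minimal sketch that uses a few common pyplot calls. The data are the same toy numbers as the line-plot example above, and the label text, color, and line style are arbitrary choices for illustration, not required settings.

```python
import matplotlib.pyplot as plt

# Toy values for illustration only
xdata = [0, 1, 2, 3, 4, 5]
ydata = [0, 10, 5, 15, 10, 20]

plt.figure()                      # start a fresh figure
plt.plot(xdata, ydata, color="tab:red", linestyle="--", marker="o")
plt.xlabel("x values")            # text shown under the x axis
plt.ylabel("y values")            # text shown beside the y axis
plt.title("A labeled line plot")  # text shown above the plot
plt.grid(True)                    # draw grid lines behind the data
# Finish with plt.show() to open the plot window
```

The labeling and grid calls need to come before plt.show(), since that call is what displays the finished figure.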
Notice that you can zoom in on different sections of the plot using the controls on the plot window. Save an image of your plot and call it temps.png (using the save icon in the plot window controls). Submit your temps.py and temps.png files using Moodle.
Grading Rubric: 5 points total:
- 3 points for code correctness and completeness
- 1 point for good variable names, constant names, formatting, and comments
- 1 point for submitting a pretty plot with the correct name
Part B: The Collatz Sequence
The Collatz conjecture (also known as the 3n + 1 conjecture) is a conjecture in mathematics named after Lothar Collatz. It deals with a sequence defined by starting with any positive integer n. Each term is derived from the previous term in the sequence as follows:
- if the previous term is even, the next term is one half the previous term.
- otherwise, the next term is 3 times the previous term plus 1.
In mathematical notation, the sequence is defined as follows:
$f(n) = \begin{cases} \frac{n}{2} & \text{if } n \text{ is even} \\ 3n + 1 & \text{if } n \text{ is odd} \end{cases}$
The conjecture is that no matter which positive integer n you start with, the sequence will always eventually reach 1.
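The rule for a single step can be sketched as a small helper. The name next_term is hypothetical and is not something the lab asks for. It only shows how the even and odd cases translate into Python.

```python
def next_term(n):
    """Return the term that follows n in the Collatz sequence."""
    if n % 2 == 0:
        return n // 2     # even: halve (// keeps the result an int)
    return 3 * n + 1      # odd: triple and add one


print(next_term(3))    # 10
print(next_term(10))   # 5
```

Using // rather than / matters here, because / would turn the terms into floats.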
(Do you remember this from Lab 5? You can start with that code!)
To begin, create a new file called collatz.py using Thonny.
Step 1: Making a function
Write a function named collatz_sequence which takes an argument n,
and then loops through the Collatz sequence printing the sequence of
numbers until it reaches 1. Test your function by calling it with
various numbers and checking that the output sequence is correct. For
instance, calling collatz_sequence(3) should print:
3
10
5
16
8
4
2
1
Step 2: Modifying the function
Create a local variable to keep track of the number of loop iterations
in the function. It would make sense to call it count. It will need to
be initialized to 0 before you begin the loop. Inside the loop,
increment count by 1 to keep track of the number of iterations. When the
loop terminates (we get to 1), have the function return the value of
count. Test your code.
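One possible shape for the counting logic is sketched below. It uses a hypothetical function name, count_collatz_steps, and leaves out the printing from Step 1. It also assumes the loop runs until n equals 1, so the count is the number of steps taken rather than the number of terms printed.

```python
def count_collatz_steps(n):
    """Return how many steps the Collatz sequence takes to get from n to 1."""
    count = 0                 # initialized before the loop begins
    while n != 1:
        if n % 2 == 0:
            n = n // 2
        else:
            n = 3 * n + 1
        count += 1            # one more trip through the loop
    return count


print(count_collatz_steps(3))   # 7
```

Under this assumption a start value of 1 gives a count of 0, because the loop body never runs.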
Step 3: Collect sequence lengths
In your main program, create a simple for loop using a loop variable
called start that provides values from 1 up to 50. Call the
collatz_sequence function once for each value of start and print the
values for start and the return value from the function call (which
corresponds to the number of iterations in the Collatz sequence for that
number).
Check-in: Do you see pairs of numbers in your program output: a start value and a count value? If you have other output, make it happen only if DEBUGging.
Step 4: Plotting results
Store the values from your loop in two lists:
- one list called xdata which contains the values of n passed to the function
- one list called ydata which contains the return values (the counts)
Check-in: do your lists look right? You should see alternating small and large numbers in ydata.
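The collection step might look like the sketch below. The helper count_collatz_steps is a hypothetical stand-in for your own collatz_sequence function and assumes the count starts at 0 and the loop runs until n reaches 1. Note that range(1, 51) is needed to include 50, because range stops just before its second argument.

```python
def count_collatz_steps(n):
    """Hypothetical stand-in for collatz_sequence: return the step count only."""
    count = 0
    while n != 1:
        if n % 2 == 0:
            n = n // 2
        else:
            n = 3 * n + 1
        count += 1
    return count


xdata = []
ydata = []

for start in range(1, 51):            # 1, 2, ..., 50
    xdata.append(start)
    ydata.append(count_collatz_steps(start))

print(xdata[:6])   # [1, 2, 3, 4, 5, 6]
print(ydata[:6])   # [0, 1, 7, 2, 5, 8]
```

Both lists end up with the same length, which is what a plot of one against the other requires.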
Using the matplotlib library, create a scatter plot of xdata and
ydata. Import the matplotlib library as follows:
import matplotlib.pyplot as plt
After this, you can use the plt object to create new plots. For example, scatter plots can be made as follows:
plt.scatter(xdata, ydata)
plt.show()
Include x and y axis labels and a title for the plot. Turn on a grid in
the plot as well and set the dot size (s) to a small number.
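Those settings could be applied roughly as follows. This is a sketch with a handful of hand-computed points, assuming the count starts at 0 and the loop runs until 1. The label text and the dot size of 4 are arbitrary choices for illustration.

```python
import matplotlib.pyplot as plt

# Step counts for start values 1 to 6, under the assumption stated above
xdata = [1, 2, 3, 4, 5, 6]
ydata = [0, 1, 7, 2, 5, 8]

plt.figure()                          # start a fresh figure
plt.scatter(xdata, ydata, s=4)        # s sets the dot size
plt.xlabel("Start value")
plt.ylabel("Number of iterations")
plt.title("Collatz sequence lengths")
plt.grid(True)
# Finish with plt.show() to open the plot window
```

A small s value helps once there are thousands of points, since large dots would overlap and hide the pattern.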
Change the loop so that it iterates from 1 to 5000. The scatter plot provides an interesting visual that allows you to see the relative number of iterations for each value.
Save an image of your plot and call it collatz.png. Submit your collatz.py and collatz.png files using Moodle.
Grading Rubric: 7 points total:
- 4 points for code correctness and completeness and proper function
- 2 points for good variable names, constant names, formatting, and comments
- 1 point for submitting a pretty plot