Warning: This is draft content. Do not start work on this assignment yet.
For this project, your program will read in a data file containing data from a run of the biochem department's MALDI-TOF device. The device shoots lasers at a sample and measures the time-of-flight of ionized molecules. In essence, the result is a kind of mass spectrometry.
Your program will put the MALDI-TOF data into two different lists and
will use a package called matplotlib to graph the values.
The file is a CSV file. The first few lines look like:
time,intensity
2668.909,12
2669.102,9
2669.296,5
2669.490,3
2669.684,2
We will use time as the x value.
It gives a time when the sensors in the MALDI-TOF device got a reading.
The y-value is the intensity of the reading.
Your program must read this file into 2 lists -- a list of the x-values
and a list of the y-values -- as required by matplotlib.
The data file malditof_data.csv is here. Download
that file, as described below.
Step 1. Set up
In Thonny, create a new folder called malditof. Then, create a new
file malditof.py.
Download the malditof_data.csv file (if you haven't already) and put
the file in this new folder. Open the file and inspect it a little bit
to get a feel for what it looks like.
We’re going to supply the name of the file to use as a command-line argument.
Enable “Program Arguments” on the “View” menu in Thonny to show the text box, then put malditof_data.csv in that text box.
Step 2. Set up the main file's areas
Create the areas of the file where you will put imports, CONSTANTS, function declarations, and the main code. Lay them out the same way we have in previous weeks.
To access the command-line argument:
- In the imports section, import sys.
- In the main code, use filename = sys.argv[1]
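If the Program Arguments box is left empty, sys.argv[1] does not exist and the program stops with an IndexError. The following is a minimal sketch of one way to guard against that. The helper name get_filename and the usage message are made up for illustration; they are not part of the assignment.

```python
import sys


def get_filename(argv):
    """Return the data file name given on the command line.

    argv[0] is the name of the script itself, so the first real
    argument is argv[1].
    """
    if len(argv) < 2:
        # No argument was supplied, e.g. the Program Arguments box is empty.
        raise SystemExit("usage: malditof.py <data file>")
    return argv[1]


# In the main code of malditof.py, this hypothetical helper would be used as:
#     filename = get_filename(sys.argv)
```

Writing filename = sys.argv[1] directly, as described above, is all the assignment asks for; the check only makes the failure easier to understand.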
Step 3. Write the code
The code should be fairly similar to the code you wrote to plot climate data. You will need to import the matplotlib package like this:
import matplotlib.pyplot as plt
Write the code to display the malditof data using a line graph. To do
this, read the values from the file into two lists, time_data and intensity_data.
time_data holds all the first values from the lines of the data file.
intensity_data holds all the second values from the lines of the data file.
Then, call:
plt.plot(time_data, intensity_data, color='red', marker='.', linestyle='-')
Note that when you run your program it may take a while to read in the thousands and thousands of data points and plot them. Be patient.
Remember to show your plot.
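The reading step described above could look roughly like the sketch below. It assumes the file has exactly the layout shown at the top of this page (a time,intensity header row followed by two numeric columns). The function name read_maldi_file is hypothetical, and your own code may be organized differently.

```python
import csv


def read_maldi_file(filename):
    """Read a time,intensity CSV file into two parallel lists of floats."""
    time_data = []
    intensity_data = []
    with open(filename, newline="") as data_file:
        reader = csv.reader(data_file)
        next(reader, None)  # skip the "time,intensity" header row
        for row in reader:
            if len(row) < 2:
                continue  # ignore blank or incomplete lines
            time_data.append(float(row[0]))       # first value: x
            intensity_data.append(float(row[1]))  # second value: y
    return time_data, intensity_data
```

The two returned lists are what you pass to plt.plot() in the call shown above.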
Step 4. Add a smoothed plot
When you look closely at the plot of the data, you'll see the data shows 6 distinct peaks. Or, does it? Zoom in closely at the largest peak, and you will see the peak is not just one value, but a set of values that jump up and down a bit at the top. Let’s smooth that out.

You need to add code to your program that will put a second line on the same plot that shows smoothed values of the data in blue. For each point i in the graph, you will display the average of the y-values from i-5 to i+5 -- in other words, the 11 surrounding y-values. You'll do this to try to "smooth out" the values so that you can better find real individual peaks.
Note: this sounds easy, doesn't it? But, there is at least one
gotcha involved. What do you do when processing data at the beginning
and end where there is no i-5 or i+5 intensity_data value? For this
exercise, we’ll only smooth the values that exist: for the point at index 0,
we’ll average points 0 through 5; for point 1, we’ll average points 0 to 6, etc.
To do this, it’ll be helpful to make a function that computes the average of a list of numbers:
- Name: compute_average(x)
- Purpose: Computes the average of a list of numbers.
- Parameters: x, a list of floating-point numbers
- Return Value: the average of that list (a floating-point number)
- Example: compute_average([1.0, 2.0, 3.0]) returns 2.0
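One possible implementation of this specification is sketched below. It assumes the list is never empty; an empty list would cause a division by zero. That holds for this assignment as long as every window you average contains at least the point itself.

```python
def compute_average(x):
    """Return the average of a non-empty list of numbers."""
    total = 0.0
    for value in x:
        total = total + value
    return total / len(x)
```

The explicit loop is used here only to keep every step visible.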
Hint 1: Create a second list called smoothed_intensity_data, and fill it by
averaging the data from intensity_data. Then, your second plot is time_data vs.
smoothed_intensity_data.
Hint 2: To get both plots on the same screen, call plt.plot() twice,
and then do plt.show() afterward.
Hint 3: The easiest way to do this is to do an index-based loop over
intensity_data. In the loop, compute the beginning index and ending index for
the values to average together to get the smoothed value. If the point is in the
middle of the list, the beginning index is i-5 and the ending index is i+6.
(A slice stops just before its ending index, so this covers i-5 through i+5: 5 points on either side of point i.)
If the point is near the beginning of the list, the
beginning index is 0. If the point is near the end of the list, the
ending index is len(intensity_data). (Hint: max(idx, limit) and min(idx, limit).) Once you have the indices computed, you
can use a slice to get the sub-list that you will average.
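The index arithmetic in Hint 3 can be sketched as follows. The names HALF_WIDTH, window_bounds, and smooth are invented for this illustration, and sum(window) / len(window) stands in for a call to your own compute_average. Treat it as one possible shape for the loop, not the required one.

```python
HALF_WIDTH = 5  # number of neighbors included on each side of point i


def window_bounds(i, length):
    """Return (start, end) slice indices for the window around index i."""
    start = max(i - HALF_WIDTH, 0)           # never go below index 0
    end = min(i + HALF_WIDTH + 1, length)    # never go past the end of the list
    return start, end


def smooth(intensity_data):
    """Return a new list where each value is the average of its window."""
    smoothed = []
    for i in range(len(intensity_data)):
        start, end = window_bounds(i, len(intensity_data))
        window = intensity_data[start:end]   # the end index is excluded by the slice
        smoothed.append(sum(window) / len(window))  # stand-in for compute_average(window)
    return smoothed
```

For a list of 20 values, window_bounds(0, 20) gives (0, 6), so point 0 averages points 0 through 5, and window_bounds(1, 20) gives (0, 7), matching the rule described in the note above.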
Hint 4: Although you might be able to find code on the Internet or in numpy
to accomplish this task, try to write it yourself in plain Python.
Test your smoothing code by trying it on a few specially crafted lists.
Execute your tests only if DEBUGging is enabled; disable it before you submit.
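A sketch of what such tests might look like is given below. It assumes your smoothing code is wrapped in a function that takes a list and returns the smoothed list; that function is passed in as smooth_function. The names DEBUG and run_smoothing_tests are illustrative only. The test lists were chosen so the expected averages can be checked by hand.

```python
DEBUG = True  # set to False before you submit


def run_smoothing_tests(smooth_function):
    """Check a smoothing function against small lists with known answers."""
    # A constant list must stay constant, whatever the window size.
    assert smooth_function([4.0] * 15) == [4.0] * 15

    # A single point is its own average.
    assert smooth_function([7.0]) == [7.0]

    # The ramp 0.0, 1.0, ..., 10.0 has exactly 11 points.
    ramp = [float(n) for n in range(11)]
    result = smooth_function(ramp)
    assert result[5] == 5.0    # middle: average of all 11 values
    assert result[0] == 2.5    # start: average of points 0 through 5
    assert result[10] == 7.5   # end: average of points 5 through 10
    print("all smoothing tests passed")


# In the main code, with your own smoothing function in place of my_smooth:
#     if DEBUG:
#         run_smoothing_tests(my_smooth)
```

The checks at index 0 and index 10 exercise the beginning and end of the list, which is where the gotcha described earlier shows up.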
Grading Rubric:
| Category | Max points |
|---|---|
| Program produces correct plot of data | 4 |
| Program produces correct 2nd plot of smoothed data | 4 |
| Good variable names and comments | 2 |
| TOTAL | 10 |
Submit only malditof.py to Moodle. Do not submit the data file.
Extension
Can you do this exercise without using csv.reader()?
In particular, this exercise originally had you work with a space-separated file and .split() the lines yourself. Try downloading the original space-separated file and see if you can adapt your code to work with it.
Note that it doesn’t have a header row.
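A minimal sketch of reading such a file with .split() follows. It assumes the space-separated file holds the same two columns in the same order (time first, then intensity), which you should confirm by opening the file. The function name read_space_separated is hypothetical.

```python
def read_space_separated(filename):
    """Read a headerless, whitespace-separated file into two lists of floats."""
    time_data = []
    intensity_data = []
    with open(filename) as data_file:
        for line in data_file:
            fields = line.split()  # splits on any run of spaces or tabs
            if len(fields) < 2:
                continue  # skip blank or incomplete lines
            time_data.append(float(fields[0]))
            intensity_data.append(float(fields[1]))
    return time_data, intensity_data
```

Because there is no header row, the loop starts converting values from the very first line.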
Acknowledgments
This exercise is based on an exercise developed by Prof Vic Norman.