The purpose of today's lab is to review the use of classes, and to practice working with modules and files.

Our goal today will be to process a data file of university employee salary data. Each line of the file describes an employee. We will begin by modeling an employee using a class, and then write a driver program to use this model to compute some statistics about the university employees based on salary and rank.

Exercise 9.1

As usual, begin by creating a folder called lab09. Then create a class in this folder to model an employee as follows:

  1. Name the file employee.py
  2. Define a class named Employee
  3. Create an __init__ method with no parameters except for self. In the body of the method, initialize the following instance variables to values of your choice:
    • self._first
    • self._last
    • self._rank
    • self._salary
    Choose reasonable/interesting defaults for each of these values, but make sure that the salary you choose is greater than $20000.
    Do the assignment to default values in the body of the method -- just pick whatever values you want!
  4. Create a __str__ method that returns a string representation in the following format: Last name, first initial: rank ($salary)
    For example, Jones, B.: associate ($23000.34)

With our model of an employee created, we'd like to test our implementation. Do this by adding a test section to the bottom of your file that checks if the file is being run as a script. Remember, you achieve this behaviour using the following check:

    if __name__ == '__main__':
and then putting your tests in the body of the if statement. Add code to create a default employee and verify that the employee can be printed as required.
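One possible employee.py for Exercise 9.1 is sketched below. The default name, rank, and salary are arbitrary (chosen to reproduce the example format above), and printing the salary with two decimal places is an assumption based on that example. Writing the string representation as the special method __str__ is what lets both print(emp) and str(emp) use it.

```python
class Employee:
    """A university employee with a name, rank, and salary."""

    def __init__(self):
        # Arbitrary defaults; any values work as long as the salary exceeds 20000.
        self._first = 'Barbara'
        self._last = 'Jones'
        self._rank = 'associate'
        self._salary = 23000.34

    def __str__(self):
        # Format: Last name, first initial: rank ($salary)
        return f'{self._last}, {self._first[0]}.: {self._rank} (${self._salary:.2f})'


if __name__ == '__main__':
    # Test section: runs only when employee.py is executed as a script.
    emp = Employee()
    print(emp)  # prints: Jones, B.: associate ($23000.34)
```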

We now have a model of an employee, but we'd like to add some additional functionality. In particular, as we consider our problem specification, we note that we will need access to the rank and salary information of each employee.

Exercise 9.2

Add accessors for the rank and salary of an employee, as well as appropriate tests in the testing section of your file.
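A minimal sketch of the Exercise 9.2 accessors with matching tests follows. The name get_rank matches the call used later in this lab; get_salary is an assumed name chosen by analogy. The __str__ method from Exercise 9.1 is left out here only to keep the sketch short.

```python
class Employee:
    """Minimal Employee with accessors for rank and salary."""

    def __init__(self):
        # Arbitrary defaults, as in Exercise 9.1.
        self._first = 'Barbara'
        self._last = 'Jones'
        self._rank = 'associate'
        self._salary = 23000.34

    def get_rank(self):
        """Return the employee's rank."""
        return self._rank

    def get_salary(self):
        """Return the employee's salary."""
        return self._salary


if __name__ == '__main__':
    emp = Employee()
    assert emp.get_rank() == 'associate'
    assert emp.get_salary() == 23000.34
    assert emp.get_salary() > 20000  # the defaults respect the salary invariant
    print('accessor tests passed')
```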

The last piece of functionality we will need involves creating an employee from information that is originally contained in a file. There are two possible approaches:

Arguments could be made for either approach, but here we will modify the class so that it handles the details of creating an Employee instance from a single string (i.e., the latter approach). Note that more advanced Python code often uses the pickle module to write objects to files and read them back; we do not introduce that module here.

Exercise 9.3

Update the __init__ method to receive an optional line parameter in addition to self. Do this as follows:

  1. Add a parameter named line and give it the default value '' (i.e., the empty string)
  2. Within the body of the method, check if the line is the empty string using the following algorithm:
    1. If the line is the empty string:
      1. Give the instance variables default values. (You already have code for this from Exercise 9.1.)
    2. Else (the line is not empty and we assume the line has the format "first last rank salary"):
      1. Split the line based on whitespace, storing the result in a local variable named strings
      2. Use each element of strings to initialize an instance variable of the employee. Make sure you store the salary as an int or float -- not as a string.
        Don't forget to verify that the salary satisfies the invariant that the salary of an employee must be greater than or equal to 20000! If the line does not represent a valid employee, print a suitable message to stderr and then crash the program. Check your code from the previous lab if you do not remember how to output to stderr.
  3. Add test cases to your testing code to verify that your updated constructor works as desired.
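A sketch of the updated constructor is below, assuming the accessors from Exercise 9.2 are named get_rank and get_salary. Converting the salary with float() and ending the program with sys.exit(1) after writing to sys.stderr is one way to meet the requirements; raising an exception would also "crash" the program. The sample employee used in the tests is made up.

```python
import sys


class Employee:
    """A university employee, built from defaults or from one data-file line."""

    def __init__(self, line=''):
        if line == '':
            # No line given: use the defaults from Exercise 9.1.
            self._first = 'Barbara'
            self._last = 'Jones'
            self._rank = 'associate'
            self._salary = 23000.34
        else:
            # Expected format: "first last rank salary"
            strings = line.split()
            if len(strings) != 4:
                print(f'Invalid employee line: {line!r}', file=sys.stderr)
                sys.exit(1)
            self._first, self._last, self._rank = strings[0:3]
            try:
                self._salary = float(strings[3])
            except ValueError:
                print(f'Salary is not a number: {line!r}', file=sys.stderr)
                sys.exit(1)
            # Invariant: every employee earns at least 20000.
            if self._salary < 20000:
                print(f'Salary below 20000: {line!r}', file=sys.stderr)
                sys.exit(1)

    def get_rank(self):
        return self._rank

    def get_salary(self):
        return self._salary

    def __str__(self):
        return f'{self._last}, {self._first[0]}.: {self._rank} (${self._salary:.2f})'


if __name__ == '__main__':
    # The default constructor still works.
    assert Employee().get_rank() == 'associate'
    # Constructing from a line (hypothetical sample data).
    emp = Employee('Ada Lovelace Manager 85000\n')
    assert emp.get_rank() == 'Manager'
    assert emp.get_salary() == 85000.0
    print(emp)  # prints: Lovelace, A.: Manager ($85000.00)
```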

With a functional Employee class ready to go, we can turn to the analysis of the employee data file.

Do this...

Download the employee data file from here: code/employees.txt. Save this file in your lab09 folder. Make sure you call the file employees.txt.

Analyzing the data file is not something that should be the responsibility of a single employee. Instead, this task belongs outside the definition of the Employee class, in a different file.

Exercise 9.4

Create a new Python file called driver.py in your folder. Implement the following algorithm:

  1. Use import to gain access to the Employee class.
  2. Create an empty list named employees
  3. Use a with statement to open the data file and automatically close it when we are finished processing as follows:
    1. Loop through each line in the file
      1. Append an employee object to the list of employees using the information from the current line. (How do you create an employee object? You have to call the Employee constructor.)
  4. Print the length of the list of employees and verify that there are as many entries as there are lines in the data file (which is 100).
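The loading steps could be sketched as below. In driver.py you would write from employee import Employee; so that this sketch runs on its own, a hypothetical stand-in class is defined instead, assuming every line has the form "first last rank salary". Wrapping the steps in a function named read_employees (also hypothetical) is a choice made for testing; writing them directly in your main code works just as well.

```python
# In driver.py you would instead write:  from employee import Employee
# Hypothetical stand-in so this sketch runs by itself.
class Employee:
    """Stand-in for the Employee class in employee.py."""

    def __init__(self, line):
        first, last, rank, salary = line.split()
        self._rank = rank
        self._salary = float(salary)


def read_employees(filename):
    """Return a list with one Employee per line of the given file."""
    employees = []
    # The with statement closes the file automatically, even if an error occurs.
    with open(filename) as data_file:
        for line in data_file:
            employees.append(Employee(line))
    return employees
```

With the provided data file, employees = read_employees('employees.txt') followed by print(len(employees)) should print 100.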

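With the data successfully imported in a format that we can use, we are ready to do our processing. Here are the statistics we would like to compute:

  • the employee with the largest salary
  • the employee with the smallest salary
  • the average salary for each rank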

How are we going to compute the average salaries by rank, i.e., the average salary of all Managers, the average salary of all Staff, and so on? We need to compute the sum of all the salaries for each rank ("Manager", "Staff", etc.) and count how many employees hold each rank (i.e., how many "Manager"s we have). To do this, we'll use two dictionaries: one mapping each rank to the sum of its salaries, and one mapping each rank to the number of employees with that rank. For example, when the dictionaries are fully populated they might look like this:

    Totals Dictionary            Count Dictionary
    "Manager" → 10101010         "Manager" → 9
    "Staff"   → 9939339          "Staff"   → 53
    "CEO"     → 100010           "CEO"     → 1

If you don't remember how to access, add, and update entries in a dictionary, review the textbook chapter that covered dictionaries.
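As a quick refresher, the sketch below uses the example values from the table above to show how to access, update, and add dictionary entries, and how the two dictionaries combine into an average. The extra Manager salary of 95000 and the "Lecturer" rank are made up for illustration.

```python
# Example dictionaries using the illustrative values from the table.
totals = {'Manager': 10101010, 'Staff': 9939339, 'CEO': 100010}
counts = {'Manager': 9, 'Staff': 53, 'CEO': 1}

# Access: look up the value stored for a key.
manager_total = totals['Manager']

# Update: a (hypothetical) new Manager earning 95000 changes both dictionaries.
totals['Manager'] += 95000
counts['Manager'] += 1

# Add: assigning to a key that is not yet present creates a new entry.
if 'Lecturer' not in totals:
    totals['Lecturer'] = 41000
    counts['Lecturer'] = 1

# The average for a rank combines both dictionaries.
average_manager = totals['Manager'] / counts['Manager']
print(round(average_manager, 2))  # prints 1019601.0
```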

We could write separate functions to compute each of these statistics, but then we would end up reading through the (possibly very large) list of employees multiple times. Instead, we will do the processing directly, making a single pass through the list of employees, and then write the results to a file. Doing all of these steps at once would be a bit much, so let's start by just gathering the information we will need.

Exercise 9.5

Use the following algorithm to gather the information we need to compute the 3 statistics listed above:

  1. If there are no employees, print an appropriate message
  2. If there is at least one employee:
    1. Create empty dictionaries called totals and counts
    2. Set max_employee equal to the first employee in the list of employees
    3. Set min_employee equal to the first employee in the list of employees.
    4. For each employee in the list of employees:
      1. If the employee's rank is already a key in the totals and counts dictionaries (the two always have the same keys, so you can check just one, e.g. with if emp.get_rank() in totals):
        1. Increment the value for that rank by the current employee salary in the totals dictionary.
        2. Increment the count for that rank by 1 in the counts dictionary.
      2. Otherwise (i.e., the rank is not yet a key):
        1. Add a key-value pair mapping the rank to the current employee's salary in the totals dictionary.
        2. Add a key-value pair mapping the rank to 1 in the counts dictionary.
      3. Compare the salary of the current employee to the salary of the max_employee and update max_employee if appropriate.
      4. Compare the salary of the current employee to the salary of the min_employee and update min_employee if appropriate.
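The algorithm above could be sketched as follows. Wrapping it in a function named gather_stats (a hypothetical name) is an assumption made so the steps can be tested in isolation; in driver.py you may simply write the same steps in your main code. The stand-in Employee class takes a rank and salary directly, unlike the real constructor, and exists only so the sketch runs on its own. Because the comparisons are strict, ties keep whichever employee appeared first.

```python
class Employee:
    """Hypothetical stand-in exposing only the accessors the algorithm uses."""

    def __init__(self, rank, salary):
        self._rank = rank
        self._salary = salary

    def get_rank(self):
        return self._rank

    def get_salary(self):
        return self._salary


def gather_stats(employees):
    """One pass: per-rank totals and counts, plus the max and min earners.

    Prints a message and returns None if the list is empty.
    """
    if len(employees) == 0:
        print('There are no employees to process.')
        return None

    totals = {}
    counts = {}
    max_employee = employees[0]
    min_employee = employees[0]

    for emp in employees:
        rank = emp.get_rank()
        if rank in totals:
            # Rank seen before: accumulate the salary and the count.
            totals[rank] += emp.get_salary()
            counts[rank] += 1
        else:
            # First employee with this rank: create both entries.
            totals[rank] = emp.get_salary()
            counts[rank] = 1

        if emp.get_salary() > max_employee.get_salary():
            max_employee = emp
        if emp.get_salary() < min_employee.get_salary():
            min_employee = emp

    return totals, counts, max_employee, min_employee
```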

We have now gathered the information we need to compute our statistics, and are ready to write the relevant information to a new file.

Exercise 9.6

Writing the employee information for the employees with the maximum and minimum salaries is (relatively) straightforward, but printing a table indicating the average salary by rank is slightly more complicated. Create a function called print_averages that implements the following algorithm.

  1. Receive a dictionary of total salaries by rank, a dictionary of the number of employees by rank, and an open file handle called outFile. This function assumes that the keys of the totals dictionary match the keys of the counts dictionary and that the caller opened the file handle in write mode.
  2. Write a table heading to the file using outFile.write('Rank\tAverage Salary\n')
  3. For each rank in totals:
    1. Compute the average salary for the rank (using both the totals and counts dictionaries)
    2. Write the rank to the file, followed by a tab, followed by the average salary and a newline

Do not close the file handle, as we want to leave the handle in the same state it was passed to the function.

Finally, back in your main code section, add statements at the end that implement the following algorithm:

  1. Open an output file called employee_stats.txt in write mode
  2. Write descriptive information about the employee earning the largest salary to the file. Remember, write requires a string argument, but you can use str(max_employee) to create a string representation of the employee with the maximum salary.
  3. Write descriptive information about the employee earning the smallest salary to the file.
  4. Call the print_averages function to print the table of averages by rank.
  5. Close the file (if you didn't open the file within a with statement).
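The function and the final main-code steps of Exercise 9.6 might look like the sketch below. Writing each average with two decimal places and the "Highest salary:"/"Lowest salary:" labels are formatting choices, not requirements. The totals, counts, and the two employee strings are placeholders standing in for the results of Exercise 9.5; since str() of a string returns the string itself, the same write calls work unchanged with real Employee objects. Because the file is opened in a with statement, no explicit close() is needed.

```python
def print_averages(totals, counts, outFile):
    """Write a tab-separated table of average salary by rank to outFile.

    Assumes totals and counts have the same keys and that outFile is open
    for writing. The handle is deliberately left open for the caller.
    """
    outFile.write('Rank\tAverage Salary\n')
    for rank in totals:
        average = totals[rank] / counts[rank]
        outFile.write(f'{rank}\t{average:.2f}\n')


# Placeholder results; in driver.py these come from the Exercise 9.5 pass.
totals = {'Staff': 70000, 'Manager': 90000}
counts = {'Staff': 2, 'Manager': 1}
max_employee = 'Smith, J.: Manager ($90000.00)'  # hypothetical stand-ins
min_employee = 'Doe, A.: Staff ($30000.00)'      # for Employee objects

with open('employee_stats.txt', 'w') as outFile:
    outFile.write('Highest salary: ' + str(max_employee) + '\n')
    outFile.write('Lowest salary: ' + str(min_employee) + '\n')
    print_averages(totals, counts, outFile)
```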

Checking In

Submit all the code and supporting files for the exercises in this lab. We will grade this exercise according to the following criteria:

If you’re working on a lab computer, don’t forget to log off of your machine when you are finished!