CS 214 Lab 12: Multithreading in Ruby

Since we will be dealing with multiple files, begin by creating a new subdirectory named ruby within your directory for this lab. Then copy the file array_sum.rb into your new directory. Use your favorite text editor to open this file, customize its opening documentation, and take a moment to study what it is doing and how it is doing it.

Since we have the sum of the values in each input file recorded in our spreadsheet, let's begin by running the program to make sure that it produces correct results.

In your spreadsheet, add a column labeled Ruby. Using the ruby command, verify that array_sum.rb produces the correct sum for each file in:

   /home/cs/24/adams/labs/12/numbers/

If it does, run the program three times for each file, and record the middle of the three execution times in your spreadsheet in the appropriate cells of your Ruby column. These values, computed with a single thread, will provide us with a baseline against which we can compare the speedup provided by multithreading.

Preliminaries. Before we delve into multithreading in Ruby, let's first look at some of the details of how array_sum.rb is doing what it does.

Command-Line Arguments in Ruby. Ruby programs have a built-in array constant named ARGV, that provides access to any command-line arguments. In particular:

ARGV.length provides the number of arguments specified on the command-line. Note that unlike C/C++, Ruby does not count the name of the program as one of the arguments.
ARGV[i] can be used to retrieve a String containing the command-line argument whose index is i. Ruby's array indices begin with 0.

Find where each of these are used in array_sum.rb and make certain you understand how they are being used before proceeding.

Reading the Numbers. Ruby provides dynamic arrays, which our readFile function uses to store the values it reads from the input file. Our approach is a bit unusual though: instead of reading the file line by line into an integer array, we first use Ruby's IO.readlines() method to read all of the lines from the file into a variable named strArray. Then, since the first line of the input file contains the number of values to be processed, the first item in strArray is the number of integers we need to store, so we use that information to allocate an array named intArray that is the correct size. Once we have our intArray allocated, we use a for loop to convert the remaining strings in strArray into integers, which we store in intArray. The function then lists intArray as its final value, making it the readFile function's return value.

This use of two arrays may seem crazy, but in our experiments, this approach was ten times faster than using a single array and reading the values one at a time! Ruby's IO.readlines method is apparently optimized to minimize the disk I/O, which provides a huge performance boost compared to doing the I/O line-by-line.

Note that unlike C/C++, Ruby provides garbage collection, so we need not worry about memory leaks; Ruby will reclaim the memory of these dynamic arrays for us when they are no longer needed.

Timing the Operation

To determine the execution time of function sum(), we can use Ruby's Time class, which (among other things) provides:

A Time type representing the current date and time.
The now method, which returns the current time. We use it to declare variables in which to store the times before and after we call sum().
The - operator, which lets us compute the difference of two Time values.
The to_f method that, sent to a Time object, converts it to a floating point (real) number.

Find each of these features in the program and make sure that you understand how and why they are being used before continuing.

Multithreading in Ruby. Ruby provides a Thread class, to which we can pass a block containing the instructions we want that thread to perform.

Begin by making a copy of array_sum.rb and name the copy threaded_array_sum.rb. Then open threaded_array_sum.rb and customize its opening documentation, the name of the program, etc.

Since the existing sum() function sums the entire array, let's begin by modifying it for multithreading to sum a slice of the array, based on its thread's ID and the size of its slice. For the sake of readability, let's rename it sumSlice() and modify it so that it follows the algorithm we gave in this exercise's introduction:

   ...
   def sumSlice(values, id, sliceSize)
      start = id * sliceSize
      stop = start + sliceSize - 1
      myTotal = 0
      for i in start..stop
         myTotal += values[i]
      end
      myTotal
   end
   ...

Our revised method receives the thread ID and the size of the array-slice it is to sum via its parameters. It uses them to compute its starting and stopping index values, and then uses those values to drive a for loop that accumulates its partial sum in a local variable myTotal, which it returns.

Now that we have a function to sum a slice of our array, we are ready to proceed with the other revisions to the sequential program.

Let's begin by modifying the body of procedure checkCommandLine, as shown in bold below:

   def checkCommandLine
      if ARGV.length != 2
         puts "\nUsage: sum <inFileName> <numThreads>\n\n"
         exit(1)
      end
   end

Function readFile() remains unchanged.

However, in our main function, we need to replace the call to sum() with code that will sum the array using multiple threads. Since this sounds complicated, let's do it in a new method we will call sumInParallel():

  def sumInParallel(values, numThreads)
      sliceSize = values.size / numThreads
      threadArray = Array.new( numThreads )
      result = 0
      lock = Mutex.new

      (1..numThreads-1).each do | i |
          threadArray[i] = Thread.new { 
                            myTotal = sumSlice(values, 
                                                 i,
                                                 sliceSize) 
                            lock.synchronize {
                                result += myTotal
                            }
          }
      end

      myTotal  = sumSlice(values, 0, sliceSize)

      leftovers = values.size % numThreads
      if leftovers > 0
         for i in values.size-leftovers..values.size-1
            myTotal += values[i]
         end
      end

      lock.synchronize {
         result += myTotal
      }

      for i in 1..numThreads-1
         threadArray[i].join
      end

      result
   end

This function receives from its caller the array of values to be summed, plus the number of threads to be used to sum the array. We start by computing the size of our slice, using the approach from our introduction:

   sliceSize = values.size / numThreads

Since the user is passing the desired number of threads to our program via a command-line argument, we cannot know many threads are needed. We use the standard workaround: build a dynamic array of threads:

   ...
   threadArray = Array.new( numThreads )
   ...

Then, we define our result variable that we will use to communicate our grand total back to the caller:

   result = 0

Since we define this variable before we create any threads, any threads we create within its scope will be able to share access to it, making it a shared variable among our threads.

We need some way of letting each thread communicate its partial result back to our original thread. Ruby does not have a reduction operation like OpenMP, nor does it have synchronized entry procedures like Ada. One approach might be to have each thread add the values in its slice to result one at a time:

   result += values[i]     # DON'T do this

This is a bad idea, because we would get lots of write-write conflicts, since the threads will all be trying to write to our shared result variable at the same time. Moroever, this approach would require a write to the shared variable for each entry in the array -- if our array is length N, it would require O(N) writes to the shared variable, and synchronizing all of those writes would greatly slow our function.

Instead, we will have each thread accumulate its partial sum in a local variable called myTotal. When it is finished computing the partial sum, we have each thread add that partial sum to result:

   result += myTotal     # OK, if synchronized

This approach requires just one write to the shared variable per thread, so for T threads, we will need just O(T) synchronizations. To facilitate those synchronizations, we declare a Mutex object named lock:

   lock = Mutex.new

We might visualize this approach as follows:

Once we have an array of threads, a shared result variable, a Mutex to synchronize the accesses to it, and the size of the array-slice a thread is to sum, we use an each loop to launch the threads:

   ...
      (1..numThreads-1).each do | i |
         threadArray[i] = Thread.new { 
                              myTotal = sumSlice(values, 
                                                  i,
                                                  sliceSize) 
                              lock.synchronize {
                                    result += myTotal
                              }
                          }
      end
   ...

If the user specifies that T threads are to be used, this loop will launch T-1 new threads, each using method sumSlice() to add up the values in its slice of the values array, and then storing that partial sum in myTotal. Once it has accomplished that, it must add the partial sum in myTotal to the shared result variable. To prevent write-write conflicts, these writes to result must be synchronized. Ruby provides a variety of ways to do this; here we use the synchronize block (shown above), which uses our Mutex object named lock to ensure that only one thread at a time will perform the statement(s) within its block. By wrapping each thread's write to the shared variable result in an synchronize block, we synchronize the accesses to it, and thus prevent write-write conflicts from occurring.

Note that we give the first thread ID 1, because the (original) thread that is launching these threads will receive ID 0.

With that done, our threads are now off and running, each summing its own slice of the array.

Our original thread can then sum up the values in its slice, using ID 0, and storing its partial sum in its own local variable myTotal:

   ...
   myTotal = sumSlice(values, 0, sliceSize)
   ...

With its slice summed, it then adds any left-overs to the partial sum in its myTotal variable:

      ...
      leftovers = values.size % numThreads
      if leftovers > 0
         for i in values.size-leftovers..values.size-1
            myTotal += values[i]
         end
      end
      ...

Since the number of left-over values will be less than the number of tasks, this will not take very long, provided the array is much larger than the number of threads we are using.

With the sum of thread 0's slice plus the left-overs accumulated in its myTotal variable, our original thread just has to add this value to the shared result variable. Since another thread could also be adding its partial sum to result, this write must be synchronized:

   lock.synchronize {
      result += myTotal
   }

However, in Ruby, if our original thread should happen to finish before the helper threads, all of those threads will terminate, without having finished their tasks. To prevent this, the original thread must make certain that the helper threads have finished summing their array-slices before it returns the value in result. We accomplish this using the join() method:

   ...
      for i in 1..numThreads-1
         threadArray[i].join
      end

      result
   end

A message t.join() causes the thread that sends that message (i.e., our original thread) to suspend until thread t terminates. (If the thread has already terminated, our original thread keeps going to the next repetition of the loop.) Once thread t terminates, our original thread resumes execution where it left off. The join() method is thus a simple way to synchronize two threads.

By using a for loop to iterate through our thread array, sending each thread therein the join() message, we ensure that all threads have terminated before we return result.

Note that for T threads and an array of length N, the cost of this parallel sum operation is O(N/T + T), where O(N/T) is the cost of computing the partial sums, and O(T) is the cost of adding up those partial sums. By contrast, the cost of our parallel sum in C++ using OpenMP's reduction mechanism was O(N/T + log₂(T)), since the reduction can be done in time log₂(T). So long as T is small and N is large, the size of the array will be the dominant factor and the two approaches should perform similarly. However, if we are on a computer with many cores and increase T significantly, the logarithmic reduction approach will scale better than linear approach we used here.

All that is left is a minor modification to the body of our main function:

   def main
      checkCommandLine

      values = readFile

      startTime = Time.now
      total = sumInParallel(values, Integer( ARGV[1] ) )
      endTime = Time.now

      interval = (endTime - startTime).to_f
      puts "\nThe sum is #{total}\n"
      printf(" and computing it took %.9f seconds.\n\n", interval)
   end

Note that we must convert the command-line argument (a string) to an integer before we can pass it to our sumInParallel() function's integer parameter. We use Ruby's Integer() class constructor to perform this conversion.

Testing. Save your changes and use ruby to run threaded_array_sum on the input files from the course directory. Begin with the largest input file and 4 threads:

   threaded_array_sum /home/cs/214/adams/labs/12/numbers/1000000numbers.txt 4

Do you get the correct result? If not, debug your program and continue when you do.

Performance Analysis. To see what kind of speedup we are getting, add 5 more columns to your spreadsheet, and label them 1 2 3 4 5. For each of the input files, run it three times using 1 thread, and record the median (middle) time in the appropriate row of your spreadsheet under column 1. How does the time for threaded_array_sum.rb using 1 thread compare to the time for array_sum.rb?

Repeat this procedure using 2, 3, 4, and 5 threads and record the middle times in the appropriate spreadsheet cells. In your spreadsheet, answer the following questions:

Ruby.1. How does the time using threaded_array_sum.rb and 1 thread compare to the time using array_sum.rb?

Ruby.2. How do the times using T > 1 threads compare to the times using a single thread? Explain what is happening.

Ruby.3. How do your Ruby times compare to your Ada times?

Ruby.4. How is multithreading in Ruby different from our other languages? (Hint: See this discusion from ruby-forum.com.)

In your spreadsheet, create a three dimensional bar chart that shows the performance of threaded_array_sum.rb using 1, 2, 3, 4, and 5 threads, for the array sizes 1000, 10000, 100000, and 1000000. Be sure to give your chart a descriptive title, and label all axes appropriately.

Turn In: Create a script file in which you use cat to display threaded_array_sum.rb. Then run threaded_array_sum.rb, using the file 1000000numbers.txt and 1 thread; then run threaded_array_sum.rb, using the same input file and 4 threads. Quit script and print a hard copy of this script file, plus your spreadsheet and its chart.

That concludes the Ruby portion of this lab.

Calvin > CS > 214 > Labs > 12 > Ruby

This page maintained by Joel Adams.