CS 214 Lab 12: Ada Tasks


This exercise involves multiple files, so begin by creating a new subdirectory named ada within your directory for this lab. Then copy the files array_sum.adb and Makefile from the course /home/cs/214/labs/12/ada directory into your new directory. Use your favorite text editor to open these files, customize the opening documentation, and take a moment to study what each is doing.

Let's begin by compiling and running this Ada program. Using either make or gnatmake, compile array_sum.adb and verify that it builds correctly.

In your spreadsheet, label the next empty column Ada. Then run the Ada array_sum program, using each of the input files in

   /home/cs/214/labs/12/numbers/
Verify that array_sum produces the correct sum for each file. If it does, run the program three times for each file, and record the middle of the three execution times in your spreadsheet in the Ada column. These timing values, computed with a single execution thread, will provide us with a baseline against which we can compare the speedup provided by multithreading.

Preliminaries. Before we delve into multithreading in Ada, let's first look at some of the details of how array_sum.adb is doing what it does.

Command-Line Arguments in Ada. The package Ada.Command_Line provides a mechanism for processing command-line arguments in Ada. In particular, it provides:

  1. the function Argument_Count, which returns the number of command-line arguments;
  2. the function Argument(I), which returns command-line argument number I as a String; and
  3. the procedure Set_Exit_Status, which sets the status value the program returns to its environment.

Find where each of these is used in array_sum.adb and make certain you understand how it is being used before proceeding.
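As a minimal, hypothetical sketch (not one of the lab's files; the name Echo_Args is invented), these three features fit together like this:

```ada
with Ada.Text_IO;      use Ada.Text_IO;
with Ada.Command_Line;

procedure Echo_Args is
begin
   --  Argument_Count tells us how many arguments were given
   if Ada.Command_Line.Argument_Count = 0 then
      Put_Line ("Usage: echo_args <arg1> [arg2 ...]");
      Ada.Command_Line.Set_Exit_Status (1);   -- signal failure to the shell
      return;
   end if;

   --  Argument(I) returns command-line argument number I as a String
   for I in 1 .. Ada.Command_Line.Argument_Count loop
      Put_Line (Ada.Command_Line.Argument (I));
   end loop;
end Echo_Args;
```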

Reading the Numbers. Since Ada does not have classes, the Read_File() procedure reads the first number from the file (the count of the numbers that follow), uses that count to allocate a dynamic array, and then fills the array with the values from the input file. Note that the Ada standard permits (but does not require) an implementation to reclaim dynamically allocated memory automatically; since this array is needed for the rest of the program's run, we need not worry about deallocating it here.

To store the values, we first declare a type Int_Array with unspecified indices, to represent arbitrary-sized arrays. Then we declare an Int_Array_Pointer type that can store the address of a dynamically allocated array. Our Read_File() procedure uses these types to allocate the correctly-sized array at run-time using the information from the first line of the input file.

Note carefully how Read_File() allocates the array and passes it back to its caller.
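As a standalone sketch of the same idiom (using the type names from array_sum.adb, but with an invented procedure name and a hard-coded size in place of the value read from the file):

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure Alloc_Demo is
   --  an unconstrained array type: its index range is chosen at run-time
   type Int_Array is array (Integer range <>) of Long_Integer;
   --  an access (pointer) type for dynamically allocated Int_Arrays
   type Int_Array_Pointer is access Int_Array;

   N      : constant Integer := 8;        -- pretend this came from the file
   Values : Int_Array_Pointer;
begin
   Values := new Int_Array (0 .. N - 1);  -- allocate the array at run-time
   for I in Values'Range loop
      Values (I) := Long_Integer (I);     -- fill it with sample data
   end loop;
   Put_Line ("Length =" & Integer'Image (Values'Length));
end Alloc_Demo;
```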

Timing the Operation

To determine the execution time of function Sum(), we use the Ada.Real_Time package, which (among other things) provides:

  1. the type Time, whose values represent points in time;
  2. the function Clock, which returns the current Time;
  3. a "-" operator that subtracts one Time value from another, producing a Time_Span; and
  4. the function To_Duration, which converts a Time_Span into a Duration value that can be displayed.

Find each of these features in the program and make sure that you understand how they are being used before continuing.
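A minimal, hypothetical sketch (not part of the lab) of this timing idiom, assuming we time a simple loop rather than Sum():

```ada
with Ada.Text_IO;   use Ada.Text_IO;
with Ada.Real_Time; use Ada.Real_Time;

procedure Timing_Demo is
   Start_Time, Stop_Time : Time;        -- Ada.Real_Time.Time values
   Time_Required         : Duration;
   Total                 : Long_Integer := 0;
begin
   Start_Time := Clock;                 -- read the real-time clock
   for I in 1 .. 1_000_000 loop         -- some work to time
      Total := Total + Long_Integer (I);
   end loop;
   Stop_Time := Clock;
   --  "-" on two Times yields a Time_Span; To_Duration makes it printable
   Time_Required := To_Duration (Stop_Time - Start_Time);
   Put_Line ("Time: " & Duration'Image (Time_Required));
end Timing_Demo;
```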

Multitasking in Ada. Ada's parallel computing mechanism predates multithreading; the term thread was not in common use when Ada was created. As a result, Ada uses the term task instead of thread. Tasks are a built-in part of the original Ada language, so no special libraries, packages, or compiler switches are needed to use them.
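To see just how little is needed, here is a minimal, hypothetical example (not part of the lab) that declares a task type and two task objects:

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure Hello_Tasks is
   task type Greeter;            -- a task type with no entries

   task body Greeter is
   begin
      Put_Line ("Hello from a task!");
   end Greeter;

   T1, T2 : Greeter;             -- both tasks start running when
                                 -- execution reaches the begin below
begin
   Put_Line ("Hello from the main thread!");
   --  the program ends only after T1 and T2 have both finished
end Hello_Tasks;
```

Because the main thread and the two tasks run concurrently, the order of the three lines of output may vary from run to run.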

Begin by making a copy of array_sum.adb and name the copy tasked_array_sum.adb. Then open tasked_array_sum.adb and customize its opening documentation, the name of the program, etc.

Each task must sum the values in its own slice, so let's begin by modifying the Sum() function so that it just adds up the values in a slice. For the sake of clarity and readability, let's rename it Sum_Slice() and revise it as follows:

   function Sum_Slice(V: in Int_Array_Pointer;
                      Start, Stop: in Integer)
                      return Long_Integer is
      Partial_Sum: Long_Integer := 0;
   begin
      for i in Start..Stop loop
         Partial_Sum := Partial_Sum + V(i);
      end loop;
      return Partial_Sum;
   end Sum_Slice;

With these changes, our function now sums the slice of our array delimited by the parameters Start and Stop.

Now that we have a function our tasks can use to sum their slices, our next problem is to define those tasks.

An Ada task is an object with its own thread of execution. As an object, it needs a type, so our first challenge is to declare such a task type. Add the following lines to tasked_array_sum.adb:

   ...
   Time_Required: Duration;

   task type Array_Adder_Task is 
      entry sum(ID, Slice_Size: in Integer);
      entry report(Result: out Long_Integer);
   end;

   procedure Check_Command_Line is
   ...
This declaration creates a new type named Array_Adder_Task that we will subsequently use to declare task objects. The line:
      entry sum(ID, Slice_Size: in Integer);
declares sum() as an entry procedure -- a procedure that another task/thread can invoke to enter into communication with this task. As we shall see, entry procedures have some special properties that let us avoid the read-write and write-write conflicts that produce race conditions. Note that procedure sum() has two in parameters, corresponding to the two pieces of information it needs in order to compute the indices of the array-slice it is to sum.

Likewise, the line:

      entry report(Result: out Long_Integer);
declares an entry procedure that another thread/task can invoke to retrieve the partial sum computed by this task, via its out parameter named Result. Our task's entry procedures thus correspond to the two broad behaviors we described in the introduction:
  1. Sum up a particular slice of the array; and
  2. Communicate its partial sum back to the original thread.

The code above declares the task; we must still define its behavior. To do so, insert the task body shown below into your program:

   ...
   task type Array_Adder_Task is 
      entry sum(ID, Slice_Size: in Integer);
      entry report(Result: out Long_Integer);
   end;

   task body Array_Adder_Task is
      My_Total : Long_Integer := 0;
      My_Start_Index, My_Stop_Index : Integer;
   begin
         accept sum(ID, Slice_Size: in Integer) do
            My_Start_Index := ID * Slice_Size;
            My_Stop_Index := My_Start_Index + Slice_Size - 1;
         end; 

         My_Total := Sum_Slice(Values, 
                                My_Start_Index,
                                My_Stop_Index);

         accept report(Result: out Long_Integer) do
            Result := My_Total;
         end; 
   end Array_Adder_Task;

   procedure Check_Command_Line is
   ...
When execution reaches the declaration of an Ada task object, that task begins running, and the task body (the statements following the begin keyword) specifies its behavior.

The first thing our task does is accept a call to its sum() entry procedure:

         accept sum(ID, Slice_Size: in Integer) do
            My_Start_Index := ID * Slice_Size;
            My_Stop_Index := My_Start_Index + Slice_Size - 1;
         end; 
This is a synchronizing statement: if another thread invokes sum() before our task reaches this accept statement, that thread is suspended until our task accepts the call; and if our task reaches the accept before any thread has invoked sum(), our task is suspended until some thread invokes it. When it calls sum(), the other thread t passes (via the entry procedure's parameters) the information this task needs to compute its starting and stopping index values:
            My_Start_Index := ID * Slice_Size;
            My_Stop_Index := My_Start_Index + Slice_Size - 1;

Once our task has computed its starting and stopping indices, it leaves the entry procedure sum(), freeing the thread t that invoked sum() to continue on its way.

Our task then uses the Sum_Slice function we defined previously, passing its starting and stopping indices to sum its slice of the array's values:

         My_Total := Sum_Slice(Values, 
                                My_Start_Index,
                                My_Stop_Index);
Since My_Total is a local variable within each task, we avoid the situation where multiple tasks are trying to write to the same memory location, preventing any write-write conflicts from occurring.

Once it has stored its partial sum in the variable My_Total, it accepts a call to the report() entry procedure:

         accept report(Result: out Long_Integer) do
            Result := My_Total;
         end; 
This assigns the partial sum in My_Total to the procedure's Result parameter. Since Result is an out parameter, this effectively passes that partial sum back to the calling thread t, via whatever variable t passed as an argument.

As before, this accept statement is synchronizing: a thread that invokes report() before this task reaches the accept will be suspended until the task accepts the call; and if this task reaches the accept before any thread has invoked report(), the task is suspended until some thread does. Since our task only accepts calls to report() after it has finished computing its partial sum, this prevents read-write conflicts from occurring.

Once a task has computed its partial sum and reported its result, its job is done, so that completes our definition of its behavior.

Using Tasks. Now that we have a task type defined, our next step is to revise the sequential program to make use of this type.

Let's begin by modifying the body of procedure Check_Command_Line, as shown below:

   procedure Check_Command_Line is
   begin
      if Ada.Command_Line.Argument_Count /= 2 then
         New_line;
         Put("Usage: tasked_array_sum <file> <numTasks>");
         New_line; New_line;
         Ada.Command_Line.Set_Exit_Status(1);
      end if; 
   end Check_Command_Line;
Procedures Read_File() and Display() remain unchanged.

Our main remaining challenge is to define the function that will launch the tasks, sum the first slice of the array, sum any leftovers, and get the partial results from the tasks. Let's call this function Sum_In_Parallel(), as shown below:

  function Sum_In_Parallel(Values: in Int_Array_Pointer;
                            Num_Tasks: in Integer) return Long_Integer is
      Partial_Result, Result: Long_Integer := 0;
      Leftovers, Slice_Size, Start, Stop : Integer;

      type Task_Array is array(Integer range <>) of Array_Adder_Task;
      type Task_Array_Ptr is access Task_Array;
      Adder_Tasks : Task_Array_Ptr;

   begin
      Slice_Size := Values'Length / Num_Tasks;
      Adder_Tasks := new Task_Array(1..Num_Tasks-1);

      for task_ID in 1..Num_Tasks-1 loop
         Adder_Tasks(task_ID).sum(task_ID, Slice_Size);
      end loop;

      Result := Sum_Slice(Values, 0, Slice_Size-1);

      Leftovers := Values'Length rem Num_Tasks;

      if ( Leftovers > 0 ) then
         Start := Values'Length - Leftovers;
         Stop := Values'Length - 1;
         for i in Start..Stop loop
            Result := Result + Values(i);
         end loop;
      end if;

      for task_ID in 1..Num_Tasks-1 loop
         Adder_Tasks(task_ID).report(Partial_Result);
         Result := Result + Partial_Result;
      end loop;

      return Result;
   end Sum_In_Parallel;
The first few lines declare the variables this function will need. Then come the three declarations we need in order to launch our tasks. Since we cannot know until run-time how many tasks the user wants to use, we create a dynamic array of task objects. If we name our handle for this dynamic array Adder_Tasks, we might visualize what we want to happen as follows:

[Diagram: an array of Ada tasks, accessed via the handle Adder_Tasks]

To accomplish this, we first create the type for this dynamic array:

     type Task_Array is array(Integer range <>) of Array_Adder_Task;
Then we create a pointer-type we can use to declare a handle for the dynamic array.
      type Task_Array_Ptr is access Task_Array;
Finally, we use this type to declare the pointer variable we will use as the handle for this dynamic array.
      Adder_Tasks : Task_Array_Ptr;

In the body of our function, the first thing we do is use the Num_Tasks parameter value to compute the size of the array-slice each task is to sum:

   ...
   begin
      Slice_Size := Values'Length / Num_Tasks;
      ...
Note that this uses integer division, which yields an integer result. Next, we create our array of task objects:
      ...
      Adder_Tasks := new Task_Array(1..Num_Tasks-1);
      ...
As soon as this array is created, the tasks it contains begin running.

However, the first thing in each task's body is the accept sum() statement. Since no thread has invoked sum() yet, each task waits at that statement until some thread does so, as described previously.

Our next step is to launch our tasks, passing them the information they need so that they can get going:

     ...
     for task_ID in 1..Num_Tasks-1 loop
         Adder_Tasks(task_ID).sum(task_ID, Slice_Size);
     end loop;
     ...
This iterates through our task array, passing each task in the array its ID and the size of the slice it is to compute. Note that we give the first task the ID number 1, because the thread performing these statements (the original thread) will have ID number 0.

With that done, our helper tasks are now off and running, each summing its own slice of the array. Our original thread can then sum up the values in its slice, storing its partial sum in its Result variable:

      ...
      Result := Sum_Slice(Values, 0, Slice_Size-1);
      ...
Then it sums up any left-overs and adds them to Result:
      ...
      Leftovers := Values'Length rem Num_Tasks;

      if ( Leftovers > 0 ) then
         Start := Values'Length - Leftovers;
         Stop := Values'Length - 1;
         for i in Start..Stop loop
            Result := Result + Values(i);
         end loop;
      end if;
      ...
Our original thread thus uses a for loop to add the leftover values to its Result value.

Note finally that Ada uses the rem operator for the remainder operation; it provides a separate mod operator for modulus. (The two operators produce different results when an operand is negative.)
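A small, hypothetical sketch illustrating the difference (for the non-negative operands in this lab, the two operators agree):

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure Rem_Mod_Demo is
begin
   --  rem takes the sign of the dividend; mod takes the sign of the divisor
   Put_Line (Integer'Image (7 rem 3));      -- prints  1
   Put_Line (Integer'Image ((-7) rem 3));   -- prints -1
   Put_Line (Integer'Image ((-7) mod 3));   -- prints  2
   Put_Line (Integer'Image (7 mod (-3)));   -- prints -2
end Rem_Mod_Demo;
```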

With the sum of its slice plus the left-overs in Result, our original thread just has to retrieve and add in the partial results from our tasks, as shown in steps 7 and 8 of our diagram (above). Since our tasks are in an array, we just need to iterate through that array and ask each task to report its partial sum:

      ...
      for task_ID in 1..Num_Tasks-1 loop
         Adder_Tasks(task_ID).report(Partial_Result);
         Result := Result + Partial_Result;
      end loop;
      return Result;
   end Sum_In_Parallel;
We retrieve each task's partial sum via the argument Partial_Result, and then add that to our Result variable. When our function terminates, Result contains the sum of the entire array!

Note that for T tasks and an array of length N, the cost of this parallel sum is O(N/T + T), where O(N/T) is the cost of computing the partial sums, and O(T) is the cost of adding up those partial sums. So long as N is large and T is small, the size of the array will be the dominant factor.

Note also that because:

  1. our Ada tasks accumulate their partial sums in local variables,
  2. our original thread then
    1. retrieves each task's partial sum using its (synchronized) report() entry procedure, and
    2. adds that partial sum to its Result variable,
this approach avoids any write-write conflicts, since we never have multiple threads trying to write to the same memory location.

It also avoids any read-write conflicts because each task only accepts a report() call from the original thread after it has finished computing/writing its partial sum.

All that is left are a few minor modifications to our main procedure:

   begin
      Check_Command_Line;
      Read_File( Ada.Command_Line.Argument(1), Values );
      Start_Time := Clock;
      Total := Sum_In_Parallel(Values,
                                Integer'Value( Ada.Command_Line.Argument(2) ) );
      Stop_Time := Clock;
      Time_Required := Ada.Real_Time.To_Duration(Stop_Time - Start_Time);
      Display(Total, Time_Required);
   end tasked_array_sum;
Note that we must convert the command-line argument (a string) to an integer before we can pass it to our function's integer parameter. We use Ada's Integer'Value() function to perform this conversion.
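A tiny, hypothetical sketch of this conversion (not part of the lab); note that Integer'Value raises Constraint_Error if its argument is not a numeric string:

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure Value_Demo is
   N : Integer;
begin
   N := Integer'Value ("42");           -- string-to-integer conversion
   Put_Line (Integer'Image (N));
   N := Integer'Value ("forty-two");    -- not a numeric string...
exception
   when Constraint_Error =>
      Put_Line ("Integer'Value raises Constraint_Error on bad input");
end Value_Demo;
```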

Save your changes and use gnatmake to build tasked_array_sum. (Or if you prefer to use make, edit the Makefile and "uncomment" tasked_array_sum on the first line; then you can use make to build tasked_array_sum.) You may have to relocate the definition of Sum_Slice() prior to its earliest call, if that call is outside of its scope. Continue when tasked_array_sum builds correctly.

Testing. Run tasked_array_sum on the input files from the course directory. Begin with the largest input file and 4 threads:

   tasked_array_sum /home/cs/214/labs/12/numbers/1000000numbers.txt 4
Do you get the correct result? If not, debug your program and continue when you do.

Performance Analysis. To see what kind of speedup we are getting, add 5 more columns to your spreadsheet, and label them 1, 2, 3, 4, and 5. For each of the input files, run tasked_array_sum three times using 1 thread, and record the middle time in the appropriate row of your spreadsheet under column 1.

Repeat this procedure using 2, 3, 4, and 5 threads and record the middle times in the appropriate spreadsheet cells. In your spreadsheet, answer the following questions:

Ada.1. How does the time using tasked_array_sum and 1 thread compare to the time using array_sum?

Ada.2. How do the times using T > 1 threads compare to the times using a single thread? Explain what is happening.

Ada.3. How big does the array need to be before parallelism with 2 threads is beneficial? With 4 threads?

Ada.4. How do your Ada times compare to your Java times? Why?

In your spreadsheet, create a three dimensional bar chart that shows the performance of tasked_array_sum using 1, 2, 3, 4, and 5 threads, for the array sizes 1,000, 10,000, 100,000, and 1,000,000. Be sure to give your chart a descriptive title, and label all axes appropriately.

Turn In: Create a script file in which you use cat to display tasked_array_sum.adb, and use gnatmake (or make) to show that it built correctly. Run tasked_array_sum, using the file 1000000numbers.txt and 1 thread; then run tasked_array_sum, using the same input file and 4 threads.

That concludes the Ada portion of this lab.




This page maintained by Joel Adams.