Dana Vrajitoru
B424 Parallel and Distributed Programming
Program Decomposition
General Ideas
- Identifying the portions of code that can be executed in parallel.
- Mapping the code onto multiple processes.
- Distributing the input, output, and intermediate data.
- Managing access to shared resources.
- Synchronizing the processes at various stages of the program.
Code Decomposition
- Decomposition: the operation of dividing the computation
into smaller parts, some of which may be executed in parallel.
- Task: programmer-defined units of code resulting from
decomposition.
- Granularity: the number / size of the tasks.
- Fine-grained decomposition: a large number of small tasks.
- Coarse-grained decomposition: a small number of large tasks.
- Degree of concurrency: the maximum number of tasks that can
be executed at the same time.
Decomposition Techniques
- Recursive decomposition: used for traditional
divide-and-conquer algorithms that are not easy to solve
iteratively.
- Data decomposition: the data is partitioned and this
induces a partitioning of the code in tasks.
- Functional decomposition: the functions to be performed
on the data are split into multiple tasks.
- Exploratory decomposition: decompose problems equivalent to
a search of a space for solutions.
- Speculative decomposition: when a program may take one of
many possible branches depending on results from computations
preceding the choice.
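As a minimal shared-memory sketch of data decomposition (the function name chunked_sum and the use of C++ threads are illustrative assumptions, not from the slides): partitioning the input array into one chunk per task induces the tasks themselves, each computing a partial sum that is combined at the end.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <thread>
#include <vector>

// Data decomposition: split the input into one chunk per task; each
// task sums its own chunk, and the partial sums are combined at the end.
long chunked_sum(const std::vector<int>& data, int num_tasks) {
    std::vector<long> partial(num_tasks, 0);
    std::vector<std::thread> workers;
    size_t chunk = (data.size() + num_tasks - 1) / num_tasks;
    for (int t = 0; t < num_tasks; t++) {
        workers.emplace_back([&, t] {
            size_t first = t * chunk;
            size_t last = std::min(data.size(), first + chunk);
            for (size_t i = first; i < last; i++)  // task t owns this chunk
                partial[t] += data[i];
        });
    }
    for (auto& w : workers) w.join();
    return std::accumulate(partial.begin(), partial.end(), 0L);
}
```

Here the task generation is static (the chunks are known in advance) and the task size is uniform, matching the characteristics listed below.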
Characteristics of Tasks
Task generation:
- static - the tasks are known in advance (data decomposition)
- dynamic - decided at runtime (recursive decomposition)
Task size:
- uniform (they require approximately the same amount of time)
or
- non-uniform
- known/not known.
Task Interaction
- Static: it happens at predetermined times and the set of
tasks to interact with is known in advance.
- Dynamic: the timing of the interaction or the set
of tasks to interact with are unpredictable. Harder to implement.
- Regular/irregular: it is regular if the
interaction follows a pattern that can be exploited for efficiency.
Recursive Decomposition
Examples: QuickSort or MergeSort.
- In both cases the operation of sorting an array is divided into
two subproblems that can be solved recursively.
- Both algorithms are naturally expressed recursively rather than iteratively.
- For the Quicksort the task generation is dynamic and the task size
is non-uniform.
- For the MergeSort the task generation is static and the task size
is uniform.
The MergeSort
- Divides the array in 2, sorts the 2 parts recursively, then merges
the arrays.
- The computations are organized in a binary tree.
- Each process receives an array to sort from the parent (except for
the master).
- The process divides the array in 2 and sends the halves to the
children.
- After the children are done computing, they send the sorted arrays
back to the parent.
- The parent performs the merge and sends the array back up the
tree.
Sequential MergeSort
void merge_arrays(int a[], int afirst, int alast, int b[], int bfirst,
                  int blast, int c[], int cfirst, int clast);

void merge_sort(int a[], int first, int last, int aux[])
{
  if (last <= first)
    return;
  int mid = (first + last) / 2;
  merge_sort(a, first, mid, aux);
  merge_sort(a, mid+1, last, aux);
  merge_arrays(a, first, mid, a, mid+1, last, aux, first, last);
  for (int i = first; i <= last; i++)
    a[i] = aux[i];
}
void merge_arrays(int a[], int afirst, int alast, int b[], int bfirst,
                  int blast, int c[], int cfirst, int clast)
{ // skip verification of the size of c
  int i = afirst, j = bfirst, k = cfirst;
  while (i <= alast && j <= blast) {
    if (a[i] < b[j])
      c[k++] = a[i++];
    else
      c[k++] = b[j++];
  }
  while (i <= alast)
    c[k++] = a[i++];
  while (j <= blast)
    c[k++] = b[j++];
}
Parallel MergeSort
void parallel_merge_sort()
{
  if (proc_id > 0) {        // every process except the master
    Recv(size, parent);     // receives its array from its parent
    Recv(a, size, parent);
  }
  mid = size/2;
  if (both children) {
    Send(mid, child1);                 // child1 sorts a[0..mid-1]
    Send(size-mid, child2);            // child2 sorts a[mid..size-1]
    Send(a, mid, child1);
    Send(a+mid, size-mid, child2);
    Recv(a, mid, child1);              // receive the sorted halves back
    Recv(a+mid, size-mid, child2);
    merge_arrays(a, 0, mid-1, a, mid, size-1, aux, 0, size-1);
    // aux is a local array of size elements
    for (int i = 0; i < size; i++)
      a[i] = aux[i];
  }
  else                      // leaf process: sort sequentially
    merge_sort(a, 0, size-1, aux);
  if (proc_id > 0)
    Send(a, size, parent);  // return the sorted array to the parent
}
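The Send/Recv pseudocode above targets a message-passing model. Under a shared-memory model (an assumption here, with threads standing in for the child processes), the same tree scheme can be sketched as:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Tree-shaped parallel merge sort over the half-open range [first, last):
// each call hands its left half to a child (here a thread, mirroring the
// Send to child1) until the depth budget runs out; the parent then merges
// the two sorted halves, as in the message-passing version.
void tree_merge_sort(std::vector<int>& a, size_t first, size_t last, int depth) {
    if (depth == 0 || last - first < 2) {
        std::sort(a.begin() + first, a.begin() + last);  // leaf: sort sequentially
        return;
    }
    size_t mid = first + (last - first) / 2;
    // "Send" the left half to a child process (here a thread).
    std::thread child([&] { tree_merge_sort(a, first, mid, depth - 1); });
    tree_merge_sort(a, mid, last, depth - 1);  // parent sorts the right half
    child.join();  // "Recv": wait for the child's sorted half.
    // The parent merges the two sorted halves in place.
    std::inplace_merge(a.begin() + first, a.begin() + mid, a.begin() + last);
}
```

With depth d the computation forms a binary tree of 2^d leaf tasks, each leaf being one sequential sort, as in the slide's process tree.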
Data Decomposition
- The input, output, or intermediate data can be decomposed.
- Input data decomposition: applicable if each output can be computed
directly as a function of the input. Some combination of the individual
results may be necessary.
- Output data decomposition: when it applies, it can result in less
communication.
- Intermediate data decomposition is rarer.
- Owner-computes rule: the process that owns a part of the data
performs all the computations related to it.
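A minimal sketch of output data decomposition with the owner-computes rule (the function mat_vec and the use of C++ threads are illustrative assumptions): in a matrix-vector product y = M * x, each task owns a band of rows of the output y and performs every computation needed to produce them.

```cpp
#include <algorithm>
#include <cassert>
#include <thread>
#include <vector>

// Output data decomposition: each task owns a band of rows of the
// result y and, by the owner-computes rule, performs all computations
// that produce those rows of y = M * x.
std::vector<int> mat_vec(const std::vector<std::vector<int>>& M,
                         const std::vector<int>& x, int num_tasks) {
    std::vector<int> y(M.size(), 0);
    std::vector<std::thread> tasks;
    size_t band = (M.size() + num_tasks - 1) / num_tasks;
    for (int t = 0; t < num_tasks; t++) {
        tasks.emplace_back([&, t] {
            size_t first = t * band;
            size_t last = std::min(M.size(), first + band);
            for (size_t i = first; i < last; i++)       // rows owned by task t
                for (size_t j = 0; j < x.size(); j++)
                    y[i] += M[i][j] * x[j];
        });
    }
    for (auto& th : tasks) th.join();
    return y;
}
```

Because each task writes only the output rows it owns, no combination step (and no synchronization on y) is needed, which is why output decomposition can reduce communication.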