Dana Vrajitoru
B424 Parallel and Distributed Programming
Program Decomposition
General Ideas
- Identifying the portions of code that can be executed in parallel.
- Mapping the code onto multiple processes.
- Distributing the input, output, and intermediate data.
- Managing access to shared resources.
- Synchronizing the processes at various stages of the program.
Code Decomposition
- Decomposition: the operation of dividing the computation
into smaller parts, some of which may be executed in parallel.
- Task: programmer-defined units of code resulting from
decomposition.
- Granularity: the number / size of the tasks.
- Fine-grained decomposition: a large number of small tasks.
- Coarse-grained decomposition: a small number of large tasks.
- Degree of concurrency: the maximum number of tasks that can
be executed at the same time.
Decomposition Techniques
- Recursive decomposition: used for traditional
divide-and-conquer algorithms that are not easy to solve
iteratively.
- Data decomposition: the data is partitioned and this
induces a partitioning of the code in tasks.
- Functional decomposition: the functions to be performed
on the data are split into multiple tasks.
- Exploratory decomposition: decompose problems equivalent to
a search of a space for solutions.
- Speculative decomposition: when a program may take one of
many possible branches depending on results from computations
preceding the choice.
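As a minimal shared-memory sketch of data decomposition (the function name chunked_sum and the use of C++ threads are illustrative assumptions, not from the slides): partitioning the input array into one chunk per task induces the tasks themselves, each computing a partial sum that is combined at the end.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <thread>
#include <vector>

// Data decomposition: split the input into one chunk per task; each
// task sums its own chunk, and the partial sums are combined at the end.
long chunked_sum(const std::vector<int>& data, int num_tasks) {
    std::vector<long> partial(num_tasks, 0);
    std::vector<std::thread> workers;
    size_t chunk = (data.size() + num_tasks - 1) / num_tasks;
    for (int t = 0; t < num_tasks; t++) {
        workers.emplace_back([&, t] {
            size_t first = t * chunk;
            size_t last = std::min(data.size(), first + chunk);
            for (size_t i = first; i < last; i++)  // task t owns this chunk
                partial[t] += data[i];
        });
    }
    for (auto& w : workers) w.join();
    return std::accumulate(partial.begin(), partial.end(), 0L);
}
```

Here the task generation is static (the chunks are known in advance) and the task size is uniform, matching the characteristics listed below.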
Characteristics of Tasks
Task generation:
- static - the tasks are known in advance (data decomposition)
- dynamic - decided at runtime (recursive decomposition)
Task size:
- uniform (they require approximately the same amount of time)
or
- non-uniform
- known/not known.
Task Interaction
- Static: it happens at predetermined times and the set of
tasks to interact with is known in advance.
- Dynamic: the timing of the interaction or the set
of tasks to interact with are unpredictable. Harder to implement.
- Regular/irregular: it is regular if the
interaction follows a pattern that can be exploited for efficiency.
Recursive Decomposition
Examples: QuickSort or MergeSort.
- In both cases the operation of sorting an array is divided into
two subproblems that can be solved recursively.
- Both algorithms are naturally expressed recursively rather than iteratively.
- For the Quicksort the task generation is dynamic and the task size
is non-uniform.
- For the MergeSort the task generation is static and the task size
is uniform.
The MergeSort
- Divides the array in 2, sorts the 2 parts recursively, then merges
the arrays.
- The computations are organized in a binary tree.
- Each process receives an array to sort from the parent (except for
the master).
- The process divides the array in 2 and sends the halves to the
children.
- After the children are done computing, they send the sorted arrays
back to the parent.
- The parent performs the merge and sends the array back up the
tree.
Sequential MergeSort
void merge_arrays(int a[], int afirst, int alast, int b[], int bfirst,
                  int blast, int c[], int cfirst, int clast);

void merge_sort(int a[], int first, int last, int aux[])
{
  if (last <= first)
    return;
  int mid = (first + last) / 2;
  merge_sort(a, first, mid, aux);
  merge_sort(a, mid+1, last, aux);
  merge_arrays(a, first, mid, a, mid+1, last, aux, first, last);
  for (int i = first; i <= last; i++)
    a[i] = aux[i];
}
void merge_arrays(int a[], int afirst, int alast, int b[], int bfirst,
                  int blast, int c[], int cfirst, int clast)
{ // skip verification of the size of c
  int i = afirst, j = bfirst, k = cfirst;
  while (i <= alast && j <= blast) {
    if (a[i] < b[j])
      c[k++] = a[i++];
    else
      c[k++] = b[j++];
  }
  while (i <= alast)
    c[k++] = a[i++];
  while (j <= blast)
    c[k++] = b[j++];
}
Parallel MergeSort
void parallel_merge_sort()
{
  if (proc_id > 0) {        // every process except the master
    Recv(size, parent);     // receives its array from its parent
    Recv(a, size, parent);
  }
  mid = size/2;
  if (both children) {
    Send(mid, child1);                 // child1 sorts a[0..mid-1]
    Send(size-mid, child2);            // child2 sorts a[mid..size-1]
    Send(a, mid, child1);
    Send(a+mid, size-mid, child2);
    Recv(a, mid, child1);              // receive the sorted halves back
    Recv(a+mid, size-mid, child2);
    merge_arrays(a, 0, mid-1, a, mid, size-1, aux, 0, size-1);
    // aux is a local array of size elements
    for (int i = 0; i < size; i++)
      a[i] = aux[i];
  }
  else                      // leaf process: sort sequentially
    merge_sort(a, 0, size-1, aux);
  if (proc_id > 0)
    Send(a, size, parent);  // return the sorted array to the parent
}
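The Send/Recv pseudocode above targets a message-passing model. Under a shared-memory model (an assumption here, with threads standing in for the child processes), the same tree scheme can be sketched as:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Tree-shaped parallel merge sort over the half-open range [first, last):
// each call hands its left half to a child (here a thread, mirroring the
// Send to child1) until the depth budget runs out; the parent then merges
// the two sorted halves, as in the message-passing version.
void tree_merge_sort(std::vector<int>& a, size_t first, size_t last, int depth) {
    if (depth == 0 || last - first < 2) {
        std::sort(a.begin() + first, a.begin() + last);  // leaf: sort sequentially
        return;
    }
    size_t mid = first + (last - first) / 2;
    // "Send" the left half to a child process (here a thread).
    std::thread child([&] { tree_merge_sort(a, first, mid, depth - 1); });
    tree_merge_sort(a, mid, last, depth - 1);  // parent sorts the right half
    child.join();  // "Recv": wait for the child's sorted half.
    // The parent merges the two sorted halves in place.
    std::inplace_merge(a.begin() + first, a.begin() + mid, a.begin() + last);
}
```

With depth d the computation forms a binary tree of 2^d leaf tasks, each leaf being one sequential sort, as in the slide's process tree.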
Data Decomposition
- The input, output, or intermediate data can be decomposed.
- Input data decomposition: applicable if each output can be computed
directly as a function of the input. Some combination of the individual
results may be necessary.
- Output data decomposition: when it applies, it can result in less
communication.
- Intermediate data decomposition is rarer.
- Owner-computes rule: the process that owns a part of the data
performs all the computations related to it.
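A minimal sketch of output data decomposition with the owner-computes rule (the function mat_vec and the use of C++ threads are illustrative assumptions): in a matrix-vector product y = M * x, each task owns a band of rows of the output y and performs every computation needed to produce them.

```cpp
#include <algorithm>
#include <cassert>
#include <thread>
#include <vector>

// Output data decomposition: each task owns a band of rows of the
// result y and, by the owner-computes rule, performs all computations
// that produce those rows of y = M * x.
std::vector<int> mat_vec(const std::vector<std::vector<int>>& M,
                         const std::vector<int>& x, int num_tasks) {
    std::vector<int> y(M.size(), 0);
    std::vector<std::thread> tasks;
    size_t band = (M.size() + num_tasks - 1) / num_tasks;
    for (int t = 0; t < num_tasks; t++) {
        tasks.emplace_back([&, t] {
            size_t first = t * band;
            size_t last = std::min(M.size(), first + band);
            for (size_t i = first; i < last; i++)       // rows owned by task t
                for (size_t j = 0; j < x.size(); j++)
                    y[i] += M[i][j] * x[j];
        });
    }
    for (auto& th : tasks) th.join();
    return y;
}
```

Because each task writes only the output rows it owns, no combination step (and no synchronization on y) is needed, which is why output decomposition can reduce communication.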