Dana Vrajitoru
B424 Parallel and Distributed Programming
Shared Memory Models
Programs and libraries designed for computers with several CPUs that have
access to the same memory - multiprocessor machines.
The RAM is shared, while each processor has its own cache.
Such programs are called multi-threaded (they contain multiple execution threads).
Two types of machines
- Uniform memory access (UMA) for a small number of CPUs (2 to about 30),
for which the interconnection network is a memory bus or a crossbar switch.
The access time to any memory location is about the same for each processor.
- Non-uniform memory access (NUMA) for a large number of CPUs (tens or
hundreds), for which the interconnection network is a tree-structured
collection of switches and memories. For each processor, some parts of the
memory are closer (faster access) and others are farther (slower access).
Specific problems
- Cache consistency problem - when a processor modifies the value of a
variable in its own cache, the caches of the other CPUs no longer hold the
correct value.
- Memory consistency problem - when is the primary memory actually updated?
- Sequential consistency - memory updates occur in a sequential order and
every processor sees them in the same order.
- False sharing - suppose that processor 1 uses variable x, processor 2
uses variable y, and the two variables fall on the same cache line.
Whenever one processor modifies its variable, the cache line of the other
processor is invalidated, even though the two share no data.
- These problems are often taken care of by the libraries.
Applications for multi-threaded programs
-
Interfaces where some widgets have to stay active even while the program
is performing some computation (media player).
-
Agent applications - artificial intelligence, games.
Background_play() {
  while (data) {
    Play_data(data);
    if (stop_signal) break;
    Get_data(data);
  }
}

class Stop_button {
  ...
  onClick: Generate(stop_signal);
  ...
};
Variables that are shared
- Global variables among threads (processes) generated in the same module
or declared as extern.
- Static variables in functions called by several threads.
- Reference parameters.
- A pointer containing the same address in several threads / processes
refers to the same physical memory.
Variables that are not shared
- Local auto variables of any function.
- Value parameters.
Threads Programming
A thread is an independent stream of instructions that can be
scheduled to run as such by the operating system.
In the Unix/Linux environments a thread:
- Exists within a process and uses the process resources.
- Has its own independent flow of control as long as its parent
process exists and the OS supports it.
- May share the process resources with other threads that act
equally independently (and dependently).
- Dies if the parent process dies (or is otherwise terminated).
- Creating a new thread within a process requires a lot less time
and resources than creating a new process.
Resources of a process (shared by all of its threads):
Process ID, process group ID, user ID, and group ID
Environment
Working directory
Program instructions
Registers
Stack
Heap
File descriptors
Signal actions
Shared libraries
Inter-process communication tools (such as message queues, pipes,
semaphores, or shared memory)

Resources private to each thread:
Stack pointer
Registers
Scheduling properties (such as policy or priority)
Set of pending and blocked signals
Thread-specific data
Types of Threads
- Kernel-level threads: the kernel is responsible for scheduling them.
They can take advantage of multiple CPUs.
- User-level threads: live without any support from the kernel; they
maintain all of their state in user space. Since the kernel does not know
about them, they cannot be scheduled to run on multiple processors in
parallel.
Protection Boundary
A protection boundary protects one software subsystem on a
computer from another, in such a way that only data that is explicitly
shared across such a boundary is accessible to the entities on both
sides.
All code within a protection boundary will have access to all data
within that boundary.
Examples:
- between processes and the kernel. The kernel is protected from
processes; they can only examine or change its internal state in
strictly defined ways.
- between individual processes. This prevents one buggy or malicious
process from wreaking havoc on others.
Transferring control across protection boundaries is expensive.
Critical Section Problem
Critical section (region): Part of the code and/or memory that must
not be accessed in read or write mode by more than one process at a time.
process Critical_section() {
  while (true) {
    entry protocol;
    critical section;
    exit protocol;
    noncritical section;
  }
}
The entry and exit protocols for the critical region must satisfy the
following requirements:
- Mutual exclusion - at most one process at a time is executing the CS.
- Absence of deadlock - if more than one process is trying to enter the CS,
at least one will succeed.
- Absence of delay - if only one process is trying to access the CS, it is
not prevented from entering it.
- Entry guarantee - a process that is attempting to enter the CS will
eventually succeed, provided that no process stays in the CS forever.
Atomic operation: a sequence of one or more instructions that
appears to be executed as an indivisible action. Example: a statement that
translates into a single machine language instruction.
For the protocol below to be correct, the initialization, test, increment,
and decrement operations must be atomic.
Example of entry and exit protocols. The while loop in the entry
protocol is also known as a Busy Waiting procedure.
short control = 0;   /* 0: CS free; -1: CS occupied */

Entry() {
  while (control < 0);   /* busy wait until the CS is free */
  control--;             /* mark the CS as occupied */
}

Exit() {
  control++;             /* mark the CS as free again */
}