Tru64 UNIX Guide to DECthreads



2.5.2 Process-Shared Mutexes

You can create a mutex that protects data that is shared among threads running in different processes. This is called a process-shared mutex.

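To create a process-shared mutex, use the pthread_mutexattr_setpshared() routine to set the process-shared attribute (PTHREAD_PROCESS_SHARED) in an initialized mutex attributes object, and then use that attributes object in a call to pthread_mutex_init(). The mutex itself must reside in memory that all of the cooperating processes can access, such as a shared memory region.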

2.5.3 Process-Shared Condition Variables

You can create a condition variable used to communicate changes to data that is shared among threads running in different processes. This is called a process-shared condition variable.

To create a process-shared condition variable, use the pthread_condattr_setpshared() routine to set the process-shared attribute in an initialized condition variable attributes object, and then use that attributes object in a call to pthread_cond_init().

2.5.4 Process-Shared Read-Write Locks

You can create a read-write lock that protects data that is shared among threads running in different processes. This is called a process-shared read-write lock.

To create a process-shared read-write lock, use the pthread_rwlockattr_setpshared() routine to set the process-shared attribute in an initialized read-write lock attributes object, and then use that attributes object in a call to pthread_rwlock_init().

2.6 Thread-Specific Data

Each thread can use an area of memory private to DECthreads where DECthreads stores thread-specific data. Use this memory to associate arbitrary data with a thread's context. You can think of this as the ability to add user-specified fields to the current thread's context, or as global variables that have private values in each thread.

A thread-specific data key is shared by all threads within the process; each thread has its own unique value for that shared key.

Use the pthread_key_create(), pthread_setspecific(), pthread_getspecific(), and pthread_key_delete() routines to create and access thread-specific data.
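A minimal sketch of a per-thread counter built on one shared key. The key is created once through pthread_once(), each thread lazily allocates its own value, and the destructor passed to pthread_key_create() frees that value when the thread terminates. All names other than the pthread routines are hypothetical.

```c
#include <pthread.h>
#include <stdlib.h>

static pthread_key_t counter_key;
static pthread_once_t counter_once = PTHREAD_ONCE_INIT;
static int key_status = -1;          /* result of pthread_key_create() */

/* Destructor: called at thread exit for each thread's non-NULL value. */
static void free_counter(void *p)
{
    free(p);
}

static void make_counter_key(void)
{
    key_status = pthread_key_create(&counter_key, free_counter);
}

/* Returns the calling thread's private counter, allocating it on first use. */
static int *my_counter(void)
{
    int *p;

    pthread_once(&counter_once, make_counter_key);
    if (key_status != 0)
        return NULL;
    p = pthread_getspecific(counter_key);
    if (p == NULL) {
        p = calloc(1, sizeof *p);
        if (p != NULL && pthread_setspecific(counter_key, p) != 0) {
            free(p);
            p = NULL;
        }
    }
    return p;
}

/* Hypothetical job: bump this thread's counter n times, record the total. */
struct job {
    int n;
    int result;
};

static void *tsd_worker(void *arg)
{
    struct job *j = arg;
    int *c = NULL;

    for (int i = 0; i < j->n; i++) {
        c = my_counter();            /* same key, different value per thread */
        if (c == NULL)
            break;
        (*c)++;
    }
    j->result = (c != NULL) ? *c : -1;
    return NULL;
}

/* Runs two threads concurrently; each should see only its own count. */
static int tsd_demo(int *a_out, int *b_out)
{
    struct job a = { 3, 0 }, b = { 5, 0 };
    pthread_t ta, tb;

    if (pthread_create(&ta, NULL, tsd_worker, &a) != 0)
        return -1;
    if (pthread_create(&tb, NULL, tsd_worker, &b) != 0) {
        pthread_join(ta, NULL);
        return -1;
    }
    pthread_join(ta, NULL);
    pthread_join(tb, NULL);
    *a_out = a.result;
    *b_out = b.result;
    return 0;
}
```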


Chapter 3
Programming with Threads

This chapter discusses programming disciplines that you should follow as you use DECthreads routines in your programs. Pertinent examples include programming for asynchronous execution, choosing a synchronization mechanism, avoiding priority scheduling problems, making code thread-safe, and working with code that is not thread-safe.

3.1 Designing Code for Asynchronous Execution

When programming with threads, always keep in mind that the execution of a thread is inherently asynchronous with respect to other threads running in the system (or in the process).

In short, there is no guarantee of when a thread will start. It can start immediately or not for a significant period of time, depending on the priority of the thread in relation to other threads that are currently running. When a thread will start can also depend on the behavior of other processes, as well as on other threaded subsystems within the current process.

You cannot depend upon any synchronization between two threads unless you explicitly code that synchronization into your program, using mechanisms such as mutexes, condition variables, read-write locks, or pthread_join().

On a uniprocessor, DECthreads, in most cases, will context-switch threads in user mode, within a single operating system process. (This is true except for system contention scope threads on Tru64 UNIX.) Context switches between such threads occur only at relatively determinate times, such as when you make a blocking call to the threads library or when a timeslice interrupt occurs. This behavior might be termed "slightly asynchronous," because such a library tolerates many classes of errors in your application.

On a multiprocessor system, DECthreads may run more than one application thread simultaneously. Many incautious programming techniques that will not usually cause trouble on a uniprocessor will cause trouble on a multiprocessor, often in ways that are difficult to isolate and fix.

The following subsections present examples of programming errors.

3.1.1 Avoid Passing Stack Local Data

Avoid creating a thread with an argument that points to stack local data, or to global or static data that is serially reused for a sequence of threads.

Specifically, the thread started with a pointer to stack local data may not start until the creating thread's routine has returned, and the storage may have been changed by other calls. The thread started with a pointer to global or static data may not start until the storage has been reused to create another thread.
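One common way to follow this advice is to give each thread its own heap-allocated argument, which the thread frees when it is done. This is a minimal sketch; struct work_arg, worker(), results[], and run_workers() are hypothetical names.

```c
#include <pthread.h>
#include <stdlib.h>

#define NTHREADS 4

/* Hypothetical per-thread argument; each thread receives its own copy. */
struct work_arg {
    int index;
};

static int results[NTHREADS];

static void *worker(void *p)
{
    struct work_arg *arg = p;        /* owned by this thread */

    results[arg->index] = arg->index * 10;
    free(arg);                       /* the thread frees its own argument */
    return NULL;
}

/* Starts NTHREADS threads and joins them. Returns 0 on success.
 * Wrong: pthread_create(&tids[i], NULL, worker, &i) passes the address of
 * the loop variable, which may change before the new thread reads it. */
static int run_workers(void)
{
    pthread_t tids[NTHREADS];
    int created = 0, rc = 0;

    for (int i = 0; i < NTHREADS; i++) {
        struct work_arg *arg = malloc(sizeof *arg);
        if (arg == NULL) {
            rc = -1;
            break;
        }
        arg->index = i;
        if (pthread_create(&tids[i], NULL, worker, arg) != 0) {
            free(arg);
            rc = -1;
            break;
        }
        created++;
    }
    for (int i = 0; i < created; i++)
        pthread_join(tids[i], NULL);
    return rc;
}
```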

3.1.2 Initialize DECthreads Objects Before Thread Creation

Initialize DECthreads objects (such as mutexes) or global data that a thread uses before creating that thread.

On "slightly asynchronous" uniprocessor systems, this may seem safe, because the thread will probably not run until the creator blocks. Thus, the error can go undetected initially. On a multiprocessor, or even on a new release of DECthreads with different timeslicing behavior, the created thread may run immediately, before the data has been initialized. This can lead to failures that are difficult to detect. Note that a thread may run to completion before the call that created it returns to the creator. The system load may affect the timing as well.

Before your program creates a thread, it should set up all requirements that the new thread needs in order to execute. For example, if your program must set the new thread's scheduling parameters, do so with attributes objects when you create it, rather than trying to use pthread_setschedparam() or other routines afterwards. To set global data for the new thread or to create synchronization objects, do so before you create the thread, or else do so in a pthread_once() initialization routine that each thread calls.
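A minimal sketch of the pthread_once() approach: every thread calls pthread_once() before touching a shared table, and the initialization routine, which also creates the table's mutex, runs exactly once no matter which thread gets there first. All names other than the pthread routines are hypothetical.

```c
#include <pthread.h>
#include <stddef.h>

#define MAX_USERS 8

static pthread_once_t table_once = PTHREAD_ONCE_INIT;
static pthread_mutex_t table_lock;   /* initialized by init_table() */
static int table_entries;
static int init_runs;                /* counts init_table() calls */

/* One-time initialization; pthread_once() runs it exactly once. */
static void init_table(void)
{
    pthread_mutex_init(&table_lock, NULL);
    table_entries = 0;
    init_runs++;
}

static void *table_user(void *unused)
{
    (void)unused;
    pthread_once(&table_once, init_table);  /* first thing each thread does */
    pthread_mutex_lock(&table_lock);
    table_entries++;
    pthread_mutex_unlock(&table_lock);
    return NULL;
}

/* Starts up to MAX_USERS threads, joins them, and returns how many ran. */
static int run_table_users(int n)
{
    pthread_t t[MAX_USERS];
    int created = 0;

    for (int i = 0; i < n && i < MAX_USERS; i++)
        if (pthread_create(&t[created], NULL, table_user, NULL) == 0)
            created++;
    for (int i = 0; i < created; i++)
        pthread_join(t[i], NULL);
    return created;
}
```

Because pthread_once() does not return in any thread until init_table() has completed, no thread can lock table_lock before it exists.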

3.1.3 Do Not Use Scheduling As Synchronization

Avoid using the scheduling policy and scheduling priority attributes of threads as a synchronization mechanism.

In a uniprocessor system, only one thread can run at a time, and since a higher-priority thread cannot be preempted by a lower-priority running thread, a thread running at higher priority might erroneously be presumed not to need a mutex to access shared data.

On a multiprocessor system, higher- and lower-priority threads are likely to run at the same time. Situations can even arise where higher-priority threads are waiting to run while the threads that are running have a lower priority.

Regardless of whether your code will run only on a uniprocessor implementation, never try to use scheduling as a synchronization mechanism. Even on a uniprocessor system, your SCHED_FIFO thread can become blocked on a mutex (perhaps in a called library routine), on an I/O operation, or even on a page fault. Any of these might allow a lower-priority thread to run.
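A minimal sketch of the alternative: every thread takes the same mutex before touching shared data, whatever its scheduling policy or priority. The names are hypothetical, and the threads use default scheduling attributes here, since creating SCHED_FIFO threads usually requires extra privileges; the locking discipline would be identical for them.

```c
#include <pthread.h>
#include <stddef.h>

static pthread_mutex_t count_lock = PTHREAD_MUTEX_INITIALIZER;
static long shared_count;

/* Every thread takes the mutex before touching shared_count, whatever its
 * scheduling policy or priority. */
static void *bump(void *arg)
{
    long n = *(const long *)arg;

    for (long i = 0; i < n; i++) {
        pthread_mutex_lock(&count_lock);
        shared_count++;
        pthread_mutex_unlock(&count_lock);
    }
    return NULL;
}

/* Runs two incrementing threads and returns the final count, or -1. */
static long run_bumpers(long n)
{
    pthread_t a, b;

    /* &n is stack data, but both threads are joined before we return. */
    if (pthread_create(&a, NULL, bump, &n) != 0)
        return -1;
    if (pthread_create(&b, NULL, bump, &n) != 0) {
        pthread_join(a, NULL);
        return -1;
    }
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return shared_count;
}
```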

3.2 Memory Synchronization Between Threads

Your multithreaded program must ensure that access to data shared between threads is synchronized with the system's memory subsystem. While any written data will, eventually, be seen by other threads, it is essential for communication that some writes appear in a particular sequence. For example, you want a thread that follows a queue link to see the data written to the next queue entry. This requires explicit memory synchronization.

The POSIX standard requires that, when calling the following routines, a thread synchronizes its memory access with respect to other threads:
   fork()                      pthread_cond_signal()
   pthread_create()            pthread_cond_broadcast()
   pthread_join()              sem_post()
   pthread_mutex_lock()        sem_trywait()
   pthread_mutex_trylock()     sem_wait()
   pthread_mutex_unlock()      semop()
   pthread_cond_wait()         wait()
   pthread_cond_timedwait()    waitpid()
   pthread_rwlock_*()
   $HIBER
   $WAKE
   $WAIT*

If a call to one of these routines returns an error, synchronization is not guaranteed. For example, an unsuccessful call to pthread_mutex_trylock() does not necessarily provide actual synchronization.

Synchronization is a "protocol" among cooperating threads, not a single operation. That is, unlocking a mutex does not guarantee memory synchronization with all other threads, only with threads that later perform some synchronization operation themselves, such as locking a mutex.
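A minimal sketch of both halves of the protocol for the queue example: the producer writes an entry's data and then links it in while holding a mutex, and the consumer locks the same mutex before following the link. Without the consumer's lock, nothing guarantees that the payload is visible even if the new head pointer is. The queue layout and routine names are hypothetical.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical singly linked queue entry. */
struct entry {
    int payload;
    struct entry *next;
};

static pthread_mutex_t q_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t q_nonempty = PTHREAD_COND_INITIALIZER;
static struct entry *q_head;
static struct entry item = { 0, NULL };

/* Producer: write the data, then publish the link under the mutex.
 * The unlock makes every earlier write visible to the next locker. */
static void *producer(void *unused)
{
    (void)unused;
    item.payload = 1234;             /* data written before publishing */
    pthread_mutex_lock(&q_lock);
    item.next = q_head;
    q_head = &item;                  /* publish the link */
    pthread_cond_signal(&q_nonempty);
    pthread_mutex_unlock(&q_lock);
    return NULL;
}

/* Consumer: lock the same mutex before following the link. */
static int consume_one(void)
{
    struct entry *e;
    int value;

    pthread_mutex_lock(&q_lock);
    while (q_head == NULL)
        pthread_cond_wait(&q_nonempty, &q_lock);
    e = q_head;
    q_head = e->next;
    value = e->payload;
    pthread_mutex_unlock(&q_lock);
    return value;
}

static int run_queue_demo(void)
{
    pthread_t t;
    int v;

    if (pthread_create(&t, NULL, producer, NULL) != 0)
        return -1;
    v = consume_one();
    pthread_join(t, NULL);
    return v;
}
```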

3.3 Sharing Memory Between Threads

Most threads do not operate independently. They cooperate to accomplish a task, and cooperation requires communication. There are many ways that threads can communicate, and which method is most appropriate depends on the task.

Threads that cooperate only rarely (for example, a boss thread that only sends off a request for workers to do long tasks) may be satisfied with a relatively slow form of communication. Threads that must cooperate more closely (for example, a set of threads performing a parallelized matrix operation) need fast communication, perhaps even to the extent of using machine-specific hardware operations.

Most mechanisms for thread communication involve the use of memory, exploiting the fact that all threads within a process share their full address space. Although all addresses are shared, there are three kinds of memory that are characteristically used for communication. The following sections describe the scope (the range of locations in the program where code can access the memory) and lifetime (the length of time during which use of the memory is valid) of each of the three types of memory.

3.3.1 Using Static Memory

Static memory is allocated by the language compiler when it translates source code, so the scope is controlled by the rules of the compiler. For example, in the C language, a variable declared as extern is shared by every scope in which that name is declared, and a static variable is private to the source file or routine, depending on where it is declared.

In this discussion, static memory is not the same as the C language static storage class. Rather, static memory refers to any variable that is permanently allocated at a particular address for the life of the program.

It is appropriate to use static memory in your multithreaded program when you know that only one instance of an object exists throughout the application. For example, if you want to keep a list of active contexts or a mutex to control some shared resource, you would not want individual threads to have their own copies of that data.

The scope of static memory depends on your programming language's scoping rules. The lifetime of static memory is the life of the program.
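A minimal sketch of the "list of active contexts" case: exactly one registry exists for the program, so the list and the mutex that protects it are file-scope static objects, and the mutex uses static initialization. The names and the fixed capacity are hypothetical.

```c
#include <pthread.h>

#define MAX_ACTIVE 16

/* One registry for the whole program, in static memory: every thread sees
 * the same list and the same mutex. */
static pthread_mutex_t active_lock = PTHREAD_MUTEX_INITIALIZER;
static const char *active_names[MAX_ACTIVE];
static int active_count;

/* Adds a name to the shared list; returns 0, or -1 if the list is full. */
static int register_active(const char *name)
{
    int rc = -1;

    pthread_mutex_lock(&active_lock);
    if (active_count < MAX_ACTIVE) {
        active_names[active_count++] = name;
        rc = 0;
    }
    pthread_mutex_unlock(&active_lock);
    return rc;
}

/* Returns the number of registered names. */
static int count_active(void)
{
    int n;

    pthread_mutex_lock(&active_lock);
    n = active_count;
    pthread_mutex_unlock(&active_lock);
    return n;
}
```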

3.3.2 Using Stack Memory

Stack memory is allocated by code generated by the language compiler at run time, generally when a routine is initially called. When the program returns from the routine, the storage ceases to be valid (although the addresses still exist and might be accessible).

Generally, the storage is valid while the routine runs, and the actual address can be calculated and passed to other threads; however, this depends on programming language rules. If you pass the address of stack memory to another thread, you must ensure that all other threads are finished processing that data before the routine returns; otherwise, the stack frame will be released, and its contents might be altered by subsequent calls, page fault handling, or other interrupts. The other threads will not be able to determine that this has happened, and erroneous behavior will result.

The scope of stack memory is the routine or a block within the routine. The lifetime is no longer than the time during which the routine or block executes.
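A minimal sketch of the rule above: two hypothetical helper threads each sum half of an array that lives in the creating routine's stack frame, and the routine joins both threads before it returns, so the storage is still valid whenever they read it.

```c
#include <pthread.h>
#include <stddef.h>

struct half {
    const int *data;   /* points into the caller's stack frame */
    int len;
    long sum;
};

static void *sum_half(void *p)
{
    struct half *h = p;

    h->sum = 0;
    for (int i = 0; i < h->len; i++)
        h->sum += h->data[i];
    return NULL;
}

/* Sums 1..100 from a stack array using two threads. Safe only because both
 * threads are joined before this routine returns. Returns -1 on error. */
static long sum_with_threads(void)
{
    int values[100];                 /* stack memory */
    struct half parts[2];
    pthread_t t[2];
    int created = 0;
    long total = 0;

    for (int i = 0; i < 100; i++)
        values[i] = i + 1;
    parts[0] = (struct half){ values, 50, 0 };
    parts[1] = (struct half){ values + 50, 50, 0 };
    for (int i = 0; i < 2; i++) {
        if (pthread_create(&t[i], NULL, sum_half, &parts[i]) != 0)
            break;
        created++;
    }
    for (int i = 0; i < created; i++) {   /* must finish before return */
        pthread_join(t[i], NULL);
        total += parts[i].sum;
    }
    return created == 2 ? total : -1;
}
```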

3.3.3 Using Dynamic Memory

Dynamic memory is allocated by the program as a result of a call to some memory management routine (for example, the C language run-time routine malloc() or the OpenVMS common run-time routine LIB$GET_VM).

Dynamic memory is referenced through pointer variables. Although the pointer variables are scoped depending on their declaration, the dynamic memory itself has no intrinsic scope or lifetime. It can be accessed from any routine or thread that is given its address and will exist until it is explicitly freed. In a language supporting automatic garbage collection, it will exist until the run-time system detects that there are no references to it. (If your language supports garbage collection, be sure the garbage collector is thread-safe.)

The scope of dynamic memory is anywhere a pointer containing the address can be referenced. The lifetime is from allocation to deallocation.

Typically, dynamic memory is appropriate to manage persistent context. For example, in a reentrant routine that is called multiple times to return a stream of information (such as to list all active connections to a server or to return a list of users), using dynamic memory allows the program to create multiple contexts that are independent of all the program's threads. Thus, multiple threads could share a given context, or a single thread could have more than one context.
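A minimal sketch of such a reentrant "stream" routine, which keeps its position in a heap-allocated context rather than in static or stack memory. The struct list_ctx type and the list_open(), list_next(), and list_close() routines are hypothetical. Each caller gets an independent context; a context that several threads use at the same time would also need its own mutex, which this sketch omits.

```c
#include <stdlib.h>

/* Hypothetical context for a routine that returns one item per call. */
struct list_ctx {
    const char *const *items;
    size_t pos;
    size_t count;
};

/* Allocates a new, independent context; returns NULL on failure. */
static struct list_ctx *list_open(const char *const *items, size_t count)
{
    struct list_ctx *ctx = malloc(sizeof *ctx);

    if (ctx != NULL) {
        ctx->items = items;
        ctx->pos = 0;
        ctx->count = count;
    }
    return ctx;
}

/* Returns the next item, or NULL when the stream is exhausted. */
static const char *list_next(struct list_ctx *ctx)
{
    return ctx->pos < ctx->count ? ctx->items[ctx->pos++] : NULL;
}

/* Ends the context's lifetime explicitly. */
static void list_close(struct list_ctx *ctx)
{
    free(ctx);
}
```

Two contexts opened over the same data advance independently, which is what lets one thread hold several streams or several threads each hold their own.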

3.4 Managing a Thread's Stack

For each thread created by your program, DECthreads sets a default stack size that is acceptable to most applications. You can also set the stacksize attribute in a thread attributes object to specify the stack size needed by threads subsequently created with that attributes object.

This section discusses the cases in which the stack size is insufficient (resulting in stack overflow) and how to determine the optimal size of the stack.

Most compilers on Compaq VAX-based systems do not probe the stack. This makes stack overflow failure modes unpredictable and difficult to analyze. Be especially careful to use as little stack memory as practical.

Most compilers on Compaq Alpha-based systems generate code in the procedure prologue that probes the stack to detect whether there is enough space for the procedure to run.

3.4.1 Sizing the Stack

To determine the required size of a thread's stack, sum up the sizes of the frames, including local variables, for the deepest call tree. Add to that number an extra amount of memory to accommodate interrupts and context switching. Determining this figure is difficult because stack frames vary in size and because it might not be possible to estimate the depth of library routine call frames.

Compaq's Visual Threads includes a number of tools and procedures to measure and monitor stack use. See the Visual Threads product's online help for more information.

You can also run your program using a profiling tool that measures actual stack use. This is commonly done by "poisoning" the stack before it is used by writing a distinctive pattern, and then checking for that pattern after the thread completes. Remember: Use of profiling or monitoring tools typically increases the amount of stack memory that your program uses.

3.4.2 Using Stack Overflow Warning and Stack Guard Areas

By default, at the overflow end of each thread's stack, DECthreads allocates an overflow warning area followed by a guard area. These two areas can help a multithreaded program detect overflow of a thread's stack.

Tru64 UNIX 5.0 and OpenVMS Alpha 7.2 include overflow warning support to allow the reporting of stack overflows while a thread can still be assured of executing code. The warning area is a page (or more) that is initially protected to trap writes, but then becomes writable so that it can be used to allow reporting or recovering from the overflow. (On Tru64 UNIX, the warning area is again protected once an overflow has been handled; on OpenVMS it remains unprotected.)

A guard area is a region of no-access memory. When the thread attempts to access a memory location within this region, a memory addressing violation occurs. For a thread that allocates large data structures on the stack, create that thread using a thread attributes object in which a large guardsize attribute value has been set. A large stack guard region can help to prevent one thread from overflowing into another thread's stack region.

The pages of memory that form a stack guard region are also known as guard pages or the "red zone"; the overflow warning area is also known as the "yellow zone".
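A minimal sketch of setting a larger guardsize attribute with the POSIX pthread_attr_setguardsize() routine before creating a thread with a large stack frame. The page count and names are hypothetical; POSIX allows the implementation to round the guard size up internally, while pthread_attr_getguardsize() reports the value that was set.

```c
#include <pthread.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical thread body with a large stack frame. */
static void *large_locals(void *unused)
{
    volatile char buf[64 * 1024];

    (void)unused;
    buf[0] = 1;
    buf[sizeof buf - 1] = 1;
    return NULL;
}

/* Creates and joins a thread whose guard area is `pages` pages long.
 * Returns the guard size recorded in the attributes object, or 0 on error. */
static size_t create_with_guard(size_t pages)
{
    pthread_attr_t attr;
    pthread_t t;
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t guard = 0;

    if (pthread_attr_init(&attr) != 0)
        return 0;
    if (pthread_attr_setguardsize(&attr, pages * page) == 0 &&
        pthread_attr_getguardsize(&attr, &guard) == 0 &&
        pthread_create(&t, &attr, large_locals, NULL) == 0)
        pthread_join(t, NULL);
    else
        guard = 0;
    pthread_attr_destroy(&attr);
    return guard;
}
```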

3.4.3 Diagnosing Stack Overflow Errors

A process can produce a memory access violation (or segmentation fault) when it overflows its stack. As a first step in debugging this behavior, it is often necessary to run the program under the control of your system's debugger to determine which thread's stack has overflowed. However, if the debugger shares resources with the target process (as under OpenVMS), perhaps allocating its own data objects on the target process's stack, the debugger might not operate properly when the stack overflows. In this case, you might be required to analyze the target process by means other than the debugger.

If a thread receives a memory access exception during a routine call or when accessing a local variable, increase the size of the thread's stack. However, not all memory access violations indicate a stack overflow.

For programs that you cannot run under a debugger, determining a stack overflow is more difficult. This is especially true if the program continues to run after receiving a memory access exception. For example, if a stack overflow occurs while a mutex is locked, the mutex might not be released as the thread recovers or terminates. When the program attempts to lock that mutex again, it could hang.

To set the stacksize attribute in a thread attributes object, use the pthread_attr_setstacksize() routine. (See Section 2.3.2.4 for more information.)

3.5 Scheduling Issues

There are programming issues that are unique to the scheduling attributes of threads.

3.5.1 Real-Time Scheduling

Use care when writing code that uses real-time scheduling (that is, the SCHED_FIFO and SCHED_RR policies) to control the priority of threads:

3.5.2 Priority Inversion

Priority inversion occurs when the interaction among a group of three or more threads causes that group's highest-priority thread to be blocked from executing. For example, a higher-priority thread waits for a resource locked by a low-priority thread, and the low-priority thread waits while a middle-priority thread executes. The higher-priority thread is made to wait while a thread of lower priority (the middle-priority thread) executes.

You can address the phenomenon of priority inversion as follows:

