Guide into OpenMP: Easy multithreading programming for C++

The for construct splits the for-loop so that each thread in the current team handles a different portion of the loop.

#pragma omp for
 for(int n=0; n<10; ++n)
   printf(" %d", n);

This loop outputs each number from 0 to 9 exactly once, but possibly in an arbitrary order. It may output, for example:

0 5 6 7 1 8 2 3 4 9.

Internally, the above loop becomes code equivalent to this:

int this_thread = omp_get_thread_num(), num_threads = omp_get_num_threads();
  int my_start = (this_thread  ) * 10 / num_threads;
  int my_end   = (this_thread+1) * 10 / num_threads;
  for(int n=my_start; n<my_end; ++n)
    printf(" %d", n);

So each thread gets a different section of the loop, and they execute their own sections in parallel.
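
To make the arithmetic concrete, here is a small sketch that evaluates the same start/end formula for any thread without involving OpenMP at all. The helper name static_range and its parameters are made up for this illustration, and a real OpenMP runtime is free to split the loop differently.

```cpp
#include <utility>

// Hypothetical helper: returns the half-open range [start, end) that the
// formula above assigns to thread `this_thread` out of `num_threads`,
// for a loop of `total` iterations.
std::pair<int, int> static_range(int this_thread, int num_threads, int total)
{
    int my_start = (this_thread    ) * total / num_threads;
    int my_end   = (this_thread + 1) * total / num_threads;
    return std::make_pair(my_start, my_end);
}
```

With 10 iterations and 3 threads this gives the ranges [0,3), [3,6) and [6,10), so every iteration belongs to exactly one thread even though 10 is not divisible by 3.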

Note: #pragma omp for only delegates portions of the loop to different threads in the current team. A team is the group of threads executing the program. At program start, the team consists of a single member: the master thread that runs the program.

To create a new team of threads, you need to specify the parallel keyword. It can be specified in the surrounding context:

#pragma omp parallel
  #pragma omp for
  for(int n=0; n<10; ++n) printf(" %d", n);

Equivalent shorthand is to specify it in the pragma itself, as #pragma omp parallel for:

#pragma omp parallel for
 for(int n=0; n<10; ++n) printf(" %d", n);
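
Because the output order of printf is unpredictable, a parallel loop is easier to check when every iteration writes to its own array element. The following is a minimal sketch of that pattern; the function name squares is made up for this example. If the code is compiled without OpenMP support, the pragma is ignored and the loop runs serially with the same result.

```cpp
#include <vector>

// Hypothetical example: fills result[n] with n*n using a parallel loop.
// Each iteration touches only its own element, so no synchronization
// is needed and the result does not depend on the number of threads.
std::vector<int> squares(int count)
{
    std::vector<int> result(count);
    #pragma omp parallel for
    for(int n=0; n<count; ++n)
        result[n] = n * n;
    return result;
}
```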

You can explicitly specify the number of threads to be created in the team, using the num_threads clause:

#pragma omp parallel num_threads(3)
 {
   // This code will be executed by three threads.

   // Chunks of this loop will be divided amongst
   // the (three) threads of the current team.
   #pragma omp for
   for(int n=0; n<10; ++n) printf(" %d", n);
 }
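
To see how the iterations were divided, one can record omp_get_thread_num() for each iteration instead of printing. This is a sketch only: the function name iteration_owners is hypothetical, and the fallback definition of omp_get_thread_num() exists solely so that the code still builds when OpenMP is not enabled.

```cpp
#include <vector>
#ifdef _OPENMP
 #include <omp.h>
#else
 // Fallback for builds without OpenMP: there is only one thread.
 static int omp_get_thread_num() { return 0; }
#endif

// Hypothetical example: owner[n] is the number of the thread
// that executed iteration n.
std::vector<int> iteration_owners(int count)
{
    std::vector<int> owner(count, -1);
    #pragma omp parallel num_threads(3)
    {
        // Executed by every thread of the team (at most three).
        #pragma omp for
        for(int n=0; n<count; ++n)
            owner[n] = omp_get_thread_num();
    }
    return owner;
}
```

Note that num_threads(3) is a request: the runtime may provide fewer threads, so the recorded numbers are only guaranteed to lie between 0 and 2.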

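Note that OpenMP also works for C. However, in C before C99, the loop variable cannot be declared inside the for statement and must be declared beforehand. It can then be listed explicitly as private, although OpenMP already treats the iteration variable of the associated loop as private: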

int n;
 #pragma omp for private(n)
 for(n=0; n<10; ++n) printf(" %d", n);

See the “private and shared clauses” section for details.

In OpenMP 2.5, the iteration variable in for must be a signed integer variable type. In OpenMP 3.0, it may also be an unsigned integer variable type, a pointer type or a constant-time random access iterator type. In the latter case, std::distance() will be used to determine the number of loop iterations.
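
As a sketch of the OpenMP 3.0 iterator form, the loop below runs over a std::vector using random access iterators. The function name double_all is made up for this example, and it assumes a compiler that implements OpenMP 3.0 or later. The loop condition uses <, because that version of the specification only accepts relational operators in the loop test.

```cpp
#include <vector>

// Hypothetical example: doubles every element in place.
// The iterator is the loop variable, so each thread gets its own copy.
void double_all(std::vector<int>& values)
{
    #pragma omp parallel for
    for(std::vector<int>::iterator it = values.begin(); it < values.end(); ++it)
        *it *= 2;
}
```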

The scheduling algorithm for the for-loop can be explicitly controlled.

#pragma omp for schedule(static)
 for(int n=0; n<10; ++n) printf(" %d", n);

There are five scheduling types: static, dynamic, guided, runtime, and (since OpenMP 3.0) auto. In addition, there are three scheduling modifiers (since OpenMP 4.5): monotonic, nonmonotonic, and simd.

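static is the schedule shown above. It is usually what you get when no schedule clause is given, although formally the default is implementation-defined. Upon entering the loop, each thread independently determines which chunk of the loop it will process.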

There is also the dynamic schedule:

#pragma omp for schedule(dynamic)
 for(int n=0; n<10; ++n) printf(" %d", n);

In the dynamic schedule, there is no predictable order in which the loop items are assigned to different threads. Each thread asks the OpenMP runtime library for an iteration number, handles it, then asks for the next one, and so on. This is most useful in conjunction with the ordered clause, or when the iterations of the loop may take different amounts of time to execute.
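
One case where iterations take different amounts of time is a primality test by trial division, where larger candidates need more divisions. The sketch below uses the dynamic schedule for such a loop; the names is_prime and prime_flags are invented for this example.

```cpp
#include <vector>

// Hypothetical helper: trial division, so the cost grows with n.
static bool is_prime(int n)
{
    if(n < 2) return false;
    for(int d=2; d*d<=n; ++d)
        if(n % d == 0) return false;
    return true;
}

// flags[n] is 1 if n is prime and 0 otherwise. A thread that finishes
// a cheap iteration early simply asks the runtime for another one.
// std::vector<char> is used rather than std::vector<bool>, because
// neighbouring bool elements may share a byte and would race.
std::vector<char> prime_flags(int limit)
{
    std::vector<char> flags(limit, 0);
    #pragma omp parallel for schedule(dynamic)
    for(int n=0; n<limit; ++n)
        flags[n] = is_prime(n) ? 1 : 0;
    return flags;
}
```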

The chunk size can also be specified to lessen the number of calls to the runtime library:

#pragma omp for schedule(dynamic, 3)
 for(int n=0; n<10; ++n) printf(" %d", n);

In this example, each thread asks for an iteration number, executes 3 iterations of the loop, then asks for another, and so on. The last chunk may be smaller than 3, though.

Internally, the loop above becomes code equivalent to this (illustration only, do not write code like this):

long a,b; // libgomp passes the chunk bounds as long
  if(GOMP_loop_dynamic_start(0,10,1, 3, &a,&b))
    do {
      for(int n=a; n<b; ++n) printf(" %d", n);
    } while(GOMP_loop_dynamic_next(&a,&b));
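
The same grab-a-chunk pattern can be imitated in portable OpenMP code with a shared counter. This is only a sketch of the idea, not a description of how libgomp is implemented: the function name sum_dynamic is hypothetical, and the atomic capture construct it relies on assumes OpenMP 3.1 or later.

```cpp
// Hypothetical emulation of schedule(dynamic, chunk): every thread
// repeatedly claims the next `chunk` iterations from a shared counter
// and adds the iteration numbers it handled to a shared sum.
long sum_dynamic(long total, long chunk)
{
    if(chunk < 1) chunk = 1;  // guard against a loop that never advances
    long next = 0;            // first iteration that nobody has claimed yet
    long sum  = 0;

    #pragma omp parallel
    {
        for(;;)
        {
            long a;
            // Claim the chunk starting at `a` in one indivisible step.
            #pragma omp atomic capture
            { a = next; next += chunk; }

            if(a >= total) break;  // nothing left for this thread
            long b = (a + chunk < total) ? a + chunk : total; // last chunk may be short

            long local = 0;
            for(long n=a; n<b; ++n) local += n;

            #pragma omp atomic
            sum += local;
        }
    }
    return sum;
}
```

For example, sum_dynamic(10, 3) hands out the chunks [0,3), [3,6), [6,9) and [9,10) to whichever threads ask first, and returns 45 regardless of how they were distributed.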

The guided schedule combines traits of static and dynamic: chunks are handed out on request as in dynamic, but the chunk size starts large and shrinks as fewer iterations remain, which keeps the number of calls to the runtime library low. It is difficult to explain in words; this example program may explain it better. (Requires libSDL to compile.)

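The “runtime” option means that the schedule is not fixed at compile time: it is taken at run time from the OMP_SCHEDULE environment variable or from a preceding omp_set_schedule() call. The “auto” option, by contrast, leaves the choice of schedule to the compiler and the runtime library.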
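
A sketch of selecting the schedule at run time follows. It assumes an OpenMP 3.0 or later runtime, where omp_set_schedule() and the OMP_SCHEDULE environment variable are available; the function name fill_runtime is made up for this example.

```cpp
#include <vector>
#ifdef _OPENMP
 #include <omp.h>
#endif

// Hypothetical example: the loop itself only says schedule(runtime);
// the actual schedule is selected before the loop starts.
std::vector<int> fill_runtime(int count)
{
  #ifdef _OPENMP
    // Has the same effect as setting OMP_SCHEDULE="dynamic,2"
    // in the environment before starting the program.
    omp_set_schedule(omp_sched_dynamic, 2);
  #endif
    std::vector<int> result(count);
    #pragma omp parallel for schedule(runtime)
    for(int n=0; n<count; ++n)
        result[n] = n + 1;
    return result;
}
```

Leaving out the omp_set_schedule() call lets the user pick the schedule through OMP_SCHEDULE without recompiling.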

A scheduling modifier can be added to the clause, e.g.: #pragma omp for schedule(nonmonotonic:dynamic)
The modifiers are:

  • monotonic: Each thread executes chunks in an increasing iteration order.

  • nonmonotonic: Each thread executes chunks in an unspecified order.

  • simd: If the loop is a simd loop, the chunk size is adjusted to suit the hardware: it is rounded up to a multiple of an implementation-defined SIMD width chosen by the compiler. This modifier is ignored for non-SIMD loops.
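
As a sketch, a modifier is written in front of the schedule kind, separated by a colon, and a chunk size may still follow. The example assumes a compiler that implements OpenMP 4.5 or later; the function name fill_nonmonotonic is hypothetical.

```cpp
#include <vector>

// Hypothetical example: chunks of 2 iterations, which each thread may
// receive in no particular order. Every iteration writes only to its
// own element, so the result is the same whatever the order.
std::vector<int> fill_nonmonotonic(int count)
{
    std::vector<int> result(count);
    #pragma omp parallel for schedule(nonmonotonic:dynamic, 2)
    for(int n=0; n<count; ++n)
        result[n] = 10 * n;
    return result;
}
```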

