[OpenMP][Hyperthreading] Managing thread binding with OpenMP and Intel Hyper-Threading

This post is outdated: since OpenMP 4.0, please use OMP_PLACES=cores.
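To check what OMP_PLACES=cores actually does on a given node, the OpenMP 4.5 place query routines (omp_get_num_places, omp_get_place_num_procs, omp_get_place_proc_ids) can list the places the runtime built. This is a minimal sketch, assuming a compiler with OpenMP 4.5 support (for instance GCC 6 or later with -fopenmp); DescribePlaces is a hypothetical helper, not part of any library.

```cpp
// Sketch: list the OpenMP places built from OMP_PLACES (OpenMP 4.5 API).
// Compile with -fopenmp; without OpenMP support it simply returns "".
#include <cassert>
#include <string>
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#endif

// Returns one line per place, e.g. "place 0: 0 1 2 3".
std::string DescribePlaces() {
    std::string out;
#ifdef _OPENMP
    // May be 0 if the runtime did not build any place list
    const int numPlaces = omp_get_num_places();
    for (int p = 0; p < numPlaces; ++p) {
        const int n = omp_get_place_num_procs(p);  // logical CPUs in place p
        assert(n >= 0);
        std::vector<int> ids(n);
        omp_get_place_proc_ids(p, ids.data());     // fills the n CPU ids
        out += "place " + std::to_string(p) + ":";
        for (int i = 0; i < n; ++i) out += " " + std::to_string(ids[i]);
        out += "\n";
    }
#endif
    return out;
}
```

Usage, assuming a small main that prints DescribePlaces(): run it as OMP_PLACES=cores OMP_PROC_BIND=close ./a.out. With Hyper-Threading enabled, each place should list all the hardware threads of one physical core, so a thread bound to a place is pinned to a core rather than to a single logical CPU (which also means the GetThreadBinding function of the test code below would return -1 for it).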

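Hyper-Threading is usually turned off on compute nodes.
Sometimes, however, it is not.
And if your application uses OpenMP, you have to use OMP_PROC_BIND=TRUE carefully: depending on how the logical CPUs are numbered, binding threads to consecutive logical CPUs can put several threads on the same physical core.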
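Whether two logical CPUs belong to the same physical core can be checked on Linux in /sys/devices/system/cpu/cpuN/topology/thread_siblings_list, which holds a CPU list such as 0,48 or 0-3. A minimal sketch for reading it, assuming a Linux system with sysfs mounted; ParseCpuList and ThreadSiblings are hypothetical names:

```cpp
#include <cassert>
#include <fstream>
#include <sstream>
#include <string>
#include <vector>

// Parse a Linux CPU list such as "0-3,8" into {0, 1, 2, 3, 8}.
std::vector<int> ParseCpuList(const std::string& list) {
    std::vector<int> cpus;
    std::stringstream ss(list);
    std::string item;
    while (std::getline(ss, item, ',')) {
        if (item.empty()) continue;
        const std::size_t dash = item.find('-');
        const int first = std::stoi(item.substr(0, dash));
        const int last = (dash == std::string::npos)
                             ? first
                             : std::stoi(item.substr(dash + 1));
        assert(first <= last);
        for (int cpu = first; cpu <= last; ++cpu) cpus.push_back(cpu);
    }
    return cpus;
}

// Logical CPUs sharing a physical core with `cpu` (empty if sysfs is unavailable).
std::vector<int> ThreadSiblings(int cpu) {
    std::ifstream in("/sys/devices/system/cpu/cpu" + std::to_string(cpu) +
                     "/topology/thread_siblings_list");
    std::string line;
    if (!std::getline(in, line)) return {};
    return ParseCpuList(line);
}
```

For example, ThreadSiblings(0) returns the logical CPUs that share a core with CPU 0, CPU 0 included; more than one entry means Hyper-Threading is enabled.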

Test code

Consider the following code, which prints the binding of each OpenMP thread:

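// _GNU_SOURCE must be defined before any system header is included;
// g++ already defines it, hence the guard. (__USE_GNU is internal to
// glibc and must not be defined by hand.)
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <sys/types.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <sched.h>
#include <omp.h>
#include <cstdio>

// Returns the logical CPU the calling thread is bound to,
// or -1 if it may run on more than one CPU (or on error).
inline int GetThreadBinding(){
    // Mask will contain the current affinity
    cpu_set_t mask;
    // We need the thread id, not the process id: each OpenMP thread has its own affinity
    pid_t tid = (pid_t) syscall(SYS_gettid);
    // Get the affinity (not inside assert(), which disappears with -DNDEBUG)
    if(sched_getaffinity(tid, sizeof(mask), &mask) == -1){
        perror("sched_getaffinity");
        return -1;
    }

    if(CPU_COUNT(&mask) == 1){
        for(int proc = 0; proc < CPU_SETSIZE; ++proc){
            if(CPU_ISSET(proc, &mask)){
                return proc;
            }
        }
    }
    return -1;
}

int main(){
    #pragma omp parallel
    {
        // One thread at a time, so that the output lines do not get mixed
        #pragma omp critical(PRINT)
        printf("Thread %d bind to %d\n", omp_get_thread_num(), GetThreadBinding());
    }

    return 0;
}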

It can be compiled with:

g++ -O3 -fopenmp main.cpp -o test.exe


Now let's look at what happens:

No binding, no taskset

$ ./test.exe 
Thread 1 bind to -1
Thread 3 bind to -1
Thread 2 bind to -1
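Thread 0 bind to -1

Without binding, each thread may run on any of the CPUs allowed for the process, so its affinity mask contains more than one CPU and GetThreadBinding returns -1.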
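GetThreadBinding returns -1 whenever the affinity mask contains more than one CPU. To see which CPUs an unbound thread is actually allowed to use, the mask can be printed in full. A sketch assuming glibc (for cpu_set_t and the CPU_* macros); CpuSetToString and CurrentAffinity are hypothetical helpers:

```cpp
// g++ defines _GNU_SOURCE, which <sched.h> needs for cpu_set_t and CPU_*.
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <sched.h>
#include <cassert>
#include <string>

// Return the logical CPUs allowed by `mask`, e.g. "0,2".
std::string CpuSetToString(const cpu_set_t& mask) {
    std::string out;
    for (int cpu = 0; cpu < CPU_SETSIZE; ++cpu) {
        if (CPU_ISSET(cpu, &mask)) {
            if (!out.empty()) out += ",";
            out += std::to_string(cpu);
        }
    }
    return out;
}

// Affinity of the calling thread (pid 0 means "the caller" on Linux).
std::string CurrentAffinity() {
    cpu_set_t mask;
    CPU_ZERO(&mask);
    if (sched_getaffinity(0, sizeof(mask), &mask) != 0) return "";
    assert(CPU_COUNT(&mask) > 0);  // a successful call never yields an empty mask
    return CpuSetToString(mask);
}
```

Calling CurrentAffinity() instead of GetThreadBinding() in the printf of the test program (with %s and .c_str()) prints, for each thread, the whole list of CPUs it may run on.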

Binding, no taskset

$ OMP_PROC_BIND=true ./test.exe 
Thread 1 bind to 1
Thread 0 bind to 0
Thread 2 bind to 2
Thread 3 bind to 3

The threads are now bound (and moreover the thread ids match the logical CPU numbers).

No binding, with taskset

$ taskset 0x5 ./test.exe 
Thread 0 bind to -1
Thread 1 bind to -1

The mask 0x5 (binary 0101) allows only logical CPUs 0 and 2.
OpenMP automatically reduces the default number of threads to match (with GCC's libgomp, the default is the number of CPUs in the process affinity mask).
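The taskset mask is read as a bit field: bit i, counted from the rightmost hex digit, allows logical CPU i. A small decoder, written here only as an illustration (DecodeHexMask is a hypothetical name), makes the 0x5 case explicit:

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <vector>

// Decode a taskset-style hex mask ("0x5") into logical CPU numbers ({0, 2}).
// The rightmost hex digit covers CPUs 0-3, the next one CPUs 4-7, and so on.
std::vector<int> DecodeHexMask(std::string mask) {
    if (mask.rfind("0x", 0) == 0 || mask.rfind("0X", 0) == 0) mask = mask.substr(2);
    std::vector<int> cpus;
    int base = 0;  // first CPU covered by the current digit
    for (auto it = mask.rbegin(); it != mask.rend(); ++it, base += 4) {
        const char c = static_cast<char>(std::tolower(static_cast<unsigned char>(*it)));
        assert(std::isxdigit(static_cast<unsigned char>(c)));
        const int digit = (c <= '9') ? c - '0' : c - 'a' + 10;
        for (int bit = 0; bit < 4; ++bit)
            if (digit & (1 << bit)) cpus.push_back(base + bit);
    }
    return cpus;
}
```

DecodeHexMask("0x5") gives {0, 2}; its size, 2, matches the number of threads OpenMP started above.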

Binding, with taskset

$ OMP_PROC_BIND=true taskset 0x5 ./test.exe 
Thread 0 bind to 0
Thread 1 bind to 2

This is what we are looking for: each thread is pinned to a different CPU among those allowed by taskset.

In real life

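On the 96-core node I use, I simply prefix the command with taskset (even in front of mpirun).
One hex digit covers 4 logical CPUs, and the node has 96 physical cores = 384 logical CPUs (4 hardware threads per core), so the mask needs one hex digit per core.
For one thread per core, each digit is 1, which allows only the first logical CPU of each core: the mask is the digit 1 repeated 96 times.
This assumes that the 4 hardware threads of a core have consecutive logical CPU numbers (lscpu -e shows the CORE of each CPU):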

OMP_PROC_BIND=true taskset 0x111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111 mpirun -np 1 ./test.exe
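Rather than typing 96 hex digits, the mask can be generated. The sketch below assumes, like the mask above, that the hardware threads of core c are the consecutive logical CPUs c*hwThreadsPerCore onward; BuildCoreMask is a hypothetical helper, and the numbering should be checked on the target node before use.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Build a taskset hex mask allowing `threadsPerCore` hardware threads on each
// of `cores` physical cores. ASSUMPTION: the `hwThreadsPerCore` hardware
// threads of core c are the consecutive logical CPUs c*hwThreadsPerCore, ...
std::string BuildCoreMask(int cores, int hwThreadsPerCore, int threadsPerCore) {
    assert(cores > 0 && threadsPerCore > 0 && threadsPerCore <= hwThreadsPerCore);
    const int numCpus = cores * hwThreadsPerCore;
    std::vector<bool> allowed(numCpus, false);
    for (int c = 0; c < cores; ++c)
        for (int t = 0; t < threadsPerCore; ++t)
            allowed[c * hwThreadsPerCore + t] = true;

    // Emit hex digits from the most significant one; digit d covers CPUs 4d..4d+3.
    const int numDigits = (numCpus + 3) / 4;
    std::string hex = "0x";
    for (int d = numDigits - 1; d >= 0; --d) {
        int value = 0;
        for (int bit = 0; bit < 4; ++bit) {
            const int cpu = 4 * d + bit;
            if (cpu < numCpus && allowed[cpu]) value |= 1 << bit;
        }
        hex += "0123456789abcdef"[value];
    }
    return hex;
}
```

BuildCoreMask(96, 4, 1) yields 0x followed by 96 ones, and BuildCoreMask(96, 4, 2) the two-threads-per-core variant with 96 threes. For a hypothetical machine with 2 cores of 2 consecutive hardware threads each, BuildCoreMask(2, 2, 1) gives the 0x5 mask used with taskset earlier.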

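For two threads per core, each digit becomes 3 (binary 0011, the first two logical CPUs of each core), i.e. the digit 3 repeated 96 times.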


Of course, the same is possible with numactl (its --physcpubind option takes a list of logical CPUs).
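With numactl the CPUs are given as a list rather than a mask, for instance numactl --physcpubind=0,4,8 for the first hardware thread of the first three cores. The list can be built under the same consecutive numbering assumption as the taskset mask; PhysCpuBindList is a hypothetical helper:

```cpp
#include <cassert>
#include <string>

// Build a CPU list for `numactl --physcpubind=...` selecting the first
// `threadsPerCore` hardware threads of each core. ASSUMPTION: the hardware
// threads of core c are the consecutive logical CPUs c*hwThreadsPerCore, ...
std::string PhysCpuBindList(int cores, int hwThreadsPerCore, int threadsPerCore) {
    assert(cores > 0 && threadsPerCore > 0 && threadsPerCore <= hwThreadsPerCore);
    std::string list;
    for (int c = 0; c < cores; ++c) {
        for (int t = 0; t < threadsPerCore; ++t) {
            if (!list.empty()) list += ",";
            list += std::to_string(c * hwThreadsPerCore + t);
        }
    }
    return list;
}
```

The result would then be used as OMP_PROC_BIND=true numactl --physcpubind=$LIST ./test.exe, where the shell variable LIST (a placeholder) holds PhysCpuBindList(96, 4, 1).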