When playing with SIMD intrinsics, it is a matter of finding the right instructions to do what you want. But sometimes it is tricky, because there are several ways to do it. And sometimes, I forget that this or this… Read more: [SSE][AVX][SIMD] Horizontal Sum (sum simd vector – intrinsic)
You might have arrived on this page because you are facing a segfault or some strange problem like “it works with 1 thread but not with 2” in a mixed OpenMP/CUDA application. That was the case for me, let's have a look… Read more: [OpenMP][Cuda] How to manage CUDA GPU from OpenMP threads
Because the OpenMP standard says: "All critical constructs without a name are considered to have the same unspecified name."
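A minimal sketch of what this rule means in practice. The function and section names (`countUnnamed`, `countNamed`, `update_a`, `update_b`) are made up for illustration. In the first function both unnamed critical sections share one lock, so a thread updating `a` also blocks threads that only want to update `b`. In the second, each named section has its own lock.

```cpp
// Two independent counters, each updated inside a critical section.
struct Counters {
    long a;
    long b;
};

// Unnamed critical sections: both blocks use the same (unspecified) name,
// so they exclude each other even though they touch different data.
Counters countUnnamed(int n) {
    Counters c{0, 0};
    #pragma omp parallel for
    for (int i = 0; i < n; ++i) {
        #pragma omp critical
        { c.a += 1; }
        #pragma omp critical
        { c.b += 2; }
    }
    return c;
}

// Named critical sections: update_a and update_b are distinct locks,
// so a thread in one block does not delay a thread in the other.
Counters countNamed(int n) {
    Counters c{0, 0};
    #pragma omp parallel for
    for (int i = 0; i < n; ++i) {
        #pragma omp critical(update_a)
        { c.a += 1; }
        #pragma omp critical(update_b)
        { c.b += 2; }
    }
    return c;
}
```

Both versions give the same result; only the amount of contention differs. If the file is compiled without OpenMP support, the pragmas are ignored and the loops simply run serially.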
I was working on a project that used MPI and OpenMP, and where the whole build was already set up. And I had to add some CUDA code to it.
Using MPI_Type_create_struct and MPI_Type_commit, here is a small example that creates a type based on a struct. It is clearly safer to do this than to use the size of the struct and cast to unsigned… Read more: [C++][MPI] Create custom data type in mpi
Maybe you are trying to put some SSE code into a (host) function in a .cu file; if so, you will not be able to compile it.
A simple Legendre polynomial computation in C/C++.
CMake provides a FindCUDA module (http://www.cmake.org/cmake/help/v3.0/module/FindCUDA.html); here I just show a small example.
You may want to know at compile time which version of OpenMP you are using, in order to enable or disable some features. This is possible using the _OPENMP macro.
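A minimal sketch of such a check. When OpenMP is enabled, `_OPENMP` expands to a date of the form yyyymm that identifies the supported specification (for instance 201307 for OpenMP 4.0). The helper names `openmpVersionDate` and `hasOpenMP4` are made up for this example.

```cpp
// Returns the value of _OPENMP (a yyyymm date),
// or 0 when the file is compiled without OpenMP support.
constexpr long openmpVersionDate() {
#ifdef _OPENMP
    return _OPENMP;
#else
    return 0;
#endif
}

// True when the compiler claims at least OpenMP 4.0 (released 2013-07).
constexpr bool hasOpenMP4() {
    return openmpVersionDate() >= 201307;
}
```

The same test can be written directly in the preprocessor, for example `#if defined(_OPENMP) && _OPENMP >= 201307`, to guard code that relies on newer directives.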
OpenMP provides a barrier for all threads of a team. Here is a class to perform a barrier among only a group of threads.
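One possible way to build such a barrier, sketched here with C++11 atomics and busy-waiting. The class name `GroupBarrier` and its interface are assumptions for this example, not the class from the post.

```cpp
#include <atomic>

// Reusable spinning barrier for a fixed group of nbThreads threads.
// Every thread of the group (and only those threads) must call wait().
class GroupBarrier {
public:
    explicit GroupBarrier(int nbThreads)
        : nbThreads_(nbThreads), waiting_(0), generation_(0) {}

    void wait() {
        // Remember which round we are in before announcing our arrival.
        const int gen = generation_.load();
        if (waiting_.fetch_add(1) + 1 == nbThreads_) {
            // Last thread to arrive: reset the counter, then open the barrier.
            waiting_.store(0);
            generation_.fetch_add(1);
        } else {
            // Spin until the last thread moves to the next round.
            while (generation_.load() == gen) {
                // busy wait
            }
        }
    }

    // Number of times the barrier has been completed so far.
    int generation() const { return generation_.load(); }

private:
    const int nbThreads_;
    std::atomic<int> waiting_;
    std::atomic<int> generation_;
};
```

Inside an OpenMP parallel region, the threads of the group would share one `GroupBarrier` object and call `wait()` on it, while the other threads keep running. Busy-waiting keeps the sketch short but burns CPU cycles, so it only makes sense when each thread has its own core.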
This quicksort class is a copy of the one from ScalFMM.
In this post I present an example of profiling OpenMP (or pthread) and MPI applications.
Of course there is a difference between static and dynamic scheduling (everyone knows that), but if you want to see how much of a difference it can make, look at the example above.
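To give an idea of the kind of loop where the schedule matters, here is a small sketch with a deliberately unbalanced workload. The names `work`, `sumStatic` and `sumDynamic` and the chunk size of 16 are made up for illustration. With `schedule(static)` the thread that receives the last block of iterations gets most of the work, while `schedule(dynamic)` hands out chunks to threads as they become free.

```cpp
// Hypothetical irregular workload: iteration i costs O(i) operations.
long work(int i) {
    long acc = 0;
    for (int k = 0; k < i; ++k) {
        acc += k % 7;
    }
    return acc;
}

// Iterations are split into equal contiguous blocks decided up front.
long sumStatic(int n) {
    long total = 0;
    #pragma omp parallel for schedule(static) reduction(+:total)
    for (int i = 0; i < n; ++i) {
        total += work(i);
    }
    return total;
}

// Chunks of 16 iterations are given to whichever thread is idle.
long sumDynamic(int n) {
    long total = 0;
    #pragma omp parallel for schedule(dynamic, 16) reduction(+:total)
    for (int i = 0; i < n; ++i) {
        total += work(i);
    }
    return total;
}
```

Both functions return the same value; timing them (for instance with `omp_get_wtime()`) on a large `n` is one way to observe the load imbalance of the static version.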