When playing with SIMD intrinsics, it is a matter of finding the right instructions to do what you want. But sometimes it is tricky, because there are several ways to do the same thing. And sometimes, I forget that this or this… Read more: [SSE][AVX][SIMD] Horizontal Sum (sum simd vector – intrinsic)
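To give an idea of what a horizontal sum looks like, here is a minimal sketch for a single `__m128` register of four floats. It is only one of the several possible ways, not necessarily the one used in the post; the function name `horizontal_sum` is made up for this illustration, and the code assumes an x86 target with SSE.

```cpp
#include <xmmintrin.h>  // SSE intrinsics (x86 only)

// Hypothetical helper: sum the four floats stored in an SSE register.
// Uses only SSE1 instructions, so no SSE3 hadd is required.
inline float horizontal_sum(__m128 v) {
    // Copy the two upper lanes onto the two lower ones: [v2, v3, v2, v3]
    __m128 high = _mm_movehl_ps(v, v);
    // Lane 0 = v0 + v2, lane 1 = v1 + v3
    __m128 pairs = _mm_add_ps(v, high);
    // Broadcast lane 1 so that it can be added to lane 0
    __m128 odd = _mm_shuffle_ps(pairs, pairs, _MM_SHUFFLE(1, 1, 1, 1));
    // Lane 0 = (v0 + v2) + (v1 + v3)
    __m128 total = _mm_add_ss(pairs, odd);
    // Extract lane 0 as a scalar
    return _mm_cvtss_f32(total);
}
```

Note that `_mm_set_ps` takes its arguments from the highest lane to the lowest, so `horizontal_sum(_mm_set_ps(4.0f, 3.0f, 2.0f, 1.0f))` sums 1, 2, 3 and 4.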
You might have arrived on this page because you are facing a segfault or some strange problem (“it works with 1 thread but not with 2”) in a mixed OpenMP/CUDA application. It was the case for me, let's have a look… Read more: [OpenMP][Cuda] How to manage CUDA GPU from OpenMP threads
I was working on a project that used MPI and OpenMP, and where the whole build setup was already in place. I then had to add some CUDA code to it.
Using MPI_Type_create_struct and MPI_Type_commit, here is a small example that creates an MPI type based on a struct. It is clearly safer to do this than to use the size of the struct and cast to unsigned… Read more: [C++][MPI] Create custom data type in mpi
Maybe you are trying to put some SSE code into a (host) function in a .cu file. Well, you will not be able to compile it.
A simple Legendre polynomial computation in C/C++.
This quicksort class is a copy of the one from ScalFMM.