When working with SIMD intrinsics, it is a matter of finding the right instructions to do what you want. But sometimes it is tricky because there are several ways to do it. And sometimes, I forgot that this or this… Read more: [SSE][AVX][SIMD] Horizontal Sum (sum simd vector – intrinsic)
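As a minimal sketch of one of those possibilities (not necessarily the one used in the full post), the function below sums the four lanes of an SSE register using only SSE1 intrinsics. The name horizontal_sum4 and the std::array interface are my own choices for illustration, and a scalar fallback is assumed when SSE is not available.

```cpp
#include <array>
#if defined(__SSE__) || defined(_M_X64) || defined(_M_AMD64)
#include <xmmintrin.h>
#define HSUM_USE_SSE 1
#endif

// Hypothetical helper: returns values[0] + values[1] + values[2] + values[3].
inline float horizontal_sum4(const std::array<float, 4>& values) {
#ifdef HSUM_USE_SSE
    const __m128 v = _mm_loadu_ps(values.data());  // [v0, v1, v2, v3]
    const __m128 high = _mm_movehl_ps(v, v);       // [v2, v3, v2, v3]
    const __m128 pairs = _mm_add_ps(v, high);      // [v0+v2, v1+v3, ., .]
    // Copy lane 1 (v1+v3) into every lane so it can be added to lane 0.
    const __m128 lane1 = _mm_shuffle_ps(pairs, pairs, _MM_SHUFFLE(1, 1, 1, 1));
    const __m128 total = _mm_add_ss(pairs, lane1); // lane 0 = (v0+v2) + (v1+v3)
    return _mm_cvtss_f32(total);
#else
    // Scalar fallback with the same summation order as the SSE path.
    return (values[0] + values[2]) + (values[1] + values[3]);
#endif
}
```

With SSE3 one could use _mm_hadd_ps twice instead; the version above only needs the baseline instruction set.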
This post is outdated: please use OMP_PLACES=cores from OpenMP 4. Usually, hyperthreading is turned off on computational nodes. However, sometimes it is not… And if your application uses OpenMP, you might need to use OMP_PROC_BIND=TRUE carefully.
Just for fun, I put here the code of an application that copies to the clipboard whatever it receives on its input.
You might arrive on this page because you are facing a segfault or some strange problem like “it works with 1 thread but not with 2” in a mixed OpenMP/CUDA application. It was the case for me, let's have a look… Read more: [OpenMP][Cuda] How to manage CUDA GPU from OpenMP threads
Another example of SpMV, but with cuSPARSE this time. Again, I was not able to find a basic example on the internet, so I suppose this one can be useful to others.
Here is a code sample that uses the MKL to perform SpMV (gemv). I put it in different functions, but the code is not clean (a mix of C and C++). However, it is easy to understand; there are the conversions… Read more: [C/C++] Sparse matrix MKL examples (COO, CSR, DIA, BCSR) gemv and conversions
Because the standard says: all critical constructs without a name are considered to have the same unspecified name.
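To illustrate the consequence of this rule, here is a minimal sketch (the function name and the section names update_a and update_b are hypothetical). Without names, both critical sections would share the same lock, so a thread updating a would also block a thread updating b. With distinct names, a thread is only excluded from the section that another thread is currently executing.

```cpp
#include <utility>

// Increments two independent counters n times each from a parallel loop.
// Each counter is protected by its own *named* critical section.
std::pair<long, long> count_with_named_criticals(int n) {
    long a = 0;
    long b = 0;
    #pragma omp parallel for
    for (int i = 0; i < n; ++i) {
        #pragma omp critical(update_a)
        { ++a; }
        #pragma omp critical(update_b)
        { ++b; }
    }
    return std::make_pair(a, b);
}
```

If the file is compiled without OpenMP enabled, the pragmas are ignored and the loop simply runs serially with the same result.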
I gave a presentation about C++11 as an overview (targeting non-C++ users); here are the slides and the code example.
I was developing an OpenMP code that uses nested parallelism, and I realized that I had some problems with thread affinity (even though my number of threads was lower than or equal to my number of cores), so I looked to… Read more: [C++][OpenMP] Thread affinity manual (set CPU affinity and bind thread by hand)
I was working on a project that used MPI and OpenMP and where everything about compilation was already set up. And I had to add some CUDA code to it.
It is true that I am the kind of guy who sometimes likes to create what already exists. But this time it was because I was not completely satisfied with what exists, since there is no standard double linked list… Read more: [C] Double Linked list in C with iterator (OpenSource LGPL)
GCC provides the usual operators (+, -, /, *) for the SSE types, but the Intel compiler did not (I write “did” because it seems that it does now). So we quickly implemented these operators to be able to write “c=a+b”.
Using MPI_Type_create_struct and MPI_Type_commit, here is a small example that creates a type based on a struct. It is clearly safer to do this than to use the size of the struct and cast to unsigned… Read more: [C++][MPI] Create custom data type in mpi
A quick sample of code to replace several lines of text in lots of files.
Maybe you’re trying to put some SSE code into a (host) function in a .cu file; well, you will not be able to compile it.
In this post I present some functions taken from different books and rewritten by myself (the first objective, a long time ago, was to refresh my memory with some BLAS stuff). It is composed of 3 modules: Utils, Matrix/vector operations, Linear… Read more: [C++/SIMD] Basic Linear Algebra Functions (some with SSE acceleration)
A simple Legendre polynomial computation in C/C++.
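As a rough idea of what such a computation can look like, here is a minimal sketch based on Bonnet's recurrence. It is not the code from the full post, and the function name legendre is an assumption chosen for illustration.

```cpp
#include <cassert>

// Evaluates the Legendre polynomial P_n(x) with Bonnet's recurrence:
//   (k + 1) P_{k+1}(x) = (2k + 1) x P_k(x) - k P_{k-1}(x)
// starting from P_0(x) = 1 and P_1(x) = x.
double legendre(int n, double x) {
    assert(n >= 0);
    if (n == 0) {
        return 1.0;
    }
    double previous = 1.0; // P_{k-1}, starts as P_0
    double current = x;    // P_k, starts as P_1
    for (int k = 1; k < n; ++k) {
        const double next =
            ((2 * k + 1) * x * current - k * previous) / (k + 1);
        previous = current;
        current = next;
    }
    return current;
}
```

For example, legendre(2, 0.5) evaluates (3 * 0.25 - 1) / 2, which is -0.125.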
If you want to know more about flops (on CPU or on GPU), a good first step is this link: https://folding.stanford.edu/home/faq/faq-flops/ They give lots of details and are very clear; in brief, a good reference.
CMake provides a FindCUDA module (http://www.cmake.org/cmake/help/v3.0/module/FindCUDA.html) and I just show here a small example.
Quick summary of the CUDA __constant__ type
You may want to know which version of OpenMP you are using at compile time in order to enable or disable some functionality. This is possible using the _OPENMP macro.
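A minimal sketch of how this check can look is given below. The two function names are hypothetical; the only facts relied on are that _OPENMP is defined by an OpenMP-enabled compiler as a yyyymm date, and that OpenMP 3.0 (which introduced tasks) corresponds to 200805.

```cpp
// Value of the _OPENMP macro (yyyymm of the supported specification),
// or 0 when the file is compiled without OpenMP support.
constexpr long openmp_version_macro() {
#ifdef _OPENMP
    return _OPENMP;
#else
    return 0;
#endif
}

// Compile-time switch: true only if the compiler supports at least
// OpenMP 3.0, so code using tasks can be enabled safely.
constexpr bool task_support_enabled() {
#if defined(_OPENMP) && _OPENMP >= 200805
    return true;
#else
    return false;
#endif
}
```

The same #if test can wrap any block of code that should only be compiled when a given OpenMP version is available.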
OpenMP provides a barrier for all threads. Here is a class that performs a barrier with only a group of threads.
This quicksort class is a copy of the one from ScalFMM.
In this post I put the code of a small program I developed a week ago: an OpenMP server for Linux sockets. This server uses a thread pool and tasks. I also wrote a minimalist client that… Read more: [C++] A tcp/ip server using OpenMP (with Linux socket)
A month ago I read about (and wrote) some algorithms for pattern matching in text. You can find plenty of these on the web; anyway, here is my code.
In this post I present an example of profiling an OpenMP (or pthread) and MPI application.
Of course there is a difference between static and dynamic scheduling (everyone knows that), but if you want to see how much of a difference it can make, look at the example above.
Here are 3 lines to use OpenMP in your CMake code.
A few weeks ago I wanted to use unit tests. But when I was searching for a framework that is easy to use, fast, and does not need 10 libs installed to make it work, etc… Well I did not… Read more: C++ – Unit test – easy, one file, basic, simple
A popular question is to ask why one is better or to understand their main differences.
Here is an example of a reduction on a variable to sum the result from each thread.
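As a minimal sketch of the idea (the function name parallel_sum is hypothetical, not taken from the full post): with the reduction clause, each thread accumulates into its own private copy of total, and OpenMP combines the copies with + at the end of the loop.

```cpp
#include <cstddef>
#include <vector>

// Sums all the elements of `values`; each thread works on a private
// copy of `total` that OpenMP adds together when the loop finishes.
double parallel_sum(const std::vector<double>& values) {
    double total = 0.0;
    const long count = static_cast<long>(values.size());
    #pragma omp parallel for reduction(+ : total)
    for (long i = 0; i < count; ++i) {
        total += values[static_cast<std::size_t>(i)];
    }
    return total;
}
```

Note that the order of the floating-point additions can differ from one run to another, so results may vary slightly for values that are not exactly representable.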
How to create an application that allows only one instance at a time. Here is my solution, inspired by: http://www.qtcentre.org/wiki/index.php?title=SingleApplication
You want to start learning OpenMP and you already know how to program in C/C++? Here is a good code sample to understand how it works.