Who I am

I am a permanent researcher in high-performance computing (HPC) in the CAMUS team at Inria Nancy and the ICPS team of the ICube laboratory.
Our team is located near Strasbourg.

Research Interests

My research interests are centered around:

HPC
Optimizations: vectorization, complex algorithms (e.g., sorting, SpMV)
Accelerators/GPUs: particle interaction, OptiX
Software engineering for HPC: inastemp
Scientific computing: FMM, TD-BEM, bio-physics code for protein simulation, REMC, turbulence simulation applications
Runtime systems: StarPU, Specx, speculative execution
Scheduling: Heteroprio, Multiprio
Tools for automatic vectorization/parallelization: compilation

Projects

I participate in various projects (detailed here), including:

Exa-SofT (PEPR): Member, focusing on data placement and runtime systems.
Autospec (ANR): Coordinator, focusing on automatic task-based parallelization and speculative execution.
Microcard (EuroHPC): Collaborator, co-supervising work on MLIR for GPU and vectorization.
Textarossa (EuroHPC): Responsible for WP4, working on task-based scheduling and energy efficiency.
TExas (Inria): Coordinator, working on scheduling multiple task-based applications simultaneously.

Feel free to contact me at berenger.bramas@inria.fr and check out my résumé here.

HDR

I obtained my HDR in 2025; the manuscript is here.

PhD

I earned my PhD from the HiePACS team at Inria Bordeaux in 2016, under the supervision of Olivier Coulaud (Inria) and Guillaume Sylvand (Airbus).

You can find the PDF of my thesis here and the slides here.

After completing my PhD, I did a postdoc at the RZG/MPCDF, a Max Planck Institute near Munich.

MS

I graduated with:
- A Master's Degree in Software Engineering from ISIMA
- A Master's Degree in Computer Science and Robotics (Research Specialization) from Blaise Pascal University in 2009.

You can read about my early research work in the following reports (in French):
- M1 Report: Design of a sound management system to enhance the impact of human-robot interactions.
- M2 Report: Supervised learning with sound-image association in developmental robotics.

Publications

2025

Journals

Specx: a C++ task-based runtime system for heterogeneous distributed architectures
Paul Cardosi, Bérenger Bramas
PeerJ CS, 2025.
Link
RSCHED: An effective heterogeneous resources management for simultaneous execution of task-based applications
Etienne Ndamlabin, Bérenger Bramas
International Journal of Advanced Computer Science and Applications (IJACSA), 2025.
Link
Exploiting ray tracing technology through OptiX to compute particle interactions with cutoff in a 3D environment on GPU
David Algis, Bérenger Bramas
International Journal of Advanced Computer Science and Applications (IJACSA), 2025.
Link
InteropUnityCUDA: A Tool for Interoperability between Unity and CUDA
David Algis, Bérenger Bramas, Emmanuelle Darles, Lilian Aveneau
Software: Practice and Experience, 2025.
Link
Arc Blanc: a real-time ocean simulation framework
David Algis, Bérenger Bramas, Emmanuelle Darles, Lilian Aveneau
Journal of Computer Graphics Techniques, 2025.
Link

Conferences

Improving energy efficiency of HPC applications using unbalanced GPU power capping
Albert d'Aviau de Piolant, Hayfa Tayeb, Bérenger Bramas, Mathieu Faverge, Abdou Guermouche, Amina Guermouche
2025 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)
Preprint

Conferences & Workshops

A fast implementation of 3D wavelet compression on GPU
Atoli Huppé, Bérenger Bramas, Clément Flint, Stéphane Genaud
Conférence francophone d’informatique en Parallélisme, Architecture et Système (COMPAS 2025), Oct 2025, Bordeaux, France.
Link
Arc Blanc: A Real-Time Ocean Simulator
David Algis, Bérenger Bramas, Emmanuelle Darles, Lilian Aveneau
I3D 2025 - ACM SIGGRAPH Symposium on Interactive 3D Graphics and Games, 2025
Link
Contraintes d'OpenMP pour la parallélisation automatique à base de tâches [OpenMP constraints for automatic task-based parallelization]
Julien Gaupp, Bérenger Bramas
Conférence francophone d'informatique en Parallélisme, Architecture et Système (COMPAS 2025)
Link
Parallélisation automatique à base de tâches avec modélisation de performances [Automatic task-based parallelization with performance modeling]
Bérenger Bramas, Marek Felšöci, Stéphane Genaud
Conférence francophone d'informatique en Parallélisme, Architecture et Système (COMPAS 2025)
Link

Poster

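Vers une abstraction composable des méthodes hiérarchiques pour l’accélération du produit matrice-vecteur [Towards a composable abstraction of hierarchical methods for accelerating the matrix-vector product]
Antoine Gicquel, Olivier Coulaud, Bérenger Bramas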
COMPAS 2025 - Conférence francophone d’informatique en Parallélisme, Architecture et Système, 2025
Link

2024

Journals

Using the Discrete Wavelet Transform for Lossy On-the-Fly Compression of GPU Fluid Simulations
Clément Flint, Atoli Huppé, Philippe Helluy, Bérenger Bramas, Stéphane Genaud
International Journal for Numerical Methods in Fluids, 2024.
Link
SPC5: an efficient SpMV framework vectorized using ARM SVE and x86 AVX-512
Evann Regnault, Bérenger Bramas
Computer Science and Information Systems (ComSIS), 2024.
Link

Conferences with Proceedings

Efficient GPU Implementation of Particle Interactions with Cutoff Radius and Few Particles per Cell
David Algis, Bérenger Bramas, Emmanuelle Darles, Lilian Aveneau
International Symposium on Parallel Computing and Distributed Systems (PCDS2024), IEEE, Sep 2024, Singapore, Singapore.
Link
Dynamic Tasks Scheduler with Multiple Priority-based Trees on Heterogeneous Computing Systems
Hayfa Tayeb, Bérenger Bramas, Abdou Guermouche, Mathieu Faverge
IEEE Heterogeneity in Computing Workshop (HCW'24), IPDPS 2024
Link

2023

Journals

Autovesk: Automatic vectorization of unstructured static kernels by graph transformations
Hayfa Tayeb, Bérenger Bramas, Ludovic Paillat
ACM TACO, 2023.
Link

Conferences with Proceedings

GPU Code Generation of Cardiac Electrophysiology Simulation with MLIR
Tiago Trevisan Jost, Arun Thangamani, Raphaël Colin, Vincent Loechner, Stéphane Genaud, and Bérenger Bramas
EuroPar, 2023.
Link
Lifting Code Generation of Cardiac Physiology Simulation to Novel Compiler Technology
Arun Thangamani, Tiago Trevisan Jost, Vincent Loechner, Stéphane Genaud, and Bérenger Bramas
ACM/IEEE International Symposium on Code Generation and Optimization (CGO23).
Link

Conferences & Workshops

Extending the Task Dataflow Model with Speculative Data Accesses
Anastasios Souris, Bérenger Bramas, Philippe Clauss
Compas 2023.
Link

2022

Journals

Parallel Lattice-Boltzmann transport solver in complex geometry
Romane Hélie, Matthieu Boileau, Bérenger Bramas, Emmanuel Franck, Philippe Helluy, Laurent Navoret
SMAI Journal of Computational Mathematics, 2022.
Link
Complexes++: Efficient and versatile coarse-grained simulations of protein complexes and their dense solutions
Max Linke, Patrick Quoika, Bérenger Bramas, Juergen Koefinger, Gerhard Hummer
The Journal of Chemical Physics (JCP), 2022.
Link
Towards EXtreme scale Technologies and Accelerators for euROhpc hw/Sw Supercomputing Applications for exascale: the TEXTAROSSA Approach
Microprocessors and Microsystems, 2022.
Link
An Efficient Particle Tracking Algorithm for Large-Scale Parallel Pseudo-Spectral Simulations of Turbulence
Cristian C. Lalescu, Bérenger Bramas, Markus Rampp, Michael Wilczek
Computer Physics Communications, 2022.
Link

Conferences & Workshops

MulTreePrio: Scheduling task-based applications for heterogeneous computing systems
Hayfa Tayeb, Bérenger Bramas, Abdou Guermouche, Mathieu Faverge
Compas 2022.
Link
Parallelization of the Lattice-Boltzmann schemes using the task-based method
Clément Flint, Berenger Bramas, Stéphane Genaud, Philippe Helluy
Compas 2022.
Link

2021

Journals

Automated prioritizing heuristics for parallel task graph scheduling in heterogeneous computing
Clément Flint, Ludovic Paillat, Bérenger Bramas
PeerJ CS, 2021.
Link
A fast vectorized sorting implementation based on the ARM scalable vector extension (SVE)
Bérenger Bramas
PeerJ CS, 2021.
Link
Shape- and scale-dependent coupling between spheroids and velocity gradients in turbulence
Nimish Pujara, José-Agustín Arguedas-Leiva, Cristian C. Lalescu, Bérenger Bramas, Michael Wilczek
Journal of Fluid Mechanics, 2021.
Link

Conferences with Proceedings

TEXTAROSSA: Towards EXtreme scale Technologies and Accelerators for euROhpc hw/Sw Supercomputing Applications for exascale
Agosta, G., Cattaneo, D., Fornaciari, W., Galimberti, A., Massari, G., Reghenzani, F., ... & Ammendola, R.
24th Euromicro Conference on Digital System Design (DSD), IEEE, 2021
Link

2020

Journals

TBFMM: A C++ generic and parallel fast multipole method library
Bérenger Bramas
The Journal of Open Source Software, 2020.
Link
An integral equation formulation of the n-body dielectric spheres problem. Part II: Complexity analysis
Bérenger Bramas, Muhammad Hassan, Benjamin Stamm
ESAIM: Mathematical Modelling and Numerical Analysis, 2020.
Link
Optimization of a discontinuous finite element solver with OpenCL and StarPU
Bérenger Bramas, Philippe Helluy, Laura Mendoza, Bruno Weber
IJFV International Journal On Finite Volumes, 2020.
Link

Conferences & Workshops

Automatic task-based parallelization of C++ applications by source-to-source transformations
Garip Kusoglu, Bérenger Bramas, Stéphane Genaud
Compas 2020.
Link

Research Reports

On the improvement of the in-place merge algorithm parallelization
Bérenger Bramas, Quentin Bramas, 2020.
Link

2019

Journals

Improving parallel executions by increasing task granularity in task-based runtime systems using acyclic DAG clustering
Bérenger Bramas, Alain Ketterlin
PeerJ Computer Science, 2019.
Link
Impact study of data locality on task-based applications through the Heteroprio scheduler
Bérenger Bramas
PeerJ Computer Science, 2019.
Link
Increasing the Degree of Parallelism Using Speculative Execution in Task-based Runtime Systems
Bérenger Bramas
PeerJ Computer Science, 2019.
Link

Conferences & Workshops

SPETABARU: A Task-based Runtime System with Speculative Execution Capability
Berenger Bramas
SIAM CSE 2019.
Link

2018

Journals

Computing the Sparse Matrix Vector Product using Block-Based Kernels Without Zero Padding on Processors with AVX-512 Instructions
Bérenger Bramas, Pavel Kus
PeerJ Computer Science, 2018.
Link

Conferences & Workshops

Limitations of OpenMP task-based parallelization to achieve high performance and create a robust software design
Berenger Bramas
PMAA18, 10th International Workshop on Parallel Matrix Algorithms and Applications, 2018.

2017

Journals

A Novel Hybrid Quicksort Algorithm Vectorized using AVX-512 on Intel Skylake
Bérenger Bramas
International Journal of Advanced Computer Science and Applications (IJACSA), Volume 8, Issue 10, 2017.
Link
Inastemp: A Novel Intrinsics-as-Template Library for Portable SIMD-Vectorization
Bérenger Bramas
Scientific Programming Journal, 2017.
Link
Bridging the gap between OpenMP and task-based runtime systems for the fast multipole method
Emmanuel Agullo, Olivier Aumage, Bérenger Bramas, Olivier Coulaud, Samuel Pitoiset
IEEE Transactions on Parallel and Distributed Systems, 2017.
Link

Research Reports

Modeling Irregular Kernels of Task-based codes: Illustration with the Fast Multipole Method
Emmanuel Agullo, Berenger Bramas, Olivier Coulaud, Samuel Thibault, Luka Stanisic, 2017.
Link

2016

Journals

Task-based FMM for heterogeneous architectures
Emmanuel Agullo, Bérenger Bramas, Olivier Coulaud, Eric Darve, Matthias Messner, Toru Takahashi
Concurrency and Computation: Practice and Experience, 2016.
Link

Research Reports

Task-based fast multipole method for clusters of multicore processors
Emmanuel Agullo, Olivier Coulaud, Martin Khannouz, Luka Stanisic, Berenger Bramas, 2016.
Link

2015

Journals

Time-Domain BEM for the Wave Equation on Distributed-Heterogeneous Architectures: A Blocking Approach
Bérenger Bramas, Olivier Coulaud, Guillaume Sylvand
Parallel Computing, 2015.
Link

Conferences & Workshops

ScalFMM: a Generic Parallel Fast Multipole Library
Pierre Blanchard, Berenger Bramas, Olivier Coulaud, Eric F. Darve, Laurent Dupuy, Arnaud Etcheverry, Guillaume Sylvand
SIAM CSE 2015.
Link
Hierarchical Randomized Low-Rank Approximations: Applications to covariance kernel matrices and generation of Gaussian Random Fields
Pierre Blanchard, Olivier Coulaud, Eric Darve, Berenger Bramas
SIAM Conference on Applied Linear Algebra (SIAM LA) 2015.
Task-Based Parallelization of the Fast Multipole Method on Nvidia GPUs and Multicore Processors
Eric F. Darve, Emmanuel Agullo, Berenger Bramas, Olivier Coulaud, Matthias Messner, Toru Takahashi
SIAM CSE 2015
Link

2014

Conferences with Proceedings

Time-domain BEM for the Wave Equation: Optimization and Hybrid Parallelization
Berenger Bramas, Olivier Coulaud, Guillaume Sylvand
Euro-Par 2014 Parallel Processing, Springer International Publishing, 2014, pp. 511-523
Link

Conferences & Workshops

New Computational Ordering to Reach High Performance in the Time-domain BEM for the Wave Equation
Berenger Bramas, Olivier Coulaud, Guillaume Sylvand
Sparse Days 2014
Link

2013

Journals

Task-based FMM for Multicore Architectures
Emmanuel Agullo, Bérenger Bramas, Olivier Coulaud, Eric Darve, Matthias Messner, Toru Takahashi
SIAM Journal on Scientific Computing (SISC), 2013.
Link

Conferences & Workshops

Task-based Parallelization of the Fast Multipole Method on NVIDIA GPUs and Multicore Processors
Emmanuel Agullo, Berenger Bramas, Olivier Coulaud, Matthias Messner, Eric Darve, Toru Takahashi
GTC 2013.
Link
Pipelining the Fast Multipole Method over a Runtime System
Emmanuel Agullo, Berenger Bramas, Olivier Coulaud, Eric F. Darve, Matthias Messner, Toru Takahashi
SIAM CSE 2013.
Link

2012

Research Reports

Optimized M2L Kernels for the Chebyshev Interpolation based Fast Multipole Method
Matthias Messner, Berenger Bramas, Olivier Coulaud, Eric Darve, 2012.
Link

Posters

Poster: Matrices over Runtime Systems at Exascale
Emmanuel Agullo et al.
High Performance Computing, Networking, Storage and Analysis (SCC), 2012 SC Companion, IEEE, 2012.
Link

2008

Conferences with Proceedings

Design of a Sound System to Increase Emotional Expression Impact in Human-Robot Interaction
Berenger Bramas, Young-Min Kim, and Dong-Soo Kwon
International Conference on Control, Automation and Systems 2008, Oct. 14-17, 2008 in COEX, Seoul, Korea, pp. 2732-2737.
Link

External Lists

You can find more about some of my past and current research at:
- HAL Publications
- Google Scholar Profile
- ORCID Profile

Projects

Ongoing

Exa-SofT (NumPEx) - PEPR - 2024/2025 - Member
The Exa-SofT project focuses on advancing the HPC software and tools ecosystem to address the unique challenges posed by exascale supercomputers, characterized by their heterogeneous and scalable compute-node hardware, including CPUs and diverse GPU architectures. By fostering multidisciplinary collaboration, the project aims to develop a coherent, tightly integrated, and exascale-ready software stack. Key challenges tackled include enhancing productivity, ensuring performance portability, managing heterogeneity and scalability, achieving resilience, and optimizing performance and energy efficiency, thereby enabling complex parallel applications to fully leverage the capabilities of exascale architectures.
AUTOSPEC - ANR JCJC - 2021/2025 - Coordinator
The AUTOSPEC project aims to create methods for automatic task-based parallelization and to improve this paradigm by increasing the degree of parallelism using speculative execution. The project will focus on source-to-source transformations for automatic parallelization, speculative execution models, DAG scheduling, and the activation mechanisms for speculative execution. To this end, the project will rely on a source-to-source compiler that targets the C++ language, a runtime system with speculative execution capabilities, and an editor (IDE) to enable compiler-guided development. The outcomes of the project will be open source, with the objective of building a user community. The results will be of great interest both to developers who want to use an automatic parallelization method and to high-performance programming experts, who will benefit from improvements to task-based programming. The results of this project will be validated on various applications, such as protein complex simulation software and widely used open-source software. The aim will be to cover a wide range of applications to demonstrate the potential of the methods derived from this project while trying to establish their limitations to open up new research perspectives.

Past

TEXTAROSSA - EuroHPC - 2021/2024 - Head of WP4
This European project aims to achieve a broad impact on the High Performance Computing (HPC) field in both pre-exascale and exascale scenarios. The TEXTAROSSA consortium will develop new hardware accelerators, innovative two-phase cooling equipment, advanced algorithms, methods and software products for traditional HPC domains as well as for emerging domains in High Performance Artificial Intelligence (HPC-AI) and High Performance Data Analytics (HPDA). We will focus on the scheduling of task graphs under energy constraints and on porting scientific codes to heterogeneous computing nodes with FPGAs.
MICROCARD - EuroHPC - 2021/2024 - Member
Numerical models of cardiac electrophysiology need to move from a continuum approach to a cell-by-cell approach to match observations in aging and diseased hearts. Exascale computers will be needed to run such models. The application is co-designed by HPC experts, numerical scientists, biomedical engineers, and biomedical scientists, from academia and industry. We developed, in concert, numerical schemes suitable for exascale parallelism, problem-tailored linear-system solvers and preconditioners, and a compiler to translate high-level model descriptions into optimized, energy-efficient system code for heterogeneous computing systems.
TExas - Inria exploratory project - 2021/2023 - Coordinator
The TExas project aims to optimize the performance of supercomputers, particularly Exascale machines, which are capable of executing a billion billion operations per second. The objectives of the TExas project include: (1) Developing new programming models for parallel computing to enhance the efficiency of supercomputers. (2) Creating dynamic algorithms for task management, allowing supercomputers to adjust their computing power during the execution of high-performance computing (HPC) applications, like numerical simulations. (3) Reducing the high energy consumption of supercomputers, which is a significant scientific and ecological issue.

Software

Main/only contributor

Main software
- TBFMM: Lightweight task-based FMM using the block tree and OpenMP. https://gitlab.inria.fr/bramas/tbfmm
- SPECX: A task-based runtime system with speculative execution capability. Now supports GPUs and MPI. https://gitlab.inria.fr/bramas/specx
  (It was previously known as SPETABARU https://gitlab.inria.fr/bramas/spetabaru)
- APAC (not yet public): Source-to-source compiler for automatic parallelization using the task-based method.
- INASTEMP: A vectorization library that manages conditional statements (supports x86 extensions and ARM SVE). https://gitlab.inria.fr/bramas/inastemp
- SCALFMM: A generic fast multipole library. https://gitlab.inria.fr/solverstack/ScalFMM
- SPC5: A sparse matrix vector product library for AVX-512 and ARM SVE. https://gitlab.inria.fr/bramas/spc5
- AVX512 SORT: A fast sorting algorithm for AVX-512. https://gitlab.inria.fr/bramas/avx-512-sort
- AUTOVESK: Automatic vectorization of static kernels with a greedy algorithm. https://gitlab.inria.fr/bramas/autovesk
- Scanet (not public): A draft of a DNN library (to be used in future projects).
PoC
- Particle interaction on GPU. https://gitlab.inria.fr/bramas/gpu-particle-interaction
- clsimple: A simple command line arg manager. https://gitlab.inria.fr/bramas/clsimple
- Farm-SVE: Naive/scalar implementation of the ARM C language extensions (ACLE) for the ARM Scalable Vector Extension (SVE) in standard C++. https://gitlab.inria.fr/bramas/farm-sve
- DAGPAR: Directed acyclic graph clustering library. https://gitlab.inria.fr/bramas/dagpar
- INPLACE MERGE: Some improvements to the in-place merge algorithm. https://gitlab.inria.fr/bramas/inplace-merge
- CUDA Tests & Bench: Some tests with CUDA to compare algorithms/implementations. https://gitlab.inria.fr/bramas/CUDA-Benchs-AND-Tests
Outdated
- SPARSE-TD (not public): Solver for the time domain boundary element method (BEM) for the wave equation.
- FMM-TD (not public): FMM solver for the time domain boundary element method (BEM) for the wave equation.

Contribute(d) to

COMPLEXES-PP: Coarse-grained simulations of biomolecular complexes. https://github.com/bio-phys/complexespp
STARPU: A task-based runtime system. https://starpu.gitlabpages.inria.fr/
TURTLE (not public yet): Pseudospectral direct numerical simulations (DNS) of the incompressible Navier-Stokes equations. https://gitlab.mpcdf.mpg.de/TurTLE/turtle
CHUKRUT: Conservative Hyperbolic Upwind Kinetic Resolution of Unstructured Tokamaks. https://gitlab.math.unistra.fr/tonus/chukrut/chukrut
SCHNAPS: Discontinuous finite element solver with OpenCL and StarPU. http://schnaps.gforge.inria.fr/
PASTIX: Parallel Sparse direct Solver (my modifications are not yet public). https://gitlab.inria.fr/solverstack/pastix

Supervision

Most of these people are co-advised with one or two other colleagues.

Postdoc

Clément Flint (2025, ANR Autospec)
Marek Felsoci (2023-2024, Autospec project)
Jean Etienne Ndamlabin Mboula (2022-2024, Texas project)

PhDs

Atoli Huppé (2024-2027, Numpex Exasoft)
Antoine Gicquel (2023-2026, ANR)
David Algis (2022-2025, Cifre)
Anastasios Souris (2022-2024, 2 years only, Autospec project, left without notice, contract ended by Inria)
Hayfa Tayeb (2021-2024, Textarossa project)
Garip Kusoglu (2021-2022, 1 year only, Autospec project, moved to industry)
Clément Flint (2020-2024, Idex IRMIA)

Engineer

Paul Cardosi (2020-2021, ADT SPETABARU)

Interns

Julien Gaupp (2023-2025)
Atoli Huppé (2023)
Mohamed Bouaziz (2023)
Hayfa Tayeb (2021)
Clément Flint (2020)
Garip Kusoglu (2019-2021)
Michel Tching (2022 and 2021)
Ludovic Paillat (2021)
David Nicolazo (2021)

Projects of MS students

2024: Ahmed Zaned, Arhun Saday, Oualid Rhechim, Kalim Moussa, Assalas Lakrouz, Patrick De Montferrier, Mohamad Ali Awada, Mohammed Arezki Atmimou
2023: Matthieu Freitag, Quentin Gerling, Pierre Bertholet, Evann Regnault, Filipe Augusto, Iman Iraj Doost
2022: Tom Hammer, Mathieu Fessler, Lucas Schmidt, Karim Bouali, Laurent Werey, Arnaud Kientzler, Stephen Foerster, Ludovic Paillat
2021: Nicolas Laforet, Maxime Princelle, Marline Vauchair, Thomas Steinmetz, Nadjib Belaribi, Janos Falke
2020: Jimmy Huynh, Garip Kusoglu, Thomas Millot, Camille Millot, Aurélien David
2019: Clément Flint

Teaching

Present

(since 2018) I am involved in the compilation course for the Master's degree in CS at the University of Strasbourg.
(since 2019) I am in charge of the compilation & performance course for the Master's degree in scientific computing at the University of Strasbourg.

Past

Teaching at Enseirb/IPB from 2013 to 2015 (2 years x 70 hours)

Algorithms and hierarchical data structures with Denis Lapoire
C programming with Georges Eyrolles
System programming with Brice Goglin