Who I am
I am a permanent researcher in high-performance computing (HPC) in the CAMUS team at Inria Nancy and the ICPS team of the ICube laboratory.
Our team is located near Strasbourg.
Research Interests
My research interests are centered around:
- HPC
- Optimizations: vectorization, complex algorithms (e.g., sorting, SpMV)
- Accelerators/GPUs: particle interaction, OptiX
- Software engineering for HPC: inastemp
- Scientific computing: FMM, TD-BEM, bio-physics code for protein simulation, REMC, turbulence simulation applications
- Runtime systems: StarPU, Specx, speculative execution
- Scheduling: Heteroprio, Multiprio
- Tools for automatic vectorization/parallelization: compilation
Projects
I participate in various projects, including:
- Autospec (ANR): Coordinator, focusing on automatic task-based parallelization and speculative execution.
- Microcard (EuroHPC): Collaborator, co-supervising work on MLIR for GPU and vectorization.
- Textarossa (EuroHPC): Responsible for WP4, working on task-based scheduling and energy efficiency.
- TExas (Inria): Coordinator, working on scheduling multiple task-based applications simultaneously.
Feel free to contact me at berenger.bramas@inria.fr and check out my résumé here.
PhD
I earned my PhD from the HiePACS team at Inria Bordeaux in 2016, under the supervision of Olivier Coulaud (Inria) and Guillaume Sylvand (Airbus).
You can find the PDF of my thesis here and the slides here.
After completing my PhD, I did a postdoc at the RZG/MPCDF, a Max Planck Institute near Munich.
MS
I graduated with:
- A Master's Degree in Software Engineering from ISIMA
- A Master's Degree in Computer Science and Robotics (Research Specialization) from Blaise Pascal University in 2009.
You can read about my early research work in the following reports (in French):
- M1 Report: Design of a sound management system to enhance the impact of human-robot interactions.
- M2 Report: Supervised learning with sound-image association in developmental robotics.
Publications
Submitted
- RSCHED: An effective heterogeneous resources management for simultaneous execution of task-based applications
Etienne Ndamlabin, Bérenger Bramas
Preprint
- Exploiting ray tracing technology through OptiX to compute particle interactions with cutoff in a 3D environment on GPU
David Algis, Bérenger Bramas
Preprint
- Specx: a C++ task-based runtime system for heterogeneous distributed architectures
Paul Cardosi, Bérenger Bramas
Draft
Journals
-
Arc Blanc: a real-time ocean simulation framework
David Algis, Bérenger Bramas, Emmanuelle Darles, Lilian Aveneau
Journal of Computer Graphics Techniques, 2025 (accepted). -
Using the Discrete Wavelet Transform for Lossy On-the-Fly Compression of GPU Fluid Simulations
Clément Flint, Atoli Huppé, Philippe Helluy, Bérenger Bramas, Stéphane Genaud
International Journal for Numerical Methods in Fluids, 2024.
Preprint -
SPC5: an efficient SpMV framework vectorized using ARM SVE and x86 AVX-512
Evann Regnault, Bérenger Bramas
Computer Science and Information Systems (ComSIS), 2024.
PDF -
Autovesk: Automatic vectorization of unstructured static kernels by graph transformations
Hayfa Tayeb, Bérenger Bramas, Ludovic Paillat
ACM TACO, 2023.
Link -
Parallel lattice-boltzmann transport solver in complex geometry
Romane Hélie, Matthieu Boileau, Bérenger Bramas, Emmanuel Franck, Philippe Helluy, Laurent Navoret
SMAI Journal of Computational Mathematics, 2022.
Link -
Complexes++: Efficient and versatile coarse-grained simulations of protein complexes and their dense solutions
Max Linke, Patrick Quoika, Bérenger Bramas, Juergen Koefinger, Gerhard Hummer
The Journal of Chemical Physics (JCP), 2022.
Link -
Towards EXtreme scale Technologies and Accelerators for euROhpc hw/Sw Supercomputing Applications for exascale: the TEXTAROSSA Approach
Microprocessors and Microsystems, 2022.
Link -
An Efficient Particle Tracking Algorithm for Large-Scale Parallel Pseudo-Spectral Simulations of Turbulence
Cristian C. Lalescu, Bérenger Bramas, Markus Rampp, Michael Wilczek
Computer Physics Communications, 2022.
Link -
Automated prioritizing heuristics for parallel task graph scheduling in heterogeneous computing
Clément Flint, Ludovic Paillat, Bérenger Bramas
PeerJ CS, 2021.
Link -
A fast vectorized sorting implementation based on the ARM scalable vector extension (SVE)
Bérenger Bramas
PeerJ CS, 2021.
Link -
Shape- and scale-dependent coupling between spheroids and velocity gradients in turbulence
Nimish Pujara, José-Agustín Arguedas-Leiva, Cristian C. Lalescu, Bérenger Bramas, Michael Wilczek
Journal of Fluid Mechanics, 2021.
Link -
TBFMM: A C++ generic and parallel fast multipole method library
Bérenger Bramas
The Journal of Open Source Software, 2020.
Link -
An integral equation formulation of the n-body dielectric spheres problem. Part II: Complexity analysis
Bérenger Bramas, Muhammad Hassan, Benjamin Stamm
ESAIM: Mathematical Modelling and Numerical Analysis, 2020.
Link -
Optimization of a discontinuous finite element solver with OpenCL and StarPU
Bérenger Bramas, Philippe Helluy, Laura Mendoza, Bruno Weber
IJFV International Journal On Finite Volumes, 2020.
Link -
Improving parallel executions by increasing task granularity in task-based runtime systems using acyclic DAG clustering
Bérenger Bramas, Alain Ketterlin
PeerJ Computer Science, 2019.
Link -
Impact study of data locality on task-based applications through the Heteroprio scheduler
Bérenger Bramas
PeerJ Computer Science, 2019.
Link -
Increasing the Degree of Parallelism Using Speculative Execution in Task-based Runtime Systems
Bérenger Bramas
PeerJ Computer Science, 2019.
Link -
Computing the Sparse Matrix Vector Product using Block-Based Kernels Without Zero Padding on Processors with AVX-512 Instructions
Bérenger Bramas, Pavel Kus
PeerJ Computer Science, 2018.
Link -
A Novel Hybrid Quicksort Algorithm Vectorized using AVX-512 on Intel Skylake
Bérenger Bramas
International Journal of Advanced Computer Science and Applications (IJACSA), Volume 8, Issue 10, 2017.
Link -
Inastemp: A Novel Intrinsics-as-Template Library for Portable SIMD-Vectorization
Bérenger Bramas
Scientific Programming Journal, 2017.
Link -
Bridging the gap between OpenMP and task-based runtime systems for the fast multipole method
Emmanuel Agullo, Olivier Aumage, Bérenger Bramas, Olivier Coulaud, Samuel Pitoiset
IEEE Transactions on Parallel and Distributed Systems, 2017.
Link -
Task-based FMM for heterogeneous architectures
Emmanuel Agullo, Bérenger Bramas, Olivier Coulaud, Eric Darve, Matthias Messner, Toru Takahashi
Concurrency and Computation: Practice and Experience, 2016.
Link -
Time-Domain BEM for the Wave Equation on Distributed-Heterogeneous Architectures: A Blocking Approach
Bérenger Bramas, Olivier Coulaud, Guillaume Sylvand
Parallel Computing, 2015.
Link -
Task-based FMM for Multicore Architectures
Emmanuel Agullo, Bérenger Bramas, Olivier Coulaud, Eric Darve, Matthias Messner, Toru Takahashi
SIAM Journal on Scientific Computing (SISC), 2013.
Link
Conferences with Proceedings
-
Efficient GPU Implementation of Particle Interactions with Cutoff Radius and Few Particles per Cell
David Algis, Bérenger Bramas, Emmanuelle Darles, Lilian Aveneau
International Symposium on Parallel Computing and Distributed Systems (PCDS2024), IEEE, Sep 2024, Singapore, Singapore.
Preprint -
Dynamic Tasks Scheduler with Multiple Priority-based Trees on Heterogeneous Computing Systems
Hayfa Tayeb, Bérenger Bramas, Abdou Guermouche, Mathieu Faverge
IEEE Heterogeneity in Computing Workshop (HCW'24), IPDPS 2024
Link -
GPU Code Generation of Cardiac Electrophysiology Simulation with MLIR
Tiago Trevisan Jost, Arun Thangamani, Raphaël Colin, Vincent Loechner, Stéphane Genaud, and Bérenger Bramas
EuroPar, 2023.
Link -
Lifting Code Generation of Cardiac Physiology Simulation to Novel Compiler Technology
Arun Thangamani, Tiago Trevisan Jost, Vincent Loechner, Stéphane Genaud, and Bérenger Bramas
ACM/IEEE International Symposium on Code Generation and Optimization (CGO23).
Link -
TEXTAROSSA: Towards EXtreme scale Technologies and Accelerators for euROhpc hw/Sw Supercomputing Applications for exascale
Agosta, G., Cattaneo, D., Fornaciari, W., Galimberti, A., Massari, G., Reghenzani, F., ... & Ammendola, R.
24th Euromicro Conference on Digital System Design (DSD) IEEE, 2021
Link -
Time-domain BEM for the Wave Equation: Optimization and Hybrid Parallelization
Berenger Bramas, Olivier Coulaud, Guillaume Sylvand
Euro-Par 2014 Parallel Processing, Springer International Publishing, 2014, pp. 511–523
Link -
Design of a Sound System to Increase Emotional Expression Impact in Human-Robot Interaction
Berenger Bramas, Young-Min Kim, and Dong-Soo Kwon
International Conference on Control, Automation and Systems 2008, Oct. 14-17, 2008 in COEX, Seoul, Korea, pp. 2732-2737.
Link
Conferences & Workshops
-
Extending the Task Dataflow Model with Speculative Data Accesses
Anastasios Souris, Bérenger Bramas, Philippe Clauss
Compas 2023.
Link -
MulTreePrio: Scheduling task-based applications for heterogeneous computing systems
Hayfa Tayeb, Bérenger Bramas, Abdou Guermouche, Mathieu Faverge
Compas 2022.
Link -
Parallelization of the Lattice-Boltzmann schemes using the task-based method
Clément Flint, Berenger Bramas, Stéphane Genaud, Philippe Helluy
Compas 2022.
Link -
Automatic task-based parallelization of C++ applications by source-to-source transformations
Garip Kusoglu, Bérenger Bramas, Stéphane Genaud
Compas 2020.
Link -
SPETABARU: A Task-based Runtime System with Speculative Execution Capability
Berenger Bramas
SIAM CSE 2019.
Link -
Limitations of OpenMP task-based parallelization to achieve high performance and create a robust software design
Berenger Bramas
PMAA18, 10th International Workshop on Parallel Matrix Algorithms and Applications, 2018. -
ScalFMM: a Generic Parallel Fast Multipole Library
Pierre Blanchard, Berenger Bramas, Olivier Coulaud, Eric F. Darve, Laurent Dupuy, Arnaud Etcheverry, Guillaume Sylvand
SIAM CSE 2015.
Link -
Hierarchical Randomized Low-Rank Approximations: Applications to covariance kernel matrices and generation of Gaussian Random Fields
Pierre Blanchard, Olivier Coulaud, Eric Darve, Berenger Bramas
SIAM Conference on Applied Linear Algebra (SIAM LA) 2015. -
Task-Based Parallelization of the Fast Multipole Method on Nvidia GPUs and Multicore Processors
Eric F. Darve, Emmanuel Agullo, Berenger Bramas, Olivier Coulaud, Matthias Messner, Toru Takahashi
SIAM CSE 2015
Link -
New Computational Ordering to Reach High Performance in the Time-domain BEM for the Wave Equation
Berenger Bramas, Olivier Coulaud, Guillaume Sylvand
Sparse Days 2014
Link -
Task-based Parallelization of the Fast Multipole Method on NVIDIA GPUs and Multicore Processors
Emmanuel Agullo, Berenger Bramas, Olivier Coulaud, Matthias Messner, Eric Darve, Toru Takahashi
GTC 2013.
Link -
Pipelining the Fast Multipole Method over a Runtime System
Emmanuel Agullo, Berenger Bramas, Olivier Coulaud, Eric F. Darve, Matthias Messner, Toru Takahashi
SIAM CSE 2013.
Link
Research Reports (HAL)
-
On the improvement of the in-place merge algorithm parallelization
Bérenger Bramas, Quentin Bramas, 2020.
Link -
Modeling Irregular Kernels of Task-based codes: Illustration with the Fast Multipole Method
Emmanuel Agullo, Berenger Bramas, Olivier Coulaud, Samuel Thibault, Luka Stanisic.
Link -
Task-based fast multipole method for clusters of multicore processors
Emmanuel Agullo, Olivier Coulaud, Martin Khannouz, Luka Stanisic, Berenger Bramas, 2016.
Link -
Optimized M2L Kernels for the Chebyshev Interpolation based Fast Multipole Method
Matthias Messner, Berenger Bramas, Olivier Coulaud, Eric Darve, 2012.
Link
Posters
- Poster: Matrices over Runtime Systems at Exascale
Agullo, Emmanuel, et al. High Performance Computing, Networking, Storage and Analysis (SCC), 2012 SC Companion, IEEE, 2012. Link
External Lists
You can find more about some of my past and current research at:
- HAL Publications
- Google Scholar Profile
- ORCID Profile
Projects
Ongoing:
- AUTOSPEC - ANR JCJC - 2021/2025 - Coordinator
The AUTOSPEC project aims to create methods for automatic task-based parallelization and to improve this paradigm by increasing the degree of parallelism using speculative execution. The project focuses on source-to-source transformations for automatic parallelization, speculative execution models, DAG scheduling, and the activation mechanisms for speculative execution. To this end, the project relies on a source-to-source compiler that targets the C++ language, a runtime system with speculative execution capabilities, and an editor (IDE) to enable compiler-guided development. The outcomes of the project will be open-source, with the objective of developing a user community. The benefits will be of interest both to developers who want to use an automatic parallelization method and to high-performance programming experts, who will benefit from improvements to task-based programming. The results of this project will be validated on various applications, such as software for simulating protein complexes and widely used open-source software. The aim is to cover a wide range of applications to demonstrate the potential of the methods derived from this project, while establishing their limitations in order to open up new research perspectives.
Past:
-
TEXTAROSSA - EuroHPC - 2021/2024 - Head of WP4
This European project aims to achieve a broad impact on the High Performance Computing (HPC) field in both pre-exascale and exascale scenarios. The TEXTAROSSA consortium will develop new hardware accelerators, innovative two-phase cooling equipment, advanced algorithms, methods and software products for traditional HPC domains as well as for emerging domains in High Performance Artificial Intelligence (HPC-AI) and High Performance Data Analytics (HPDA). We will focus on the scheduling of task graphs under energy constraints and on porting scientific codes to heterogeneous computing nodes with FPGAs. -
MICROCARD - EuroHPC - 2021/2024 - Member
Numerical models of cardiac electrophysiology need to move from a continuum approach to a cell-by-cell approach to match observations in aging and diseased hearts. Exascale computers will be needed to run such models. The application is co-designed by HPC experts, numerical scientists, biomedical engineers, and biomedical scientists, from academia and industry. We developed, in concert, numerical schemes suitable for exascale parallelism, problem-tailored linear-system solvers and preconditioners, and a compiler to translate high-level model descriptions into optimized, energy-efficient system code for heterogeneous computing systems. -
TExas - Inria exploratory project - 2021/2023 - Coordinator
The TExas project aims to optimize the performance of supercomputers, particularly exascale machines, which are capable of executing a billion billion operations per second. Its objectives include: (1) Developing new programming models for parallel computing to enhance the efficiency of supercomputers. (2) Creating dynamic algorithms for task management, allowing supercomputers to adjust their computing power during the execution of high-performance computing (HPC) applications, such as numerical simulations. (3) Reducing the high energy consumption of supercomputers, which is a significant scientific and ecological issue.
Software
Main/only contributor:
-
Main software
- TBFMM: Lightweight task-based FMM using the block tree and OpenMP. https://gitlab.inria.fr/bramas/tbfmm
- SPECX: A task-based runtime system with speculative execution capability. Now supports GPUs and MPI. https://gitlab.inria.fr/bramas/specx
(It was previously known as SPETABARU https://gitlab.inria.fr/bramas/spetabaru)
- APAC (not yet public): Source-to-source compiler for automatic parallelization using the task-based method.
- INASTEMP: A vectorization library that manages conditional statements (supports X86 extensions and ARM SVE). https://gitlab.inria.fr/bramas/inastemp
- SCALFMM: A generic fast multipole library. https://gitlab.inria.fr/solverstack/ScalFMM
- SPC5: A sparse matrix vector product library for AVX 512 and ARM SVE. https://gitlab.inria.fr/bramas/spc5
- AVX512 SORT: A fast sorting algorithm for AVX 512. https://gitlab.inria.fr/bramas/avx-512-sort
- AUTOVESK: Automatic vectorization of static kernels with a greedy algorithm. https://gitlab.inria.fr/bramas/autovesk
- Scanet (not public): A draft of a DNN lib (to be used in future projects).
-
PoC
- Particle interaction on GPU. https://gitlab.inria.fr/bramas/gpu-particle-interaction
- clsimple: A simple command line arg manager. https://gitlab.inria.fr/bramas/clsimple
- Farm-SVE: Naive/scalar implementation of the ARM C language extensions (ACLE) for the ARM Scalable Vector Extension (SVE) in standard C++. https://gitlab.inria.fr/bramas/farm-sve
- DAGPAR: Directed acyclic graph clustering library. https://gitlab.inria.fr/bramas/dagpar
- INPLACE MERGE: Some improvements to the in-place merge algorithm. https://gitlab.inria.fr/bramas/inplace-merge
- CUDA Tests & Bench: Some tests with CUDA to compare algorithms/implementations. https://gitlab.inria.fr/bramas/CUDA-Benchs-AND-Tests
-
Outdated
- SPARSE-TD (not public): Solver for the time domain boundary element method (BEM) for the wave equation.
- FMM-TD (not public): FMM solver for the time domain boundary element method (BEM) for the wave equation.
Contribute(d) to:
- COMPLEXES-PP: Coarse-grained simulations of biomolecular complexes. https://github.com/bio-phys/complexespp
- STARPU: A task-based runtime system. https://starpu.gitlabpages.inria.fr/
- TURTLE (not public yet): Pseudospectral direct numerical simulations (DNS) of the incompressible Navier-Stokes equations. https://gitlab.mpcdf.mpg.de/TurTLE/turtle
- CHUKRUT: Conservative Hyperbolic Upwind Kinetic Resolution of Unstructured Tokamaks. https://gitlab.math.unistra.fr/tonus/chukrut/chukrut
- SCHNAPS: Discontinuous finite element solver with OpenCL and StarPU. http://schnaps.gforge.inria.fr/
- PASTIX: Parallel Sparse direct Solver (my modifications are currently not public.) https://gitlab.inria.fr/solverstack/pastix
Supervision
Most of the people listed here are co-advised with one or two other colleagues.
Postdoc:
- Marek Felsoci (2023-2024, Autospec project)
- Jean Etienne Ndamlabin Mboula (2022-2024, Texas project)
PhDs:
- Atoli Huppé (2024-2027, Numpex Exasoft)
- Antoine Gicquel (2023-2026, ANR)
- David Algis (2022-2025, Cifre)
- Anastasios Souris (2022-2024, 2 years only, Autospec project, left without notice, contract ended by Inria)
- Hayfa Tayeb (2021-2024, Textarossa project)
- Garip Kusoglu (2021-2022, 1 year only, Autospec project, moved to industry)
- Clément Flint (2020-2024, Idex IRMIA)
Engineer:
- Paul Cardosi (2020-2021, ADT SPETABARU)
Interns:
- Julien Gaupp (2023-2025)
- Atoli Huppé (2023)
- Mohamed Bouaziz (2023)
- Hayfa Tayeb (2021)
- Clément Flint (2020)
- Garip Kusoglu (2019-2021)
- Michel Tching (2022 and 2021)
- Ludovic Paillat (2021)
- David Nicolazo (2021)
Projects of MS students:
- 2024
- Ahmed Zaned
- Arhun Saday
- Oualid Rhechim
- Kalim Moussa
- Assalas Lakrouz
- Patrick De Montferrier
- Mohamad Ali Awada
- Mohammed Arezki Atmimou
- 2023
- Matthieu Freitag
- Quentin Gerling
- Pierre Bertholet
- Evann Regnault
- Filipe Augusto
- Iman Iraj Doost
- 2022
- Tom Hammer
- Mathieu Fessler
- Lucas Schmidt
- Karim Bouali
- Laurent Werey
- Arnaud Kientzler
- Stephen Foerster
- Ludovic Paillat
- 2021
- Nicolas Laforet
- Maxime Princelle
- Marline Vauchair
- Thomas Steinmetz
- Nadjib Belaribi
- Janos Falke
- 2020
- Jimmy Huynh
- Garip Kusoglu
- Thomas Millot
- Camille Millot
- Aurélien David
- 2019
- Clément Flint
Teaching
Present
- (since 2018) I am involved in the compilation course for the master's degree in CS at the University of Strasbourg.
- (since 2019) I am in charge of the compilation & performance course for the master's degree in scientific computing at the University of Strasbourg.
Past
Teaching at Enseirb/IPB from 2013 to 2015 (2 years x 70 hours)
- Algorithms and hierarchical data structures with Denis Lapoire
- C programming with Georges Eyrolles
- System programming with Brice Goglin