Shamrock 2025.10.0
Astrophysical Code
fma_chains.hpp File Reference

Port of Argonne National Laboratory's FMA chains benchmark flops.cpp.

#include "shambase/assert.hpp"
#include "shambase/time.hpp"
#include "shambackends/DeviceBuffer.hpp"
#include "shambackends/DeviceScheduler.hpp"
#include "shambackends/math.hpp"

Go to the source code of this file.

Classes

struct  sham::benchmarks::fma_chains_result
 Structure containing the results of an fma_chains benchmark.
 

Namespaces

namespace  sham
 Namespace for the backends. It is named sham rather than shambackends because the latter is too long to write.
 

Macros

#define MAD_4(x, y)
 
#define MAD_16(x, y)
 

Functions

template<class T >
void sham::benchmarks::fma_chains (u32 i, int nrotation, T y0, T *__restrict in, T *__restrict out)
 Kernel for the fma_chains benchmark.
 
template<class T >
fma_chains_result sham::benchmarks::fma_chains_bench (DeviceScheduler_ptr sched, int N, f64 time_threshold)
 Run the fma_chains benchmark.
 

Detailed Description

Port of Argonne National Laboratory's FMA chains benchmark flops.cpp.

Author
Timothée David–Cléris (tim.s.nosp@m.hamr.nosp@m.ock@p.nosp@m.roto.nosp@m.n.me)

Definition in file fma_chains.hpp.

Macro Definition Documentation

◆ MAD_16

#define MAD_16(x, y)
Value:
MAD_4(x, y); \
MAD_4(x, y); \
MAD_4(x, y); \
MAD_4(x, y);

◆ MAD_4

#define MAD_4(x, y)
Value:
x = y * x + y; \
y = x * y + x; \
x = y * x + y; \
y = x * y + x;

Function Documentation

◆ fma_chains()

template<class T >
void sham::benchmarks::fma_chains ( u32  i,
int  nrotation,
T  y0,
T *__restrict  in,
T *__restrict  out 
)
inline

Kernel for the fma_chains benchmark.

Saturates the FPU to hide memory latency. Each iteration performs 16 multiply-adds at 2 flops each (16 * 2 = 32 flops), so the total flop count is known in advance and the kernel can be used to compute the achieved flop rate.

Template Parameters
T  value type of the input and output vectors
Parameters
i  index of the element to process
nrotation  number of FMA-chain rotations to apply
y0  initial value of the second input vector
in  input vector
out  output vector

Definition at line 41 of file fma_chains.hpp.
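
The flop accounting described above can be illustrated with a minimal host-only sketch. It is not the Shamrock implementation: the names SKETCH_MAD_4, SKETCH_MAD_16, fma_chains_sketch and total_flops_sketch are hypothetical, and the choices to start the chain from in[i] and y0 and to store y in out[i] are assumptions made for illustration. Only the macro bodies and the 16 * 2 flops-per-iteration figure come from this page.

```cpp
#include <cstdint>

// Same update pattern as the documented MAD_4: four dependent multiply-adds.
#define SKETCH_MAD_4(x, y) \
    x = y * x + y;         \
    y = x * y + x;         \
    x = y * x + y;         \
    y = x * y + x;

// Same structure as the documented MAD_16: four MAD_4 blocks, 16 multiply-adds.
#define SKETCH_MAD_16(x, y) \
    SKETCH_MAD_4(x, y)      \
    SKETCH_MAD_4(x, y)      \
    SKETCH_MAD_4(x, y)      \
    SKETCH_MAD_4(x, y)

// Hypothetical stand-in for the kernel: one chain per element index i.
// Assumption: the chain starts from in[i] and y0, and y is the value stored.
template <class T>
inline void fma_chains_sketch(std::uint32_t i, int nrotation, T y0, const T *in, T *out) {
    T x = in[i];
    T y = y0;
    for (int r = 0; r < nrotation; ++r) {
        SKETCH_MAD_16(x, y)
    }
    out[i] = y;
}

// 16 multiply-adds per rotation, 2 flops each (one multiply, one add).
constexpr std::uint64_t flops_per_rotation_sketch = 16 * 2;

// Total flop count for n_elements independent chains of nrotation rotations.
inline std::uint64_t total_flops_sketch(std::uint64_t n_elements, std::uint64_t nrotation) {
    return n_elements * nrotation * flops_per_rotation_sketch;
}
```

Dividing the result of total_flops_sketch by the measured run time in seconds gives the achieved flop rate.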


◆ fma_chains_bench()

template<class T >
fma_chains_result sham::benchmarks::fma_chains_bench ( DeviceScheduler_ptr  sched,
int  N,
f64  time_threshold 
)
inline

Run the fma_chains benchmark.

Based on Argonne's Aurora node performance overview: https://docs.alcf.anl.gov/aurora/node-performance-overview/node-performance-overview/

Template Parameters
T  value type used in the benchmark
Parameters
sched  scheduler for the target device
N  number of elements (independent FMA chains) to process
time_threshold  minimum wall-clock time to run the benchmark in seconds
Returns
benchmark results as an fma_chains_result

Definition at line 85 of file fma_chains.hpp.
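
How a time threshold and a known flop count combine into a flop rate can be sketched on the host with std::chrono. This is a rough illustration only, not the actual fma_chains_bench: it runs on the CPU without a DeviceScheduler or DeviceBuffer, the struct bench_result_sketch and its fields are hypothetical (the members of fma_chains_result are not listed on this page), and doubling nrotation until the run lasts at least time_threshold is an assumed strategy.

```cpp
#include <chrono>
#include <cstdint>
#include <vector>

// Hypothetical result type; not the layout of fma_chains_result.
struct bench_result_sketch {
    double seconds;          // wall-clock time of the last timed run
    int nrotation;           // rotation count used in the last timed run
    double flops_per_second; // achieved flop rate, 0 if no time was measured
    double checksum;         // sum of outputs, keeps the computation observable
};

// Upper bound on the rotation count so the search always terminates.
constexpr int max_rotation_sketch = 1 << 30;

template <class T>
bench_result_sketch fma_chains_bench_sketch(int N, double time_threshold) {
    std::vector<T> in(N, T(1)), out(N, T(0));
    const T y0 = T(0);
    int nrotation = 1;
    for (;;) {
        auto t0 = std::chrono::steady_clock::now();
        for (int i = 0; i < N; ++i) {
            T x = in[i], y = y0;
            for (int r = 0; r < nrotation; ++r) {
                // 8 pairs of updates = 16 multiply-adds = 32 flops per rotation.
                for (int k = 0; k < 8; ++k) {
                    x = y * x + y;
                    y = x * y + x;
                }
            }
            out[i] = y;
        }
        auto t1 = std::chrono::steady_clock::now();
        double seconds = std::chrono::duration<double>(t1 - t0).count();

        if (seconds >= time_threshold || nrotation >= max_rotation_sketch) {
            double checksum = 0.0;
            for (T v : out) checksum += static_cast<double>(v);
            double flops = 32.0 * static_cast<double>(N) * static_cast<double>(nrotation);
            return {seconds, nrotation, seconds > 0.0 ? flops / seconds : 0.0, checksum};
        }
        nrotation *= 2; // assumed strategy: lengthen the run until it is long enough
    }
}
```

A call such as fma_chains_bench_sketch<float>(1024, 0.01) returns the rate for the first run that lasts at least 10 ms. The figure is only meaningful when the compiler keeps the arithmetic, which is why the outputs are folded into a checksum.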
