Port of Argonne National Laboratory's FMA chains benchmark flops.cpp.
| namespace | sham |
| | Namespace for backends. It is named only sham because shambackends is too long to write. |
- Author
- Timothée David–Cléris (tim.shamrock@proton.me)
Definition in file fma_chains.hpp.
◆ MAD_16
Value:
    MAD_4(x, y); \
    MAD_4(x, y); \
    MAD_4(x, y); \
    MAD_4(x, y);
◆ MAD_4
Value:
    x = y * x + y; \
    y = x * y + x; \
    x = y * x + y; \
    y = x * y + x;
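To make the operation count of the two macros concrete, here is a minimal sketch that expands one MAD_16 on an instrumented value type and counts the multiplies and adds it issues. The macro bodies are the ones listed above. The (x, y) parameter lists, the Counted type, the global counters, and flops_per_mad16() are assumptions introduced only for this illustration and are not part of fma_chains.hpp.

```cpp
#include <cstddef>

// Macro bodies as listed above; the (x, y) parameter lists are an assumption,
// since this page only shows the macro values.
#define MAD_4(x, y) \
    x = y * x + y;  \
    y = x * y + x;  \
    x = y * x + y;  \
    y = x * y + x;
#define MAD_16(x, y) \
    MAD_4(x, y);     \
    MAD_4(x, y);     \
    MAD_4(x, y);     \
    MAD_4(x, y);

// Hypothetical counters, incremented by the instrumented operators below.
static std::size_t g_muls = 0;
static std::size_t g_adds = 0;

// Hypothetical helper type: a double whose * and + are counted.
struct Counted {
    double v;
};
inline Counted operator*(Counted a, Counted b) {
    ++g_muls;
    return Counted{a.v * b.v};
}
inline Counted operator+(Counted a, Counted b) {
    ++g_adds;
    return Counted{a.v + b.v};
}

// Expands a single MAD_16 and returns the number of counted operations.
inline std::size_t flops_per_mad16() {
    g_muls = 0;
    g_adds = 0;
    Counted x{0.5};
    Counted y{0.25};
    MAD_16(x, y);
    return g_muls + g_adds;
}
```

Each statement in MAD_4 reads the result of the previous one, so the 16 multiply-add pairs form a single dependent chain. On a plain floating-point type the compiler may contract each pair into one hardware FMA instruction, which is conventionally counted as 2 flops.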
◆ fma_chains()
template<class T>
inline void sham::benchmarks::fma_chains(u32 i, int nrotation, T y0, T *__restrict in, T *__restrict out)
Kernel for the fma_chains benchmark.
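Saturates the FPU with a long chain of dependent multiply-adds so that arithmetic, not memory latency, dominates the run time. Each iteration executes 16 multiply-adds (one MAD_16), i.e. 16 * 2 flops, so the achieved flop rate can be computed from the number of iterations and the elapsed time.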
- Template Parameters
| T | value type of the input and output vectors |
- Parameters
| i | index of the element to process |
| nrotation | number of FMA-chain rotations to apply |
| y0 | initial value of the second input vector |
| in | input vector |
| out | output vector |
Definition at line 41 of file fma_chains.hpp.
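The flop accounting described for this kernel can be sketched as follows. This is a host-side illustration under stated assumptions, not the body found in fma_chains.hpp: the kernel body is inferred from the parameter descriptions, the u32 alias is assumed to be a 32-bit unsigned integer, and the names fma_chains_sketch and achieved_flops are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>

using u32 = std::uint32_t; // assumed definition of the u32 alias

// Macro bodies as listed on this page; the (x, y) parameter lists are assumed.
#define MAD_4(x, y) \
    x = y * x + y;  \
    y = x * y + x;  \
    x = y * x + y;  \
    y = x * y + x;
#define MAD_16(x, y) \
    MAD_4(x, y);     \
    MAD_4(x, y);     \
    MAD_4(x, y);     \
    MAD_4(x, y);

// Assumed kernel body: load one element, run nrotation MAD_16 chains on it,
// and store the result so the compiler cannot discard the arithmetic.
template<class T>
inline void fma_chains_sketch(u32 i, int nrotation, T y0, T *__restrict in, T *__restrict out) {
    T x = in[i];
    T y = y0;
    for (int j = 0; j < nrotation; j++) {
        MAD_16(x, y);
    }
    out[i] = y;
}

// Flop rate for n elements, each doing nrotation iterations of 16 * 2 flops.
inline double achieved_flops(std::size_t n, int nrotation, double seconds) {
    return static_cast<double>(n) * nrotation * 16.0 * 2.0 / seconds;
}
```

For example, processing 1000 elements with nrotation = 10 in 2 seconds corresponds to 1000 * 10 * 32 / 2 = 160000 flops per second. The work per element grows with nrotation while the memory traffic stays at one load and one store, which is why a large nrotation makes the measurement compute-bound.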
◆ fma_chains_bench()
template<class T>
inline fma_chains_result sham::benchmarks::fma_chains_bench(DeviceScheduler_ptr sched, int N, f64 time_threshold)