| namespace | sham |
| | Namespace for backends; it is named sham because shambackends is too long to write. |
| template<class T > void | sham::benchmarks::saxpy (u32 i, int n, T a, T *__restrict x, T *__restrict y) |
| | saxpy function for benchmarking. |
| template<class T > saxpy_result | sham::benchmarks::saxpy_bench (DeviceScheduler_ptr sched, int N, T init_x, T init_y, T a, int load_size, bool check_correctness) |
| | Benchmarks the saxpy operation and reports its run time and memory bandwidth. |
◆ saxpy()
template<class T >
inline void sham::benchmarks::saxpy (u32 i, int n, T a, T *__restrict x, T *__restrict y)
saxpy function for benchmarking.
- Parameters
-
| [in] | i | Index to start the computation. |
| [in] | n | Number of elements to process. |
| [in] | a | Coefficient in the saxpy operation. |
| [in] | x | Input array. |
| [in,out] | y | Output array. |
Definition at line 35 of file saxpy.hpp.
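As an illustration of the operation being benchmarked, the following minimal sketch applies saxpy (y = a * x + y) element by element. It is not the body of saxpy.hpp: the names saxpy_sketch and saxpy_all are hypothetical, and reading i as the index handled by one call with n as its upper bound is an assumption about the parameters documented above.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for a per-element saxpy kernel (assumed behavior,
// not copied from saxpy.hpp): updates y[i] only when i lies inside [0, n).
template<class T>
inline void saxpy_sketch(unsigned int i, int n, T a, const T *x, T *y) {
    if (n > 0 && i < static_cast<unsigned int>(n)) {
        y[i] = a * x[i] + y[i];
    }
}

// Hypothetical host-side driver: calls the kernel once for every index.
// On a device, each call would typically map to one work-item instead.
template<class T>
inline void saxpy_all(T a, const std::vector<T> &x, std::vector<T> &y) {
    assert(x.size() == y.size()); // both arrays must have the same length
    const int n = static_cast<int>(x.size());
    for (int i = 0; i < n; ++i) {
        saxpy_sketch<T>(static_cast<unsigned int>(i), n, a, x.data(), y.data());
    }
}
```

With x = {1, 2, 3}, y = {10, 20, 30} and a = 2, calling saxpy_all leaves y equal to {12, 24, 36}.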
◆ saxpy_bench()
template<class T >
inline saxpy_result sham::benchmarks::saxpy_bench (DeviceScheduler_ptr sched, int N, T init_x, T init_y, T a, int load_size, bool check_correctness)
Benchmarks the saxpy operation using the given device scheduler and reports its run time and memory bandwidth.
- Parameters
-
| [in] | sched | Device scheduler. |
| [in] | N | Number of elements to process. |
| [in] | init_x | Initial value for the input array. |
| [in] | init_y | Initial value for the output array. |
| [in] | a | Coefficient in the saxpy operation. |
| [in] | load_size | Number of bytes processed per element. |
| [in] | check_correctness | Whether to check that the result is correct. |
Based on https://developer.nvidia.com/blog/how-implement-performance-metrics-cuda-cc/
- Returns
- saxpy_result containing the computation time in seconds, the bandwidth in gibibytes per second, and the name of the function.
Definition at line 70 of file saxpy.hpp.
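The bandwidth figure in the returned result can be understood through the usual effective-bandwidth estimate for saxpy. The sketch below is a hedged illustration, not the code in saxpy.hpp: the helper name saxpy_bandwidth_gib is hypothetical, and it assumes that each element costs three memory accesses (read x, read y, write y) and that one gibibyte is 1024^3 bytes.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper, not part of sham::benchmarks.
// Assumption: every element is read from x, read from y, and written to y,
// so the kernel moves 3 * n * bytes_per_element bytes in total.
inline double saxpy_bandwidth_gib(std::size_t n, std::size_t bytes_per_element, double seconds) {
    assert(seconds > 0.0); // a zero or negative time would be meaningless
    const double bytes_moved
        = 3.0 * static_cast<double>(n) * static_cast<double>(bytes_per_element);
    const double bytes_per_gib = 1024.0 * 1024.0 * 1024.0;
    return bytes_moved / seconds / bytes_per_gib; // gibibytes per second
}
```

For example, 2^28 four-byte elements processed in one second move 3 * 2^30 bytes, which this estimate reports as 3 GiB/s.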