Shamrock 2025.10.0
Astrophysical Code
shamsys Namespace Reference

Namespace for system handling.

Classes

class  AuroraSystemMetricReporter
 
struct  DeviceSelectRet_t
 
struct  FormattedSystemMetrics
 
class  IntelRAPLSystemMetricReport
 
class  ISystemMetricReporter
 
class  NoopSystemMetricReporter
 
struct  SystemMetrics
 

Functions

void change_log_format ()
 Change the log formatter according to the SHAMLOGFORMATTER and SHAMLOG_ERR_ON_EXCEPT environment variables.
 
DeviceSelectRet_t select_devices (std::string sycl_cfg)
 Select the devices for the queues.
 
u32 for_each_device (std::function< void(u32, const sycl::platform &, const sycl::device &)> fct)
 Iterate over all SYCL devices and perform a given function.
 
void run_micro_benchmark ()
 Run latency and bandwidth benchmarks. These benchmarks were adapted from osu_microbenchmark.
 
const std::unordered_map< std::string, double > & get_microbench_results ()
 Get the microbench results.
 
void shamrock_smi (bool list_all_devices)
 Print information about all available SYCL devices in the cluster.
 
void register_signals ()
 
void init_backtrace_utilities (bool enable_colors)
 Initialize the backtrace utilities.
 
std::string crash_report_backtrace ()
 Generate a backtrace for the crash report.
 
std::unique_ptr< ISystemMetricReporter > & current_reporter ()
 
std::optional< f64 > get_rank_energy_consummed ()
 
bool support_rank_energy_consummed ()
 
std::optional< f64 > get_gpu_energy_consummed ()
 
bool support_gpu_energy_consummed ()
 
std::optional< f64 > get_cpu_energy_consummed ()
 
bool support_cpu_energy_consummed ()
 
std::optional< f64 > get_dram_energy_consummed ()
 
bool support_dram_energy_consummed ()
 
SystemMetrics get_system_metrics (bool barrier=true)
 
std::vector< SystemMetrics > gather_rank_metrics (const SystemMetrics &input)
 
SystemMetrics aggregate_rank_metrics (const std::vector< SystemMetrics > &input)
 
FormattedSystemMetrics format_system_metrics (const SystemMetrics &input)
 Only to be used on deltas (the difference between two SystemMetrics values), not on raw metrics.
 
SystemMetrics operator- (const SystemMetrics &lhs, const SystemMetrics &rhs)
 
bool has_reporter ()
 
void shamrock_smi_summary ()
 
void shamrock_smi_all ()
 Print SMI for all devices.
 
void shamrock_smi_selected (bool list_all_devices)
 Print SMI for selected devices.
 
std::unique_ptr< ISystemMetricReporter > make_reporter (std::string_view reporter_name)
 
std::unique_ptr< ISystemMetricReporter > make_reporter ()
 
void test_reporter (std::unique_ptr< ISystemMetricReporter > &reporter)
 Test that there are no crashes.
 

Detailed Description

Namespace for system handling.

Function Documentation

◆ aggregate_rank_metrics()

SystemMetrics shamsys::aggregate_rank_metrics ( const std::vector< SystemMetrics > &  input)

Definition at line 258 of file system_metrics.cpp.

◆ change_log_format()

void shamsys::change_log_format ( )

Change the log formatter according to the SHAMLOGFORMATTER and SHAMLOG_ERR_ON_EXCEPT environment variables.

If SHAMLOGFORMATTER is 0, 1, 2, or 3, the log formatter will be changed to the corresponding style. If SHAMLOG_ERR_ON_EXCEPT is 1, an exception handler callback will be set to generate an error log when an exception is created.

Note
This function should be called before creating any logs.
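
The environment-variable handling described above can be sketched as follows. This is a minimal illustration, not the code in change_log_format.cpp: the helper names `parse_formatter_style`, `parse_err_on_except`, `LogFormatConfig`, and `read_log_format_env` are hypothetical, and ignoring any SHAMLOGFORMATTER value other than 0, 1, 2, or 3 is an assumption, since the behaviour for other values is not documented here.

```cpp
#include <cstdlib>
#include <optional>
#include <string>

// Hypothetical helper: interpret the raw value of SHAMLOGFORMATTER.
// Returns the style index (0..3), or std::nullopt when the variable is
// unset or holds anything else (assumption: such values are ignored).
std::optional<int> parse_formatter_style(const char *value) {
    if (value == nullptr) {
        return std::nullopt;
    }
    const std::string s(value);
    if (s.size() == 1 && s[0] >= '0' && s[0] <= '3') {
        return s[0] - '0';
    }
    return std::nullopt;
}

// Hypothetical helper: the exception-logging callback is requested only
// when SHAMLOG_ERR_ON_EXCEPT is exactly "1".
bool parse_err_on_except(const char *value) {
    return value != nullptr && std::string(value) == "1";
}

// Hypothetical result type grouping both settings.
struct LogFormatConfig {
    std::optional<int> style;
    bool err_on_except;
};

// Read both variables from the process environment.
LogFormatConfig read_log_format_env() {
    return {parse_formatter_style(std::getenv("SHAMLOGFORMATTER")),
            parse_err_on_except(std::getenv("SHAMLOG_ERR_ON_EXCEPT"))};
}
```

Reading the environment once, before any log is emitted, matches the note above that the function should be called before creating any logs.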

Definition at line 279 of file change_log_format.cpp.


◆ crash_report_backtrace()

std::string shamsys::crash_report_backtrace ( )

Generate a backtrace for the crash report.

Returns
std::string The backtrace log (can include profiler stacktrace and true stacktrace)

Definition at line 72 of file stacktrace_log.cpp.


◆ current_reporter()

std::unique_ptr< ISystemMetricReporter > & shamsys::current_reporter ( )

Definition at line 191 of file system_metrics.cpp.

◆ for_each_device()

u32 shamsys::for_each_device ( std::function< void(u32, const sycl::platform &, const sycl::device &)>  fct)
inline

Iterate over all SYCL devices and perform a given function.

Parameters
fct  The function to be called for each device. The function takes 3 arguments:
  • The key of the device. This key is a unique identifier for the device.
  • The SYCL platform corresponding to the device.
  • The SYCL device.
Returns
The total number of devices found.

The function is called once per device, platform by platform: all devices of the first platform are visited, then all devices of the next platform, and so on. The order of the platforms, and of the devices within each platform, is determined by the SYCL implementation.

Example usage:

auto fct = [&](u32 key, const sycl::platform &Platform, const sycl::device &Device) {
    std::cout << "Platform: " << Platform.get_info<sycl::info::platform::name>() << std::endl;
    std::cout << "Device: " << Device.get_info<sycl::info::device::name>() << std::endl;
};
// Call fct once per device; the return value is the total number of devices found.
u32 device_count = shamsys::for_each_device(fct);

Definition at line 50 of file for_each_device.hpp.
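
The iteration order described above can be illustrated without a SYCL installation. The sketch below uses stand-in types (`FakePlatform`, `FakeDevice`) and a stand-in function (`for_each_fake_device`), all hypothetical. It also assumes that the key is a running counter starting at 0; the documentation only states that the key is a unique identifier for the device.

```cpp
#include <cstdint>
#include <functional>
#include <string>
#include <vector>

using u32 = std::uint32_t;

// Stand-in types (hypothetical) replacing sycl::device and sycl::platform.
struct FakeDevice {
    std::string name;
};

struct FakePlatform {
    std::string name;
    std::vector<FakeDevice> devices;
};

// Visit every device, platform by platform, and return the device count.
// Assumption: the key is a counter incremented after each device.
u32 for_each_fake_device(
    const std::vector<FakePlatform> &platforms,
    const std::function<void(u32, const FakePlatform &, const FakeDevice &)> &fct) {
    u32 key = 0;
    for (const auto &platform : platforms) {
        for (const auto &device : platform.devices) {
            fct(key, platform, device);
            ++key;
        }
    }
    return key;
}
```

With two platforms holding two and one devices respectively, the callback runs three times and the function returns 3.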

◆ format_system_metrics()

FormattedSystemMetrics shamsys::format_system_metrics ( const SystemMetrics &  input)

Only to be used on deltas (the difference between two SystemMetrics values), not on raw metrics.

Definition at line 295 of file system_metrics.cpp.

◆ gather_rank_metrics()

std::vector< SystemMetrics > shamsys::gather_rank_metrics ( const SystemMetrics &  input)

Definition at line 220 of file system_metrics.cpp.

◆ get_cpu_energy_consummed()

std::optional< f64 > shamsys::get_cpu_energy_consummed ( )
inline

Definition at line 62 of file system_metrics.hpp.

◆ get_dram_energy_consummed()

std::optional< f64 > shamsys::get_dram_energy_consummed ( )
inline

Definition at line 70 of file system_metrics.hpp.

◆ get_gpu_energy_consummed()

std::optional< f64 > shamsys::get_gpu_energy_consummed ( )
inline

Definition at line 54 of file system_metrics.hpp.

◆ get_microbench_results()

const std::unordered_map< std::string, double > & shamsys::get_microbench_results ( )

Get the microbench results.

Returns
const std::unordered_map<std::string, double> &

Definition at line 463 of file MicroBenchmark.cpp.

◆ get_rank_energy_consummed()

std::optional< f64 > shamsys::get_rank_energy_consummed ( )
inline

Definition at line 46 of file system_metrics.hpp.

◆ get_system_metrics()

SystemMetrics shamsys::get_system_metrics ( bool  barrier = true)

Definition at line 200 of file system_metrics.cpp.

◆ has_reporter()

bool shamsys::has_reporter ( )

Definition at line 147 of file system_metrics.cpp.

◆ init_backtrace_utilities()

void shamsys::init_backtrace_utilities ( bool  enable_colors)

Initialize the backtrace utilities.

Parameters
enable_colors  Whether to enable colors in the backtrace

Definition at line 52 of file stacktrace_log.cpp.


◆ make_reporter() [1/2]

std::unique_ptr< ISystemMetricReporter > shamsys::make_reporter ( )

Definition at line 176 of file system_metrics.cpp.

◆ make_reporter() [2/2]

std::unique_ptr< ISystemMetricReporter > shamsys::make_reporter ( std::string_view  reporter_name)

Definition at line 156 of file system_metrics.cpp.

◆ operator-()

SystemMetrics shamsys::operator- ( const SystemMetrics &  lhs,
const SystemMetrics &  rhs 
)
inline

Definition at line 107 of file system_metrics.hpp.

◆ register_signals()

void shamsys::register_signals ( )

Definition at line 118 of file SignalCatch.cpp.

◆ run_micro_benchmark()

void shamsys::run_micro_benchmark ( )

Run latency and bandwidth benchmarks. These benchmarks were adapted from osu_microbenchmark:

  • osu-micro-benchmarks/mpi/pt2pt/osu_bw.c
  • osu-micro-benchmarks/mpi/pt2pt/osu_latency.c

Definition at line 77 of file MicroBenchmark.cpp.


◆ select_devices()

DeviceSelectRet_t shamsys::select_devices ( std::string  sycl_cfg)

Select the devices for the queues.

If the config string starts with "auto:", then the function init_queues_auto is called with the remaining string as argument. Otherwise, the config string is split at the first colon, and the integers on the left and right are used as arguments to the function init_queues.

Parameters
sycl_cfg  The config string
Returns
a DeviceSelectRet_t containing the selected devices
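
The parsing rule described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in device_select.cpp: `ParsedSyclCfg` and `parse_sycl_cfg` are hypothetical names, the integer type (`int`) is a guess, and throwing on a missing colon is an assumed error policy. The sketch only splits the string; it does not call init_queues_auto or init_queues, whose signatures are not shown on this page.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical result of parsing the config string.
struct ParsedSyclCfg {
    bool is_auto;         // true when the string starts with "auto:"
    std::string auto_arg; // remainder after "auto:" (empty otherwise)
    int left;             // integer left of the first colon (0 in auto mode)
    int right;            // integer right of the first colon (0 in auto mode)
};

ParsedSyclCfg parse_sycl_cfg(const std::string &cfg) {
    const std::string prefix = "auto:";
    // rfind(prefix, 0) == 0 is a "starts with" test.
    if (cfg.rfind(prefix, 0) == 0) {
        return {true, cfg.substr(prefix.size()), 0, 0};
    }
    // Otherwise split at the first colon and convert both sides.
    const auto colon = cfg.find(':');
    if (colon == std::string::npos) {
        // Assumed error policy: reject strings without a colon.
        throw std::invalid_argument("expected \"auto:<arg>\" or \"<int>:<int>\"");
    }
    // std::stoi throws std::invalid_argument on non-numeric input.
    return {false, "", std::stoi(cfg.substr(0, colon)), std::stoi(cfg.substr(colon + 1))};
}
```

For a string of the form `"0:1"`, the sketch yields `left == 0` and `right == 1`; for a string starting with `"auto:"`, everything after the prefix is kept verbatim in `auto_arg`.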

Definition at line 176 of file device_select.cpp.


◆ shamrock_smi()

void shamsys::shamrock_smi ( bool  list_all_devices)

Print information about all available SYCL devices in the cluster.

Definition at line 230 of file shamrock_smi.cpp.

◆ shamrock_smi_all()

void shamsys::shamrock_smi_all ( )

Print SMI for all devices.

Definition at line 109 of file shamrock_smi.cpp.


◆ shamrock_smi_selected()

void shamsys::shamrock_smi_selected ( bool  list_all_devices)

Print SMI for selected devices.

Definition at line 161 of file shamrock_smi.cpp.


◆ shamrock_smi_summary()

void shamsys::shamrock_smi_summary ( )

Definition at line 46 of file shamrock_smi.cpp.

◆ support_cpu_energy_consummed()

bool shamsys::support_cpu_energy_consummed ( )
inline

Definition at line 66 of file system_metrics.hpp.

◆ support_dram_energy_consummed()

bool shamsys::support_dram_energy_consummed ( )
inline

Definition at line 74 of file system_metrics.hpp.

◆ support_gpu_energy_consummed()

bool shamsys::support_gpu_energy_consummed ( )
inline

Definition at line 58 of file system_metrics.hpp.

◆ support_rank_energy_consummed()

bool shamsys::support_rank_energy_consummed ( )
inline

Definition at line 50 of file system_metrics.hpp.

◆ test_reporter()

void shamsys::test_reporter ( std::unique_ptr< ISystemMetricReporter > &  reporter)

Test that there are no crashes.

Definition at line 184 of file system_metrics.cpp.
