In-place ex-scan performance benchmarks

This example benchmarks the in-place exclusive scan (exclusive prefix sum) for each of the implementations available in Shamrock.
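For readers unfamiliar with the operation: an exclusive scan replaces element i with the sum of all elements before it, so the first output is always 0. The sketch below is a plain NumPy reference for that definition, not Shamrock's implementation; `exclusive_sum_in_place` is a hypothetical helper name used only for illustration.

```python
import numpy as np


def exclusive_sum_in_place(a):
    """Overwrite `a` with its exclusive prefix sum (NumPy reference, not Shamrock's code)."""
    if a.size == 0:
        return a
    # Inclusive sums of a[0..n-2], computed before anything is overwritten
    shifted = np.cumsum(a[:-1], dtype=a.dtype)
    a[1:] = shifted  # out[i] = a[0] + ... + a[i-1]
    a[0] = 0         # nothing precedes the first element
    return a


values = np.array([3, 1, 4, 1, 5], dtype=np.uint32)
exclusive_sum_in_place(values)
print(values)
```

The benchmark below fills its buffer with zeros, so the scanned output is also all zeros; only the timing is of interest there.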

import time

import matplotlib.pyplot as plt
import numpy as np

import shamrock

# If we use the shamrock executable to run this script instead of the python interpreter,
# we should not initialize the system as the shamrock executable needs to handle specific MPI logic
if not shamrock.sys.is_initialized():
    shamrock.change_loglevel(1)
    shamrock.sys.init("0:0")

Use the Shamrock documentation style for matplotlib

shamrock.matplotlib.set_shamrock_mpl_style()

Main benchmark function

def benchmark_u32(N, nb_repeat=10):
    """Return the (min, max, mean) timing over nb_repeat runs on a zero-filled u32 buffer of size N."""
    times = []
    for _ in range(nb_repeat):
        # Allocate a fresh device buffer for every repetition
        buf = shamrock.backends.DeviceBuffer_u32()
        buf.resize(N)
        buf.fill(0)
        times.append(shamrock.algs.benchmark_scan_exclusive_sum_in_place(buf, N))
    return min(times), max(times), sum(times) / nb_repeat
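The min/max/mean pattern used in `benchmark_u32` can be tried without Shamrock. The sketch below is a stand-in under stated assumptions: `time_callable` is a hypothetical helper, and it times a host-side NumPy `cumsum` with `time.perf_counter` instead of calling `shamrock.algs.benchmark_scan_exclusive_sum_in_place`.

```python
import time

import numpy as np


def time_callable(fn, nb_repeat=10):
    """Run `fn` nb_repeat times; return (min, max, mean) wall-clock seconds."""
    times = []
    for _ in range(nb_repeat):
        start = time.perf_counter()
        fn()
        times.append(time.perf_counter() - start)
    return min(times), max(times), sum(times) / len(times)


data = np.zeros(100_000, dtype=np.uint32)  # zero-filled, like the benchmark buffer
t_min, t_max, t_mean = time_callable(lambda: np.cumsum(data))
print(f"min={t_min:.2e}s max={t_max:.2e}s mean={t_mean:.2e}s")
```

Returning all three statistics lets the caller choose which one to report; the sweep in this example prints the mean and stores the minimum.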

Run the performance test for all parameters

def run_performance_sweep():
    # Buffer sizes: 20 log-spaced values from 1e2 to 1e7, truncated to integers
    particle_counts = np.logspace(2, 7, 20).astype(int).tolist()

    # Minimum time measured for each buffer size
    results_u32 = []

    print(f"Particle counts: {particle_counts}")

    total_runs = len(particle_counts)

    for current_run, N in enumerate(particle_counts, start=1):
        print(
            f"[{current_run:2d}/{total_runs}] Running N={N:5d}...",
            end=" ",
        )

        start_time = time.time()
        min_time, max_time, mean_time = benchmark_u32(N)
        # The plot uses the minimum; the mean is only printed as progress output
        results_u32.append(min_time)
        elapsed = time.time() - start_time

        print(f"mean={mean_time:.3f}s (took {elapsed:.1f}s)")

    return particle_counts, results_u32
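To see how the sweep sizes are chosen, the short sketch below reproduces only the `np.logspace` call from `run_performance_sweep` and computes the constant growth factor between consecutive sizes. Nothing here depends on Shamrock; the variable `ratio` is introduced purely for illustration.

```python
import numpy as np

# 20 points evenly spaced in log10 between 10^2 and 10^7, truncated to integers
particle_counts = np.logspace(2, 7, 20).astype(int).tolist()

print(particle_counts[:4], "...", particle_counts[-1])

# Consecutive sizes differ by a constant factor of 10 ** (5 / 19)
ratio = 10 ** ((7 - 2) / (20 - 1))
print(f"growth factor per step: {ratio:.3f}")
```

Log spacing gives each decade of buffer sizes the same number of samples, which suits the log-log plot produced at the end of the example.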

List the current implementation

impl_param(impl_name="decoupled_lookback_512", params="")

List all available implementations

[impl_param(impl_name="std_scan", params=""), impl_param(impl_name="std_scan_single_task_acpp", params=""), impl_param(impl_name="decoupled_lookback_512", params=""), impl_param(impl_name="acpp_alg", params="")]

Run the performance benchmarks for all implementations

for impl in all_default_impls:
    shamrock.algs.set_impl_scan_exclusive_sum_in_place(impl.impl_name, impl.params)

    print(f"Running ex-scan in place performance benchmarks for {impl}...")

    # Run the performance sweep
    particle_counts, results_u32 = run_performance_sweep()

    plt.plot(particle_counts, results_u32, "--.", label=impl.impl_name + " (u32)")


# Reference line: time needed at a constant rate of 100M objects per second
Nobj = np.array(particle_counts)
Time100M = Nobj / 1e8
plt.plot(particle_counts, Time100M, color="grey", linestyle="-", alpha=0.7, label="100M obj/sec")


plt.xlabel("Number of elements")
plt.ylabel("Time (s)")
plt.title("ex-scan in place performance benchmarks")

plt.xscale("log")
plt.yscale("log")

plt.grid(True)

plt.legend()
plt.show()
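Figure: ex-scan in place performance benchmarks (time in seconds versus number of elements, log-log axes, one curve per implementation plus a 100M obj/sec reference line)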
Info: setting scan_exclusive_sum_in_place implementation to impl : std_scan          [tree][rank=0]
Running ex-scan in place performance benchmarks for impl_param(impl_name="std_scan", params="")...
Particle counts: [100, 183, 335, 615, 1128, 2069, 3792, 6951, 12742, 23357, 42813, 78475, 143844, 263665, 483293, 885866, 1623776, 2976351, 5455594, 10000000]
[ 1/20] Running N=  100... mean=0.000s (took 0.0s)
[ 2/20] Running N=  183... mean=0.000s (took 0.0s)
[ 3/20] Running N=  335... mean=0.000s (took 0.0s)
[ 4/20] Running N=  615... mean=0.000s (took 0.0s)
[ 5/20] Running N= 1128... mean=0.000s (took 0.0s)
[ 6/20] Running N= 2069... mean=0.000s (took 0.0s)
[ 7/20] Running N= 3792... mean=0.000s (took 0.0s)
[ 8/20] Running N= 6951... mean=0.000s (took 0.0s)
[ 9/20] Running N=12742... mean=0.000s (took 0.0s)
[10/20] Running N=23357... mean=0.000s (took 0.0s)
[11/20] Running N=42813... mean=0.000s (took 0.0s)
[12/20] Running N=78475... mean=0.000s (took 0.0s)
[13/20] Running N=143844... mean=0.000s (took 0.0s)
[14/20] Running N=263665... mean=0.000s (took 0.0s)
[15/20] Running N=483293... mean=0.001s (took 0.0s)
[16/20] Running N=885866... mean=0.001s (took 0.0s)
[17/20] Running N=1623776... mean=0.002s (took 0.0s)
[18/20] Running N=2976351... mean=0.004s (took 0.0s)
[19/20] Running N=5455594... mean=0.009s (took 0.1s)
[20/20] Running N=10000000... mean=0.015s (took 0.2s)
Info: setting scan_exclusive_sum_in_place implementation to impl : std_scan_single_task_acpp  [tree][rank=0]
Running ex-scan in place performance benchmarks for impl_param(impl_name="std_scan_single_task_acpp", params="")...
Particle counts: [100, 183, 335, 615, 1128, 2069, 3792, 6951, 12742, 23357, 42813, 78475, 143844, 263665, 483293, 885866, 1623776, 2976351, 5455594, 10000000]
[ 1/20] Running N=  100... mean=0.000s (took 0.0s)
[ 2/20] Running N=  183... mean=0.000s (took 0.0s)
[ 3/20] Running N=  335... mean=0.000s (took 0.0s)
[ 4/20] Running N=  615... mean=0.000s (took 0.0s)
[ 5/20] Running N= 1128... mean=0.000s (took 0.0s)
[ 6/20] Running N= 2069... mean=0.000s (took 0.0s)
[ 7/20] Running N= 3792... mean=0.000s (took 0.0s)
[ 8/20] Running N= 6951... mean=0.000s (took 0.0s)
[ 9/20] Running N=12742... mean=0.000s (took 0.0s)
[10/20] Running N=23357... mean=0.000s (took 0.0s)
[11/20] Running N=42813... mean=0.000s (took 0.0s)
[12/20] Running N=78475... mean=0.000s (took 0.0s)
[13/20] Running N=143844... mean=0.000s (took 0.0s)
[14/20] Running N=263665... mean=0.000s (took 0.0s)
[15/20] Running N=483293... mean=0.000s (took 0.0s)
[16/20] Running N=885866... mean=0.001s (took 0.0s)
[17/20] Running N=1623776... mean=0.001s (took 0.0s)
[18/20] Running N=2976351... mean=0.002s (took 0.0s)
[19/20] Running N=5455594... mean=0.003s (took 0.0s)
[20/20] Running N=10000000... mean=0.006s (took 0.1s)
Info: setting scan_exclusive_sum_in_place implementation to impl : decoupled_lookback_512  [tree][rank=0]
Running ex-scan in place performance benchmarks for impl_param(impl_name="decoupled_lookback_512", params="")...
Particle counts: [100, 183, 335, 615, 1128, 2069, 3792, 6951, 12742, 23357, 42813, 78475, 143844, 263665, 483293, 885866, 1623776, 2976351, 5455594, 10000000]
[ 1/20] Running N=  100... mean=0.000s (took 0.0s)
[ 2/20] Running N=  183... mean=0.000s (took 0.0s)
[ 3/20] Running N=  335... mean=0.000s (took 0.0s)
[ 4/20] Running N=  615... mean=0.000s (took 0.0s)
[ 5/20] Running N= 1128... mean=0.000s (took 0.0s)
[ 6/20] Running N= 2069... mean=0.000s (took 0.0s)
[ 7/20] Running N= 3792... mean=0.000s (took 0.0s)
[ 8/20] Running N= 6951... mean=0.000s (took 0.0s)
[ 9/20] Running N=12742... mean=0.000s (took 0.0s)
[10/20] Running N=23357... mean=0.000s (took 0.0s)
[11/20] Running N=42813... mean=0.000s (took 0.0s)
[12/20] Running N=78475... mean=0.000s (took 0.0s)
[13/20] Running N=143844... mean=0.000s (took 0.0s)
[14/20] Running N=263665... mean=0.001s (took 0.0s)
[15/20] Running N=483293... mean=0.001s (took 0.0s)
[16/20] Running N=885866... mean=0.002s (took 0.0s)
[17/20] Running N=1623776... mean=0.003s (took 0.0s)
[18/20] Running N=2976351... mean=0.005s (took 0.1s)
[19/20] Running N=5455594... mean=0.010s (took 0.1s)
[20/20] Running N=10000000... mean=0.018s (took 0.2s)
Info: setting scan_exclusive_sum_in_place implementation to impl : acpp_alg          [tree][rank=0]
Running ex-scan in place performance benchmarks for impl_param(impl_name="acpp_alg", params="")...
Particle counts: [100, 183, 335, 615, 1128, 2069, 3792, 6951, 12742, 23357, 42813, 78475, 143844, 263665, 483293, 885866, 1623776, 2976351, 5455594, 10000000]
[ 1/20] Running N=  100... mean=0.000s (took 0.0s)
[ 2/20] Running N=  183... mean=0.000s (took 0.0s)
[ 3/20] Running N=  335... mean=0.000s (took 0.0s)
[ 4/20] Running N=  615... mean=0.000s (took 0.0s)
[ 5/20] Running N= 1128... mean=0.000s (took 0.0s)
[ 6/20] Running N= 2069... mean=0.000s (took 0.0s)
[ 7/20] Running N= 3792... mean=0.000s (took 0.0s)
[ 8/20] Running N= 6951... mean=0.000s (took 0.0s)
[ 9/20] Running N=12742... mean=0.000s (took 0.0s)
[10/20] Running N=23357... mean=0.000s (took 0.0s)
[11/20] Running N=42813... mean=0.000s (took 0.0s)
[12/20] Running N=78475... mean=0.000s (took 0.0s)
[13/20] Running N=143844... mean=0.000s (took 0.0s)
[14/20] Running N=263665... mean=0.000s (took 0.0s)
[15/20] Running N=483293... mean=0.001s (took 0.0s)
[16/20] Running N=885866... mean=0.001s (took 0.0s)
[17/20] Running N=1623776... mean=0.002s (took 0.0s)
[18/20] Running N=2976351... mean=0.003s (took 0.0s)
[19/20] Running N=5455594... mean=0.008s (took 0.1s)
[20/20] Running N=10000000... mean=0.016s (took 0.2s)
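The logged means can be turned into approximate throughputs and compared with the 100M obj/sec reference line. The sketch below copies the mean times printed for N=10000000 in the output above; since they are rounded to three decimals, and the plot uses minimum rather than mean times, the resulting figures are rough estimates only. The names `mean_time_s`, `reference_rate`, and `throughput` are introduced for illustration.

```python
# Mean times (seconds) at N = 10,000,000, copied from the printed output.
# They are rounded to three decimals, so the throughputs are rough estimates.
N = 10_000_000
mean_time_s = {
    "std_scan": 0.015,
    "std_scan_single_task_acpp": 0.006,
    "decoupled_lookback_512": 0.018,
    "acpp_alg": 0.016,
}

reference_rate = 1e8  # the grey "100M obj/sec" line in the plot

# Throughput in elements per second: buffer size divided by elapsed time
throughput = {name: N / t for name, t in mean_time_s.items()}
for name, rate in throughput.items():
    print(f"{name:28s} {rate / 1e6:8.1f} M elements/s ({rate / reference_rate:.1f}x reference)")
```

These numbers describe only this particular run and machine; a different device or backend may rank the implementations differently.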

Total running time of the script: (0 minutes 2.270 seconds)

Estimated memory usage: 303 MB
