This thesis focuses on the efficient execution of applications on Non-Uniform Memory Access (NUMA) architectures, examining memory bandwidth, memory latency, and the allocation of processors and memory. The experiments were carried out on the NSC and ARNES clusters, using benchmarks and tools such as STREAM Triad, likwid, numactl, and hwloc for the analysis. The research examined the complexity of the NUMA architecture and its impact on system performance. Based on the findings, we provide insights and recommendations for optimizing and improving application efficiency. An important contribution of this thesis is a method for distributing threads across processor cores for memory-bound applications such as STREAM Triad.