This master's thesis examines modern tightly coupled heterogeneous architectures, focusing on the GH200 system and its advantages over conventional CPU-GPU heterogeneous systems in scientific simulations. For the analysis, we selected three simulations in which data transfers can become a bottleneck. We implemented each simulation in three versions that differ in the type of memory allocation: pageable, pinned, and unified memory. We ran the simulations on two hardware configurations, both compute nodes of the FRIDA cluster. The first node contained the GH200 system; the second paired a CPU with an H100 graphics processing unit connected via fifth-generation PCIe. Our experiments showed that, under data-transfer-heavy workloads, the GH200 configuration achieved speedups over the CPU-H100 configuration of up to 20 times with pageable memory, 5.5 times with pinned memory, and 3.5 times with unified memory. These findings highlight the GH200’s ability to mitigate data-movement overheads and its suitability for data-intensive scientific applications.