izpis_h1_title_alt

Priprava programov OpenCL za učinkovito izvajanje na različnih arhitekturah
ID ŠEMROV, JURE (Author), ID Lotrič, Uroš (Mentor) More about this mentor... This link opens in a new window

.pdfPDF - Presentation file, Download (1,04 MB)
MD5: 5DAB64A9040640783829DCFE109DD693
PID: 20.500.12556/rul/7d0c365f-caea-49a1-93e4-2e73f8bfed50

Abstract
V diplomski nalogi se posvečamo predvsem vprašanju, kako programe OpenCL napisati, da se bodo učinkovito izvajali na različnih arhitekturah. Težava, s katero se soočamo, so arhitekturne razlike med sistemi. Če torej želimo doseči maksimalno učinkovitost, moramo program ustrezno prilagoditi. Prilagoditve obsegajo število računskih enot, število niti v skupini, uporabo vektorske enote, lokalnega pomnilnika in predpomnilnikov ter še druge načine za prikrivanje latence. Na kratko, izkoristiti moramo morebitne arhitekturne prednosti naprave in paralelizem tako na nivoju ukazov, kot tudi na nivoju niti. V nalogi obravnavamo pet programov, to so histogram, množenje matrik, predponska vsota, problem n teles in bitonično urejanje. Te programe prilagodimo trem različnim sistemom, in sicer CPE Intel Core i5-2450M, mnogojedrnik Xeon Phi 5110P, GPE Nvidia Tesla K20. Da bi te prilagoditve izkusili tudi v praksi, smo izmerili čas izvajanja programov za različno velike skupine in skušali razbrati kaj se dogaja. Če naše ugotovitve posplošimo, lahko privzamemo, da naj bo število skupin vsaj toliko, kot je računskih enot, skupine pa naj bodo ravno prav velike, da zmanjšamo režijo preklopa skupin in pomnilniško latenco ter obenem ne povečamo režije zaradi komunikacije ali zmanjšamo števila skupin, ki se sočasno izvajajo na računski enoti. Za učinkovito izvajanje moramo na CPE in mnogojedrniku upoštevati predpomnilnike in širino vektorske enote, medtem ko moramo na GPE čim bolje izkoristiti visoko prepustnost ter prikriti latenco z velikim številom niti in lokalnim pomnilnikom.

Language:Slovenian
Keywords:OpenCL, heterogeni sistemi, računske enote, delovne skupine, niti, SIMD, SIMT, lokalni pomnilnik
Work type:Bachelor thesis/paper
Organization:FRI - Faculty of Computer and Information Science
Year:2017
PID:20.500.12556/RUL-88533 This link opens in a new window
Publication date in RUL:16.01.2017
Views:1506
Downloads:677
Metadata:XML DC-XML DC-RDF
:
Copy citation
Share:Bookmark and Share

Secondary language

Language:English
Title:Optimizing OpenCL programs for different hardware architectures
Abstract:
The main question in this thesis we will be trying to solve, is how to write a proper OpenCL program to effectively run on different architectures. A problem to overcome are the architectural differences between systems. To maximize the efficiency, we need to adapt the program. This defers by the number of compute units, number of threads in a work-group, use of a vector unit, local memory and cache to minimize latency. To summarize, we need to exploit both instruction and thread level parallelism as well as other architectural advantages. We used five programs, histogram, matrix multiply, prefix sum, n body problem and bitonic sort. Then we adapted them to three different systems, Intel Core i5-2450M CPU, Xeon Phi 5110P manycore processor and Tesla K20 GPU. To test these adaptations in practice, we measured program runtime for different work-group sizes and tried to explain what is going on. Our conclusions show, that we need at least as many work-groups as there are compute units. The work-group size have to be large enough to reduce the overhead of maintaining a work-group and hide memory latency. At the same time they should be small enough to reduce overhead of communication and to keep executing more work-groups simultaneously on each compute unit. To execute programs efficiently on a CPU and manycore processors, we need to take into account caches and wideness of a vector unit, while on a GPU we need to exploit high memory throughput and hide latency with large work-groups and local memory.

Keywords:OpenCL, heterogeneous systems, compute unit, work groups, work-items, SIMD, SIMT, local memory

Similar documents

Similar works from RUL:
Similar works from other Slovenian collections:

Back