
Reinforcement learning in board games: diploma seminar paper
Kalan, Tim (Author), Knez, Marjetka (Mentor)

PDF - Presentation file (1.30 MB)
MD5: 3F08D98164E48D152C54F47C900FA3BE

Abstract
The motivation for this thesis was to understand algorithms that learn through trial and error. We begin by setting up a theoretical framework in the form of Markov decision processes. We then turn to the derivation and description of methods based on the concept of dynamic programming. These methods are then generalized, and three main iterative algorithms are presented: Monte Carlo, TD(0) and TD($\lambda$). Since we wanted to create a competent board game player, and board games often have a very large number of states, we also consider function approximation and the combination of neural networks with the presented algorithms. In the second part of the thesis we take a closer look at combinatorial games, the theoretical model for board games. We then describe some important differences that arise in reinforcement learning in this context and look at how the concepts of optimal strategy and value function are adapted. In the last part we apply the theory to a practical example: we use the described algorithms on $m, n, k$-games and comment on their effectiveness.
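As an illustration of the simplest of the named algorithms, the following is a minimal sketch of tabular TD(0) value estimation in Python. The environment interface (reset/step), the random behaviour policy, and all hyperparameter values are assumptions made for the sketch, not details taken from the thesis.

import random
from collections import defaultdict

def td0(env, actions, episodes=10000, alpha=0.1, gamma=1.0):
    """Estimate the state-value function V of a random policy with TD(0)."""
    V = defaultdict(float)  # value table, defaults to 0 for unseen states
    for _ in range(episodes):
        state = env.reset()  # assumed: returns a hashable state
        done = False
        while not done:
            action = random.choice(actions(state))   # random behaviour policy
            state2, reward, done = env.step(action)  # assumed step signature
            target = reward if done else reward + gamma * V[state2]
            V[state] += alpha * (target - V[state])  # TD(0) update toward the target
            state = state2
    return V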

Language: Slovenian
Keywords: reinforcement learning, Markov decision process, temporal-difference learning, afterstates, self-play
Work type: Bachelor thesis/paper
Organization: FMF - Faculty of Mathematics and Physics
Year: 2021
PID: 20.500.12556/RUL-134977
UDC: 519.2
COBISS.SI-ID: 97683459
Publication date in RUL: 16.02.2022
Views: 1163
Downloads: 81

Secondary language

Language: English
Title: Reinforcement learning in board games
Abstract:
The motivation for this work is to understand algorithms that learn through trial and error. At the beginning we lay the theoretical foundation by examining Markov decision processes. We then derive and describe methods based on dynamic programming. We generalize these methods further and present three main iterative algorithms: Monte Carlo, TD(0) and TD($\lambda$). Since we want to create a competent board game player, and board games often have a large number of states, we also examine function approximation and combine neural networks with the described algorithms. In the second part we study combinatorial games, our theoretical model for board games, in more detail. We then describe some important adjustments that reinforcement learning requires in this context and look at how the concepts of optimal strategy and value function are adapted. In the last part we apply the presented theory to a practical example: we use the described algorithms to solve some $m, n, k$-games and comment on their efficiency.
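The practical part concerns $m, n, k$-games: an $m \times n$ board on which a player wins by placing $k$ marks in a row. As a hedged illustration of that setting, here is a small win-check routine; the board encoding (a dict mapping (row, col) to a player id for occupied cells) is an assumption for the sketch, not the thesis's implementation.

def has_k_in_a_row(board, m, n, k, player):
    """Check whether `player` has k aligned marks on an m-by-n board.
    `board` maps (row, col) -> player id for occupied cells (assumed encoding)."""
    directions = [(0, 1), (1, 0), (1, 1), (1, -1)]  # right, down, two diagonals
    for r in range(m):
        for c in range(n):
            for dr, dc in directions:
                if all(board.get((r + i * dr, c + i * dc)) == player
                       for i in range(k)):
                    return True
    return False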

Keywords: reinforcement learning, Markov decision process, temporal-difference learning, afterstates, self-play

