The motivation for this work is to understand algorithms that learn through trial and error. We begin by laying the theoretical foundation with an examination of Markov decision processes. We then derive and describe methods based on dynamic programming, generalize these methods, and present three iterative algorithms: Monte Carlo, TD(0), and TD($\lambda$). Since we aim to create a competent board game player, and board games often have a large number of states, we also examine function approximation and combine neural networks with the described algorithms.
In the second part we examine combinatorial games in more detail; these serve as our theoretical model for board games. We then describe the important adjustments that reinforcement learning requires in this context and show how the concepts of optimal strategies and value functions must be adapted.
In the last part we apply the presented theory to a practical example: we use the described algorithms to solve several $m, n, k$-games and comment on their efficiency.