This thesis addresses the problem of learning an optimal strategy for the dice game Yamb using reinforcement learning methods. The main motivation for studying this game was to provide the model with only the rules, without any strategic guidance, allowing it to develop an optimal strategy entirely on its own. Yamb is a social dice game that combines tactical dice-rolling decisions with long-term strategic planning.
To solve this problem, we implemented and compared three algorithms: Deep Q-Network (DQN), Proximal Policy Optimization (PPO), and AlphaZero. We simplified the agents' task by computing the optimal dice rolls automatically, allowing them to focus on field selection. We also tested different approaches to reward shaping.
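To make the idea of automatically computed dice rolls concrete, the sketch below shows one standard way such a computation can be done: an expectimax search over keep-subsets that picks the set of dice maximizing the expected score of a target field. This is an illustrative sketch only; the five-dice assumption, the `score` function, the `sixes` field, and all names are ours, not the thesis's actual implementation.

```python
import itertools
from functools import lru_cache

N_DICE = 5  # assumption: five dice, as in common Yamb variants

def score(dice, field):
    """Hypothetical scorer for a single field; only 'sixes' is sketched.
    The thesis's actual field definitions are not reproduced here."""
    if field == "sixes":
        return 6 * dice.count(6)
    raise NotImplementedError(field)

@lru_cache(maxsize=None)
def best_keep(dice, rerolls, field):
    """Expectimax over keep-subsets: returns (expected score, dice to keep)
    for a sorted hand `dice` with `rerolls` rolls remaining."""
    if rerolls == 0:
        return float(score(dice, field)), dice
    # Baseline: keep all five dice, i.e. stop rerolling now.
    best_val, best_kept = float(score(dice, field)), dice
    for k in range(len(dice)):  # try keeping every proper subset
        for kept in set(itertools.combinations(dice, k)):
            n_free = N_DICE - k
            total = 0.0
            # Average the value of the best follow-up over all reroll outcomes.
            for roll in itertools.product(range(1, 7), repeat=n_free):
                hand = tuple(sorted(kept + roll))
                total += best_keep(hand, rerolls - 1, field)[0]
            val = total / 6 ** n_free
            if val > best_val:
                best_val, best_kept = val, kept
    return best_val, best_kept

# Example: which dice to keep with two rerolls left when aiming at "sixes".
val, keep = best_keep(tuple(sorted((6, 6, 3, 2, 1))), 2, "sixes")
```

Precomputing roll decisions in this way shrinks the action space the agents must learn, leaving only the field-selection policy to the reinforcement learning algorithms.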
The most successful approach was PPO with shaped rewards, which achieved an average score of 865 points and developed strategies comparable to human play (882 points). The model demonstrated a capacity for long-term planning and developed sophisticated strategies for maximizing the final score. For AlphaZero, significant room for improvement remains, as the end-of-game reward structure and the algorithm's computational complexity prevented us from achieving the expected results.