The cart-pole problem is often used to test the performance of controllers. Multiple variants exist, each presenting its own challenges, but their simplicity fails to capture some realistic settings. In this thesis, we therefore propose two extensions of the problem: a combination of the original task with the mountain car problem, and an environment in which the cart must pass under an obstacle. We derive the equations of the cart-pole dynamics on an uneven surface and present our implementation. We also describe the challenges the extensions introduce and survey related work. Since the literature offers no standard approach, we combine existing ideas into a system that provides well-defined starting points for future research. Using deep reinforcement learning, we train an agent on two known variants of the problem and on our two extensions, and present the experimental results. These show that the extensions are substantially more demanding and expose the difficulties they pose to reinforcement learning. While learning on the standard tasks is relatively fast and stable, it is much slower and less reliable on the extensions. The risky behaviour the tasks demand often leads to strategies which do not accomplish the task but instead balance the pole more safely. Nevertheless, we find anticipated and intelligent patterns in the agent's behaviour. The work also opens many avenues for future research.