This thesis investigates reinforcement learning for training autonomous agents in a custom 2D side-scrolling game that extends Flappy Bird with items, an inventory system, bullets, and enemy agents. We employ proximal policy optimization (PPO), reviewing its theoretical foundations and justifying its selection for dynamic, fast-paced game environments. The practical part documents environment modelling, observation and action design, reward shaping, network architecture, and hyperparameter choices, together with a curriculum that incrementally introduces more demanding tasks for the player agent. Experiments show largely stable learning and meaningful in-game behaviour, while highlighting the challenges of continual, multi-objective training in fast real-time games.