This thesis covers reinforcement learning for simulated humanoid robot walking using the proximal policy optimization (PPO) algorithm. First, the thesis describes the dynamic model of the robot, which is simulated in the Unity development platform where the reinforcement learning runs are performed with the ML-Agents package, along with the agent, its input and output states, and the reward function. The reference animations that the robot imitates in this research come from a publicly available data set that includes time series of the motion of individual body segments for several different walking patterns, the masses and heights of the recorded subjects, and, finally, walking speed, ground reaction force (GRF), and center of pressure (CoP) data; these are exported to the MATLAB environment, where they are analyzed and compared with the corresponding data of the learned robot model.
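To make this setup concrete, the following sketch shows how the agent's input and output states and a pose-imitation reward might be wired up in an ML-Agents agent. It is a minimal illustration under assumed names and constants (segments, referenceSegments, the 0.1 reward scale), not the thesis' actual implementation.

```csharp
// A minimal, hypothetical sketch of the agent's input/output states and an
// imitation reward in Unity ML-Agents. All field names and constants are
// illustrative assumptions, not the thesis' exact implementation.
using Unity.MLAgents;
using Unity.MLAgents.Actuators;
using Unity.MLAgents.Sensors;
using UnityEngine;

public class GaitImitationAgent : Agent
{
    public Rigidbody[] segments;          // hypothetical body segments
    public Transform[] referenceSegments; // matching pose from the reference clip

    // Input state: orientation and velocity of every body segment.
    public override void CollectObservations(VectorSensor sensor)
    {
        foreach (var s in segments)
        {
            sensor.AddObservation(s.transform.localRotation); // 4 floats
            sensor.AddObservation(s.velocity);                // 3 floats
        }
    }

    // Output state: one continuous action per actuated joint.
    public override void OnActionReceived(ActionBuffers actions)
    {
        for (int i = 0; i < segments.Length; i++)
            ApplyJointTarget(segments[i], actions.ContinuousActions[i]);

        // Imitation reward: exponentially decaying pose error between the
        // robot and the reference animation (a DeepMimic-style objective).
        float poseError = 0f;
        for (int i = 0; i < segments.Length; i++)
            poseError += Quaternion.Angle(segments[i].transform.localRotation,
                                          referenceSegments[i].localRotation);
        AddReward(Mathf.Exp(-0.1f * poseError));
    }

    // Placeholder: drive the joint (e.g. via ConfigurableJoint targets).
    void ApplyJointTarget(Rigidbody segment, float action) { /* not shown */ }
}
```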
The early termination and reference state initialization functionalities are then presented; they accelerate training and improve both learning and walking stability. The final and most extensive chapter presents the entire process of building the robot model and its gait imitation, from the first version to the last. The aim of that chapter is to explain the reasoning behind each model improvement and to demonstrate the achieved imitation of the gait, the CoP, and the GRF.
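As an illustration of these two functionalities, a minimal sketch in the same assumed ML-Agents setting might look as follows; the fall threshold, clip length, and helper names are hypothetical.

```csharp
// A minimal sketch of early termination and reference state initialization
// in a Unity ML-Agents Agent; thresholds and helpers are assumptions.
using Unity.MLAgents;
using UnityEngine;

public class WalkerEpisodeControl : Agent
{
    public Transform torso;                       // hypothetical root segment
    [SerializeField] float minTorsoHeight = 0.7f; // assumed fall threshold [m]
    [SerializeField] float clipLength = 1.2f;     // assumed clip duration [s]

    // Reference state initialization: start each episode at a random phase
    // of the reference clip so every part of the gait cycle is seen early.
    public override void OnEpisodeBegin()
    {
        float phase = Random.Range(0f, clipLength);
        SetPoseFromReference(phase);
    }

    void FixedUpdate()
    {
        // Early termination: end the episode as soon as the robot falls,
        // so no training time is spent in unrecoverable states.
        if (torso.position.y < minTorsoHeight)
        {
            SetReward(-1f); // optional failure penalty
            EndEpisode();
        }
    }

    // Placeholder: pose the articulation at the given reference phase.
    void SetPoseFromReference(float phase) { /* not shown */ }
}
```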