The introduction of robots into everyday life has proven very challenging. Human-like versatility and the ability to adapt to unstructured environments are still out of reach for contemporary robots. To overcome this problem, the research community has taken inspiration from human learning: a robot can observe how people perform the desired tasks and later improve on its own, e.g. through practice. In this thesis we present novel methods that facilitate learning by demonstration and the autonomous improvement of learned skills.

First, based on an initial demonstration, the robot needs to construct appropriate models of the observed task. We propose models at both the semantic and the trajectory level of representation. At the semantic level, we show how the learning of manipulation actions can be improved by taking object relations into account, and we develop a probabilistic formulation suitable for real-world data, which can contain large amounts of noise. At the trajectory level, we develop an extension of the popular Dynamic Movement Primitive (DMP) framework and investigate how the speed profiles of trajectories can be parameterized so that they can be adapted and transferred across different tasks.

For the robot to execute the knowledge extracted from a demonstration, the so-called correspondence problem needs to be solved: the demonstrator's motion must be adapted to the robot's embodiment. In the case of humanoid robots, whole-body movements can be transferred from a human only if the robot's balance is preserved. We show that a mapping can be constructed based on task-priority control, in which the motion transferred from the demonstrator does not affect the robot's centre of gravity. This guarantees stable motion transfer, but reduces the fidelity of reproduction. The movement primitives obtained in this manner are therefore further adapted with reinforcement learning, which is feasible because the initial movement is stable.

Finally, we address the slow convergence of reinforcement learning algorithms. In many cases, additional information about the learning process is available that conventional reinforcement learning cannot exploit. We develop a procedure that incorporates prior knowledge and iterative learning control to obtain exploration policies in the early stages of learning, which leads to faster convergence. Since random exploration is still used for the final tuning of the policy, the convergence properties of the applied reinforcement learning update rule are retained. In this manner we significantly improve learning performance.

A series of experiments that evaluate the proposed learning algorithms is described, including learning manipulation tasks in a kitchen environment, optimizing the speed profile of a liquid-carrying motion, transferring whole-body movements to a humanoid robot, and learning a classical via-point problem.
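To make the speed-profile parameterization mentioned above concrete, the following is a minimal sketch of a discrete DMP in which a phase-dependent scaling factor reshapes the timing of the motion without changing its spatial path. The class, parameter names and the specific form of the temporal scaling are illustrative assumptions, not the exact formulation developed in the thesis.

```python
import numpy as np

class SpeedScaledDMP:
    """Minimal discrete DMP with a phase-dependent speed scaling nu(s).

    Illustrative sketch only: the temporal parameterization shown here is an
    assumption, not the thesis's actual formulation.
    """

    def __init__(self, n_basis=20, alpha_z=25.0, alpha_s=4.0):
        self.alpha_z = alpha_z
        self.beta_z = alpha_z / 4.0          # critically damped point attractor
        self.alpha_s = alpha_s
        self.c = np.exp(-alpha_s * np.linspace(0.0, 1.0, n_basis))   # basis centres in phase
        self.h = 1.0 / np.diff(self.c, append=self.c[-1] * 0.5) ** 2  # basis widths
        self.w = np.zeros(n_basis)           # forcing-term weights: spatial shape of the path
        self.v = np.zeros(n_basis)           # speed-profile weights: temporal shape of the motion

    def _psi(self, s):
        return np.exp(-self.h * (s - self.c) ** 2)

    def _forcing(self, s, y0, g):
        psi = self._psi(s)
        return (psi @ self.w) / (psi.sum() + 1e-10) * s * (g - y0)

    def _speed_scale(self, s):
        # nu(s) > 1 locally speeds the motion up, nu(s) < 1 slows it down;
        # with v == 0 the demonstrated timing is reproduced unchanged.
        psi = self._psi(s)
        return np.exp((psi @ self.v) / (psi.sum() + 1e-10))

    def rollout(self, y0, g, tau=1.0, dt=0.002, n_steps=500):
        y, z, s = float(y0), 0.0, 1.0
        traj = []
        for _ in range(n_steps):
            tau_eff = tau / self._speed_scale(s)   # phase-dependent time constant
            z += dt / tau_eff * (self.alpha_z * (self.beta_z * (g - y) - z)
                                 + self._forcing(s, y0, g))
            y += dt / tau_eff * z
            s += dt / tau_eff * (-self.alpha_s * s)
            traj.append(y)
        return np.array(traj)
```

In such a scheme, the weights `w` would be fitted to a demonstrated trajectory in the usual DMP fashion, while `v` forms a compact set of policy parameters that reinforcement learning can adjust, for example to slow the motion down only where a carried liquid would otherwise spill.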
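The whole-body transfer described above can be illustrated, under standard task-priority kinematic control and purely as a sketch of the idea rather than the exact controller used in the thesis, by projecting the retargeted joint velocities into the null space of the centre-of-gravity Jacobian:

\[
\dot{\boldsymbol{q}}
= J_{\mathrm{cog}}^{+}\,\dot{\boldsymbol{x}}_{\mathrm{cog}}
+ \bigl(I - J_{\mathrm{cog}}^{+} J_{\mathrm{cog}}\bigr)\,\dot{\boldsymbol{q}}_{\mathrm{demo}},
\qquad
\dot{\boldsymbol{x}}_{\mathrm{cog}} = \boldsymbol{0},
\]

where $J_{\mathrm{cog}}$ maps joint velocities to the velocity of the projected centre of gravity, $J_{\mathrm{cog}}^{+}$ is its pseudoinverse, and $\dot{\boldsymbol{q}}_{\mathrm{demo}}$ are the joint velocities retargeted from the demonstrator. Because the secondary (demonstrated) motion acts only in the null space of the primary balance task, it cannot move the centre of gravity, which is precisely the trade-off between stability and fidelity noted in the abstract.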