Cross-modal binding is the ability to merge two or more modal representations of the same entity into a single shared representation. This ability is one of the fundamental properties of any cognitive system operating in a complex environment. In order to adapt successfully to changes in a dynamic environment, the binding mechanism has to be supplemented with cross-modal learning. Perhaps the most difficult task, however, is the integration of both mechanisms into a cognitive system. Their role in such a system is twofold: to bridge the semantic gap between modalities, and to mediate between the lower-level mechanisms that process sensory data and the higher-level cognitive processes, such as motivation and planning.
In this master's thesis, we present an approach to the probabilistic merging of multi-modal information in cognitive systems. Following this approach, we formulate a model of binding and cross-modal learning in Markov logic networks and describe the principles of its integration into a cognitive architecture. We implement a prototype of the model and evaluate it with off-line experiments that simulate a cognitive architecture with three modalities. Based on our approach, we design, implement and integrate the belief layer, a subsystem that bridges the semantic gap in a prototype cognitive system named George. George is an intelligent robot that is able to detect and recognise objects in its surroundings and to learn about their properties in a situated dialogue with a human tutor. Its main purpose is to validate various paradigms of interactive learning. To this end, we have developed and performed on-line experiments that evaluate the mechanisms of the robot's behaviour. With these experiments, we were also able to test and evaluate our approach to merging multi-modal information as part of a functional cognitive system.