The aim of this thesis is to explore and adapt computer vision algorithms for perception in human environments, so that the algorithms can be evaluated against human reasoning and their performance improved. The thesis presents a complete methodology for either adapting the output of a pre-trained object classification algorithm to the target-user population or inferring a suitable, user-friendly categorization directly from that population. Experiments on well-known datasets show that the target-user population preferred the transformed categorization by a large margin, that the performance of human observers is likely better than previously thought, and that the outcome of re-targeting is difficult to predict without actual tests on the target-user population.

Despite their powerful discriminative abilities, discriminatively trained Convolutional Neural Networks (CNNs) lack the properties of generative models, which leads to decreased performance in human environments where objects are poorly visible. This work proposes the Human-Centered Deep Compositional (HCDC) model, which combines the low-level visual discrimination of a CNN with the high-level reasoning of a Hierarchical Compositional Model (HCM). Designed as a transparent model, it can be adapted to real-world environments by adding compactly encoded domain knowledge derived from human studies and physical laws. Experimental results on the new FridgeNet datasets and a mixture of publicly available datasets show that the proposed model is explainable, has higher discriminative and generative power, and handles occlusion better than Mask-RCNN in instance segmentation tasks.

This thesis makes the following scientific contributions to the area of object recognition and detection: (i) a methodology for building image datasets and evaluating computer vision algorithms with consideration of the target-user population; (ii) a novel deep compositional model for automatic object detection in visual data captured in unstructured environments.
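As an informal illustration of the two ideas summarized above, the sketches below show (a) re-targeting a pre-trained classifier's labels to a user-preferred categorization and (b) fusing a CNN's discriminative score with a high-level plausibility score. They are minimal sketches under assumed names (USER_CATEGORY_MAP, retarget_prediction, Detection, fuse_scores, alpha) and example labels; they do not reproduce the thesis implementation.

```python
# Illustrative sketch only: map a pre-trained classifier's fine-grained labels to the
# coarser categories preferred by the target-user population (e.g. as elicited in a
# human study). All labels and names here are hypothetical, not taken from the thesis.
USER_CATEGORY_MAP = {
    "granny_smith": "apple",
    "golden_delicious": "apple",
    "tabby_cat": "cat",
    "siamese_cat": "cat",
}

def retarget_prediction(model_label: str, fallback: str = "unknown") -> str:
    """Re-target a model's output label to the user-preferred categorization."""
    return USER_CATEGORY_MAP.get(model_label, fallback)

print(retarget_prediction("granny_smith"))  # -> apple
```

The second sketch hints at how low-level discriminative evidence and high-level reasoning might be combined; the linear fusion and the weight alpha are assumptions for illustration, not the HCDC formulation.

```python
from dataclasses import dataclass

@dataclass
class Detection:
    label: str
    cnn_score: float      # low-level discriminative confidence from the CNN
    plausibility: float   # high-level score from compositional / domain reasoning

def fuse_scores(det: Detection, alpha: float = 0.7) -> float:
    """Weighted combination of discriminative and reasoning-based evidence."""
    return alpha * det.cnn_score + (1.0 - alpha) * det.plausibility

# A heavily occluded but physically plausible hypothesis can outrank a slightly
# higher-scoring but implausible one.
candidates = [Detection("milk_carton", 0.55, 0.90), Detection("milk_carton", 0.60, 0.10)]
print(max(candidates, key=fuse_scores))
```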