In this diploma thesis, we address the problem of choosing the most appropriate group of deep learning object detection models for photographs typical of tourist accommodations. As a representative dataset for this domain, we used photographs from online accommodation listings, focusing on two popular Slovenian destinations, Kranjska Gora and Piran. We used several pre-trained models that return bounding boxes for detected objects. Of the eight models in total, six were trained on the COCO dataset and two on the Open Images dataset. We compared the models by their performance on photographs from online tourist accommodation advertisements, and we also attempted to evaluate their performance on part of the Open Images test dataset. Additionally, we present a new way of using existing object detection models: merging multiple models into a combined system. As the final solution to the problem, we propose a merged system consisting of the R-FCN and YOLOv3 models, both trained on the COCO dataset, combined with a Faster R-CNN model trained on the Open Images dataset.