The task of machine vision today is to explore ways of making computers more intelligent and capable of processing and understanding sensory information. In addition, there is a growing demand in many industries for automated systems that can perform object detection and recognition, object tracking, and image analysis.
Several machine vision models have been developed using various techniques and methods, including deep learning and neural networks. As a result, they differ in speed, accuracy and computational complexity. The thesis also presents the topics needed to understand how these models work.
This diploma thesis investigates techniques for machine detection and recognition of objects, from which three algorithms are selected for a more detailed comparison: SSD, Faster RCNN and YOLO. Object detection models are built on each of them. To perform their task, the models must first be trained on a prepared training dataset. Finally, the models are compared according to criteria such as detection confidence, average accuracy, processing speed, frames per second (FPS), use of computer resources, and how complex the models are to use.
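Two of the criteria listed above, FPS and detection confidence (certainty), can be illustrated with a minimal Python sketch. It is not the measurement code used in the thesis. The function names, the 0.5 confidence threshold and the sample values are hypothetical and chosen only for illustration.

```python
from statistics import mean


def frames_per_second(frame_times_s):
    """Return throughput in frames per second from per-frame processing times (seconds)."""
    total = sum(frame_times_s)
    if total <= 0:
        raise ValueError("total processing time must be positive")
    return len(frame_times_s) / total


def mean_confidence(scores, threshold=0.5):
    """Average the confidence of detections at or above the threshold; 0.0 if none pass."""
    kept = [s for s in scores if s >= threshold]
    return mean(kept) if kept else 0.0


# Hypothetical measurements for one model (not results from the thesis):
# processing time per frame in seconds, and the confidence score of each detection.
frame_times = [0.02, 0.03, 0.025, 0.025]
scores = [0.9, 0.8, 0.4, 0.7]

fps = frames_per_second(frame_times)
conf = mean_confidence(scores)
print(f"FPS: {fps:.1f}, mean confidence: {conf:.2f}")
```

Computing the same two numbers for each model on the same frames would give a like-for-like comparison of speed and confidence, assuming the timings are taken on the same hardware.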
Training and running the models requires sufficiently large memory, a powerful processor and a graphics card. The thesis describes the procedures for preparing custom test datasets and creating models. One training example is carried out with the YOLOv8 model and the other with Teachable Machine. The programs and code for running the models are written in Python, in an environment set up with the open-source Anaconda distribution. The libraries required to run the YOLOv8 model are Ultralytics and PyTorch. The other models are SSD MobileNet FPN and Faster RCNN ResNet50, which require the TensorFlow and OpenCV libraries. To make the comparison meaningful, all of these models are trained on the high-quality COCO dataset.
Our comparison showed that the YOLOv8 model is the most effective, because it enables real-time object detection and recognition with good accuracy. At the same time, it does not require a very powerful graphics processor and is user-friendly to implement. By contrast, the SSD MobileNet FPN and Faster RCNN ResNet50 models are more demanding for programmers to use. The Faster RCNN model achieved the highest detection confidence, but it responds slowly and consumes the most computing resources. Only the SSD model failed to achieve the expected results, due to its poor accuracy and slow response. Ultimately, the right choice of model depends on the problem the user needs to solve.