The aim of this thesis is to develop a program that can accurately identify and distinguish objects in photographs. Specifically, the program analyzes image data to determine and label which photos contain postage stamps and which do not, removing the need to search through a large number of photos by hand. The application is intended for stamp-collecting circles, where it will assist in digitizing collections and adding new postage stamps. It is also intended to encourage new members to join the collecting community and create their own albums.
To build the application, we relied on object outlines, since postage stamps have a characteristic rectangular shape with wavy edges. Before identifying the stamps, we applied several transformations to the photos: resizing them to a standard size, converting them to grayscale, and applying a median filter. These transformations considerably simplify identification without discarding important information. To simplify the process further, we require that objects be photographed against a contrasting background. This requirement makes it possible to separate the object from the background, which is the next step of the identification process. Because the object and the background contrast, their pixels can be divided into two classes by choosing a suitable threshold. The pixels of each class form regions, and we obtain the outlines of these regions by following the rules for tracing them. A single photograph can yield many different outlines, but we consider only the one that encloses the largest number of pixels. The program decides whether a postage stamp is present by comparing the outline of the object with the outlines in the training set. However, the form in which the outlines are initially recorded makes this comparison difficult.
To solve this problem, we converted the outlines into a computer-friendly format using a series of mathematical transformations. The result is a list of vectors whose magnitudes and directions describe the shape of the selected outline. With this representation, comparing an outline with the training set became much easier.
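The thresholding step and the vector representation can be illustrated with a minimal Python sketch. It is not the implementation used in the thesis: the function names `to_binary` and `outline_to_vectors`, the choice to treat brighter pixels as the object, and the encoding of each vector as a (magnitude, direction in degrees) pair between consecutive outline points are all assumptions made for illustration.

```python
import math

def to_binary(gray, threshold):
    """Split pixels into two classes: 1 if brighter than the threshold, else 0.

    Assumption: the object is brighter than the background. For a dark
    object on a light background the comparison would be reversed.
    """
    return [[1 if px > threshold else 0 for px in row] for row in gray]

def outline_to_vectors(points):
    """Turn an ordered, closed outline into (magnitude, direction) pairs.

    Each vector runs from one outline point to the next, and the last point
    connects back to the first. Direction is in degrees, in [0, 360).
    """
    vectors = []
    n = len(points)
    for i in range(n):
        x0, y0 = points[i]
        x1, y1 = points[(i + 1) % n]
        dx, dy = x1 - x0, y1 - y0
        magnitude = math.hypot(dx, dy)
        direction = math.degrees(math.atan2(dy, dx)) % 360.0
        vectors.append((magnitude, direction))
    return vectors

# Hypothetical outline: the four corners of a rectangle, 4 wide and 3 tall.
rectangle = [(0, 0), (4, 0), (4, 3), (0, 3)]
vectors = outline_to_vectors(rectangle)
for magnitude, direction in vectors:
    print(f"length {magnitude:.1f}, direction {direction:.0f} degrees")
```

In this form a rectangle reduces to four vectors with alternating lengths and directions a quarter turn apart, which is far easier to compare against stored outlines than a raw list of pixel coordinates.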
For the program to work correctly, the user must follow strict rules, which are essential for achieving the desired results. However, we want the application to be as user-friendly as possible. To this end, more advanced recognition methods can be used, such as computer vision based on artificial intelligence and neural networks. To understand how it works, we studied concepts such as parametric models, linear classification, loss functions, and gradient descent. Such a model is trained on labeled photographs: we searched each photograph for the desired objects, labeled them accordingly, and then used the labeled photographs to train the program. After comparing the results of outline recognition with those of computer vision, we found that the latter is very promising but not yet suitable for our use. With additional training, computer vision could be used in our application, removing the restrictive conditions of use and further improving the user experience. However, we did not pursue this within the scope of the thesis.
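The concepts of a parametric model, linear classification, a loss function, and gradient descent can be shown together in a small sketch. It is a toy linear classifier, not the neural network evaluated in the thesis. The two input features (imagined here as "rectangularity" and "edge waviness" scores), the sample values, the learning rate, and the step count are all hypothetical.

```python
import math

def predict(weights, bias, features):
    """Parametric linear model: a weighted sum squashed to a probability."""
    score = sum(w * x for w, x in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-score))

def loss(weights, bias, samples):
    """Mean cross-entropy loss over (features, label) pairs."""
    total = 0.0
    for features, label in samples:
        p = predict(weights, bias, features)
        total -= label * math.log(p) + (1 - label) * math.log(1.0 - p)
    return total / len(samples)

def train(samples, steps=2000, learning_rate=0.5):
    """Plain gradient descent on the cross-entropy loss."""
    n = len(samples[0][0])
    weights, bias = [0.0] * n, 0.0
    for _ in range(steps):
        grad_w, grad_b = [0.0] * n, 0.0
        for features, label in samples:
            error = predict(weights, bias, features) - label
            for j, x in enumerate(features):
                grad_w[j] += error * x
            grad_b += error
        # Step against the averaged gradient.
        weights = [w - learning_rate * g / len(samples)
                   for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / len(samples)
    return weights, bias

# Hypothetical labeled data: label 1 = stamp, label 0 = no stamp.
samples = [
    ([0.9, 0.8], 1),
    ([0.8, 0.9], 1),
    ([0.2, 0.1], 0),
    ([0.3, 0.2], 0),
]
weights, bias = train(samples)
print("loss before training:", loss([0.0, 0.0], 0.0, samples))
print("loss after training: ", loss(weights, bias, samples))
```

Labeling photographs corresponds to supplying the 0 and 1 labels in `samples`. Training then adjusts the weights so that the loss on those labeled examples decreases.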