Object detection is an active topic in industry and research. It enables automatic identification of individual objects in an image, often faster and more accurately than the human eye. With the rise of deep neural networks, semantic segmentation has become particularly interesting, as it allows information to be extracted from an image at the pixel level. In this BA thesis, we addressed the problem of identifying a person in a video and replacing their background with arbitrary content. We built a diverse and accurate dataset of subjects and their binary masks, then implemented and trained two convolutional neural networks for semantic segmentation, Fast-SCNN and UNet. We compared the two networks and analyzed the results. The Fast-SCNN network was further optimized with ONNX Runtime to enable real-time execution on the CPU. Using an appropriately annotated dataset together with the optimized Fast-SCNN network, we achieved an average of 27 FPS on videos and 29 FPS in real-time webcam segmentation.
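The background-replacement step described above can be illustrated with a minimal NumPy sketch. It assumes the segmentation network has already produced a binary person mask for the frame; the function name `replace_background` and the array layout (height, width, channels) are illustrative assumptions, not taken from the thesis implementation.

```python
import numpy as np


def replace_background(frame: np.ndarray, mask: np.ndarray,
                       background: np.ndarray) -> np.ndarray:
    """Keep person pixels from `frame`, take all other pixels from `background`.

    frame, background: arrays of shape (H, W, C) with the same dtype.
    mask: array of shape (H, W); nonzero marks a person pixel.
    Hypothetical helper for illustration only.
    """
    if frame.shape != background.shape:
        raise ValueError("frame and background must have the same shape")
    if mask.shape != frame.shape[:2]:
        raise ValueError("mask must match the frame's height and width")

    # Add a trailing axis so the (H, W) mask broadcasts over the channels.
    person = mask.astype(bool)[..., np.newaxis]
    return np.where(person, frame, background)


# Tiny synthetic example: a 2x2 "frame" where only the top-left pixel
# is labeled as belonging to the person.
frame = np.full((2, 2, 3), 200, dtype=np.uint8)
background = np.full((2, 2, 3), 10, dtype=np.uint8)
mask = np.array([[1, 0], [0, 0]], dtype=np.uint8)
composite = replace_background(frame, mask, background)
```

In a video or webcam pipeline, this compositing would run once per frame on the mask predicted for that frame, so its cost adds to the network's inference time when measuring FPS.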