This thesis describes the design and training of a convolutional neural network that fuses data from a black-and-white camera and a low-resolution depth camera to produce a high-resolution depth map of indoor scenes. The network is based on the UNet architecture, modified to accept two inputs. To achieve sufficiently high accuracy of the output depth map while keeping computation time short, we tested several variants of the architecture with different numbers of layers.
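As an illustration of what accepting two inputs of different resolution can involve, the following minimal sketch brings a low-resolution depth map to the resolution of the camera image and stacks the two as channels. This is only one possible fusion scheme (often called early fusion) and is an assumption made for illustration; the abstract does not state how the modified UNet combines its inputs. The function names `upsample_nearest` and `fuse_inputs` and the toy values are hypothetical.

```python
# Illustrative sketch only: early fusion by channel stacking is an assumption,
# not necessarily the scheme used by the network described above.

def upsample_nearest(depth, out_h, out_w):
    """Nearest-neighbour upsampling of a 2-D list to out_h x out_w."""
    in_h, in_w = len(depth), len(depth[0])
    return [
        [depth[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]


def fuse_inputs(gray, depth_lr):
    """Return a 2-channel input [2][H][W]: the image and the upsampled depth."""
    h, w = len(gray), len(gray[0])
    depth_up = upsample_nearest(depth_lr, h, w)
    return [gray, depth_up]


# Toy data: a 4x4 black-and-white image and a 2x2 depth map (hypothetical values).
gray = [
    [0.1, 0.2, 0.3, 0.4],
    [0.5, 0.6, 0.7, 0.8],
    [0.9, 1.0, 0.9, 0.8],
    [0.7, 0.6, 0.5, 0.4],
]
depth_lr = [
    [1.0, 2.0],
    [3.0, 4.0],
]

fused = fuse_inputs(gray, depth_lr)
```

Here `fused` has two channels of equal spatial size, which is the shape a single-encoder network would expect. A design with two separate encoder branches would instead keep the inputs apart and merge their feature maps later.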