The need to model visual information with compact representations has existed since the early days of computer vision. With the spread of increasingly powerful and capable sensors, this need has become more pressing in recent years. Neural networks have accordingly become a popular choice for fast and effective processing of visual data.
For this thesis we implemented a convolutional neural network that determines, or at least approximates, all objects in a given point cloud scene. We started with a simple architecture that could predict the parameters of a single object in a scene. We then extended it with an architecture similar to Faster R-CNN that could predict the parameters of any number of objects in a scene.
The results for the initial neural network were satisfactory. The second, generalized network still gave decent results but, understandably, performed somewhat worse than the initial one, since it also had to segment the objects from one another in addition to predicting the parameters of each one.