Semantic scene segmentation with LiDAR and RGB image fusion (original title: Semantična segmentacija scen z zlivanjem meritev LiDAR in barvnih slik)

URBAS, MATEJ

URBAS, MATEJ (Author), Kristan, Matej (Mentor)

PDF - Presentation file (4.34 MB)
MD5: 72944C161526705EA0FC2F61CCEE6681

Abstract

This thesis presents a method for semantic segmentation of driving scenes. Modern methods for this task fall into three categories: the first uses only cameras to capture data, the second uses only LiDAR sensors, and the third combines data from both sensors. We focus on fusing LiDAR measurements and color images by means of a cross-attention mechanism. We develop SWINCrossFusion, a method based on the SWIN transformer architecture, and for fusing the measurements we introduce a new SWIN transformer block that performs cross-attention. The method computes queries from the data of one sensor, and keys and values from the data of the other. This yields efficient and fast fusion of the features of both sensors. We evaluate the method on the SemanticKITTI dataset and compare it with the reference method PMF. At 54 % mIoU, the developed method is two percent worse than the reference method, but it processes input data 40 % faster and uses 1 GB less GPU memory.
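
The query/key/value split described in the abstract can be illustrated with a minimal sketch of generic single-head scaled dot-product cross-attention in NumPy. This is not the SWINCrossFusion block itself: the window partitioning, multi-head layout, and other SWIN details are omitted, and the abstract does not say which sensor supplies the queries. The names `cross_attention`, `lidar_tokens`, `image_tokens`, the random projection matrices, and all dimensions are assumptions chosen for illustration only.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the row maximum before exponentiating for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(query_feats, context_feats, w_q, w_k, w_v):
    """Generic single-head cross-attention (illustrative, not the thesis block).

    query_feats:   (n_q, d_in) tokens from one sensor
    context_feats: (n_c, d_in) tokens from the other sensor
    """
    q = query_feats @ w_q        # queries come from the first sensor
    k = context_feats @ w_k      # keys come from the second sensor
    v = context_feats @ w_v      # values come from the second sensor
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = softmax(scores, axis=-1)   # each query row sums to 1
    return weights @ v, weights


# Hypothetical toy inputs: 6 LiDAR tokens attend over 10 image tokens.
rng = np.random.default_rng(0)
d_in, d_head = 8, 4
lidar_tokens = rng.normal(size=(6, d_in))
image_tokens = rng.normal(size=(10, d_in))
w_q, w_k, w_v = (rng.normal(size=(d_in, d_head)) for _ in range(3))

fused, weights = cross_attention(lidar_tokens, image_tokens, w_q, w_k, w_v)
print(fused.shape, weights.shape)
```

Each output row is a weighted mix of the other sensor's value vectors, so the result has one fused feature per query token. Swapping the two inputs would give the opposite fusion direction.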

Language:	Slovenian
Keywords:	transformer, attention, cross-attention, segmentation, LiDAR, images
Work type:	Bachelor thesis/paper
Typology:	2.11 - Undergraduate Thesis
Organization:	FRI - Faculty of Computer and Information Science
Year:	2023
PID:	20.500.12556/RUL-144044
COBISS.SI-ID:	139854851
Publication date in RUL:	27.01.2023

Secondary language

Language:	English
Title:	Semantic scene segmentation with LIDAR and RGB image fusion
Abstract:	This diploma thesis presents a method for semantic segmentation of driving scenes. Modern methods for semantic segmentation of driving scenes can be divided into three categories. The first category uses only cameras to capture data, the second uses only LiDAR sensors, and the third combines data from both sensors. In this thesis, we focus on the fusion of LiDAR and RGB image data using a cross-attention mechanism. We develop SWINCrossFusion, a method based on the SWIN transformer architecture, and introduce a new SWIN transformer block for sensor fusion using cross-attention. The method computes queries over data from one sensor, and keys and values over data from the other sensor. This results in an efficient and fast merging of the measurements of the two sensors. We evaluate the method on the SemanticKITTI dataset and compare it with the reference PMF method. At 54 % mIoU, the developed method is two percent worse than the reference method, but it processes the input data 40 % faster and consumes 1 GB less GPU memory.
Keywords:	transformer, attention, cross-attention, segmentation, LiDAR, images
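
The mIoU score quoted in the abstract is the mean of per-class intersection-over-union values, where the IoU of a class is the number of points labelled with that class in both prediction and ground truth, divided by the number labelled with it in either. The sketch below shows that arithmetic on a toy example. The function name `mean_iou`, the toy labels, and the choice to skip classes absent from both prediction and ground truth are assumptions for illustration; benchmark evaluations typically accumulate counts over the whole dataset rather than per sample.

```python
import numpy as np


def mean_iou(pred, target, num_classes):
    """Mean intersection-over-union over classes (illustrative sketch)."""
    ious = []
    for c in range(num_classes):
        pred_c = pred == c
        target_c = target == c
        union = np.logical_or(pred_c, target_c).sum()
        if union == 0:
            # Assumption: a class absent from both arrays is skipped.
            continue
        inter = np.logical_and(pred_c, target_c).sum()
        ious.append(inter / union)
    return float(np.mean(ious))


# Hypothetical labels for six points and three classes.
target = np.array([0, 0, 1, 1, 2, 2])
pred = np.array([0, 1, 1, 1, 2, 0])

# Per-class IoU: class 0 -> 1/3, class 1 -> 2/3, class 2 -> 1/2.
print(mean_iou(pred, target, num_classes=3))  # 0.5
```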
