Učenje globokih nevronskih mrež za problem stereo vida

ŽBONTAR, JURE

Repository of the University of Ljubljana

ŽBONTAR, JURE (Author), LeCun, Yann (Mentor), Demšar, Janez (Co-mentor)

PDF - Presentation file (12.73 MB)
MD5: 947F09F9BDAE2F384620CF0F639B086F
PID: 20.500.12556/rul/8a4ee846-bde4-4bf5-865f-cef014e9a6d2

Abstract

In this doctoral dissertation we present a method for computing the matching cost for the stereo vision problem. In the last few years, stereo data sets such as KITTI and Middlebury have become large enough for the problem to be tackled with learning-based methods. Our approach is based on a deep convolutional neural network and a supervised machine learning algorithm. We construct the training set from publicly available stereo data sets. A training example consists of a pair of image patches and belongs to one of two classes: positive, when the two patches are in correspondence, and negative, when they are not. We present two convolutional neural network architectures for learning similarity. The first architecture is faster than the second, but the depth map it produces is on average less accurate. In both cases, the input to the network is a pair of image patches and the output is a measure of similarity between them. Both architectures contain convolutional neural networks that represent each image patch with a feature vector. The similarity between the two patches is computed on the feature vectors rather than on the intensities of individual pixels. The first architecture compares the feature vectors using cosine similarity, while the second compares them with a learned multilayer neural network. We compare the developed method with established methods on three data sets (KITTI 2012, KITTI 2015, and Middlebury) and find that it is the most accurate on all three.
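The construction of training examples described in the abstract can be illustrated with a minimal sketch. It assumes a rectified pair stored as nested lists, a known ground-truth disparity for the chosen pixel, and a hypothetical `neg_offset` that shifts the right patch away from the true match to create the negative example. The function names, the patch size, and the offset value are illustrative assumptions and are not taken from the dissertation.

```python
def extract_patch(image, row, col, half):
    """Return the (2*half+1) x (2*half+1) patch centred on (row, col)."""
    height, width = len(image), len(image[0])
    if row - half < 0 or col - half < 0 or row + half >= height or col + half >= width:
        raise ValueError("patch does not fit inside the image")
    return [r[col - half:col + half + 1] for r in image[row - half:row + half + 1]]


def make_training_pair(left, right, row, col, disparity, half=1, neg_offset=3):
    """Build one positive and one negative example for a left-image pixel.

    The positive example pairs the left patch with the right patch at the
    true correspondence (col - disparity). The negative example pairs it
    with a right patch shifted by neg_offset pixels (an assumed value).
    """
    if neg_offset == 0:
        raise ValueError("neg_offset must be non-zero")
    left_patch = extract_patch(left, row, col, half)
    matching = extract_patch(right, row, col - disparity, half)
    shifted = extract_patch(right, row, col - disparity + neg_offset, half)
    return (left_patch, matching, 1), (left_patch, shifted, 0)


# Synthetic pair: the left image is the right image shifted by 2 pixels,
# so every valid left pixel has a true disparity of 2.
right = [[10 * r + c for c in range(12)] for r in range(5)]
left = [[row[c - 2] if c >= 2 else -1 for c in range(12)] for row in right]

positive, negative = make_training_pair(left, right, row=2, col=6, disparity=2)
```

In this synthetic setting the two patches of the positive example are identical, while the patches of the negative example differ. On real data the patches of a positive pair are only similar, which is why a learned similarity measure is useful.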

Language:	English
Keywords:	stereo, matching cost, similarity learning, supervised learning, convolutional neural network
Work type:	Dissertation
Organization:	FRI - Faculty of Computer and Information Science
Year:	2016
PID:	20.500.12556/RUL-84276
COBISS.SI-ID:	1537065923
Publication date in RUL:	27.07.2016
Views:	3532
Downloads:	1093
Citation:	ŽBONTAR, JURE, 2016, Učenje globokih nevronskih mrež za problem stereo vida [online]. Doctoral dissertation. [Accessed 2 April 2025]. Retrieved from: https://repozitorij.uni-lj.si/IzpisGradiva.php?lang=eng&id=84276

Secondary language

Language:	Slovenian
Title:	Training deep neural networks for stereo vision
Abstract:	We present a method for extracting depth information from a rectified image pair. Our approach focuses on the first stage of many stereo algorithms: the matching cost computation. We approach the problem by learning a similarity measure on small image patches using a convolutional neural network. Training is carried out in a supervised manner by constructing a binary classification data set with examples of similar and dissimilar pairs of patches. We examine two network architectures for learning a similarity measure on image patches. The first architecture is faster than the second, but produces disparity maps that are slightly less accurate. In both cases, the input to the network is a pair of small image patches and the output is a measure of similarity between them. Both architectures contain a trainable feature extractor that represents each image patch with a feature vector. The similarity between patches is measured on the feature vectors instead of the raw image intensity values. The fast architecture uses a fixed similarity measure to compare the two feature vectors, while the accurate architecture attempts to learn a good similarity measure on feature vectors. The output of the convolutional neural network is used to initialize the stereo matching cost. A series of post-processing steps follow: cross-based cost aggregation, semiglobal matching, a left-right consistency check, subpixel enhancement, a median filter, and a bilateral filter. We evaluate our method on the KITTI 2012, KITTI 2015, and Middlebury stereo data sets and show that it outperforms other approaches on all three data sets.
Keywords:	stereo, matching cost, similarity learning, supervised learning, convolutional neural networks
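
The comparison step of the fast architecture can be sketched as follows. The sketch assumes that feature vectors for one image row have already been computed by some feature extractor, uses cosine similarity as the fixed similarity measure, and takes the negated similarity as the initial matching cost so that lower cost means a better match. The negation, the winner-take-all selection, and all function names are assumptions made for illustration; the post-processing steps listed in the abstract are not included.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # assumption: an all-zero vector matches nothing
    return dot / (norm_u * norm_v)


def matching_cost(left_features, right_features, x, d):
    """Cost of matching left position x with right position x - d."""
    return -cosine_similarity(left_features[x], right_features[x - d])


def best_disparity(left_features, right_features, x, max_disparity):
    """Winner-take-all: the candidate disparity with the lowest cost."""
    candidates = range(0, min(max_disparity, x) + 1)
    return min(candidates,
               key=lambda d: matching_cost(left_features, right_features, x, d))


# One row of toy 2-D feature vectors; the left row equals the right row
# shifted by one position, so the expected disparity is 1.
right_row = [[0.0, 1.0], [1.0, 1.0], [2.0, 0.1], [5.0, 5.0]]
left_row = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 0.1]]

disparity_at_2 = best_disparity(left_row, right_row, x=2, max_disparity=3)
```

Because the similarity measure is fixed, the feature vectors for each image only need to be computed once and can then be compared at every candidate disparity.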
