Decision Tree Ensemble Selection (Selekcija v skupinskih modelih z odločitvenimi drevesi)

OBLAK, DARJAN

Repository of the University of Ljubljana

Details

Oblak, Darjan (Author), Demšar, Janez (Mentor)

PDF - Presentation file (1.02 MB)
MD5: 78A9B0A6BDB49528FD52822C58EBD95F
PID: 20.500.12556/rul/4239ff46-861e-4ef8-bc63-5cf8f62ff07f

Abstract

Various kinds of ensemble models stand out as some of the more successful machine learning methods. Their appealing property, that accuracy approaches an asymptotic upper bound as the number of internal models grows, comes with a drawback: size. The literature describes various approaches that seek a compromise between size and accuracy through a selection procedure, meaning that only some of the generated internal models are included in the final model. It turns out that this not only reduces the size of ensembles but can also increase their accuracy. In this work we add two new approaches to the family of selection methods; both perform selection using the margin that the models determine on the so-called out-of-bag set. This is key for small datasets, since it enables selection without the loss of accuracy that a smaller training set would cause. We evaluate the methods on 34 datasets for bagging, random forests and extremely randomized trees. We find that in some cases the selection methods perform statistically significantly better than the base ensemble. In the remaining cases the selection methods successfully reduce the size of the ensemble while maintaining accuracy on average.
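The abstract does not spell out how the two proposed methods use the out-of-bag margin. As a rough illustration only, and not the thesis's algorithm, the Python sketch below computes the mean out-of-bag margin of a sub-ensemble and runs a simple greedy forward selection that maximizes it. It assumes the per-tree out-of-bag predictions are already available; the names oob_margin, greedy_select and oob_pred are hypothetical. Here the margin of a sample is the share of out-of-bag votes for the true class minus the largest share for any other class.

```python
from collections import Counter
from typing import List, Optional, Sequence

# Hypothetical input format: oob_pred[t][i] is the class that tree t predicts
# for training sample i, or None when sample i was in tree t's bootstrap
# sample (so tree t must not vote on it).
OOBMatrix = Sequence[Sequence[Optional[int]]]


def oob_margin(oob_pred: OOBMatrix, y: Sequence[int], subset: Sequence[int]) -> float:
    """Mean out-of-bag margin of the sub-ensemble given by tree indices `subset`."""
    margins = []
    for i, true_label in enumerate(y):
        # Only trees for which sample i is out-of-bag cast a vote.
        votes = Counter(oob_pred[t][i] for t in subset if oob_pred[t][i] is not None)
        total = sum(votes.values())
        if total == 0:
            continue  # no out-of-bag vote for this sample; skip it
        correct = votes.get(true_label, 0)
        best_wrong = max((c for label, c in votes.items() if label != true_label), default=0)
        margins.append((correct - best_wrong) / total)
    return sum(margins) / len(margins) if margins else 0.0


def greedy_select(oob_pred: OOBMatrix, y: Sequence[int], k: int) -> List[int]:
    """Greedy forward selection: repeatedly add the tree that maximizes the mean OOB margin."""
    selected: List[int] = []
    remaining = list(range(len(oob_pred)))
    while remaining and len(selected) < k:
        best = max(remaining, key=lambda t: oob_margin(oob_pred, y, selected + [t]))
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data: 3 trees, 4 training samples with binary labels.
y = [0, 1, 0, 1]
oob_pred = [
    [0, 1, None, 1],   # tree 0
    [1, 0, 0, None],   # tree 1
    [None, 1, 0, 0],   # tree 2
]

selected = greedy_select(oob_pred, y, k=2)
print(selected, oob_margin(oob_pred, y, selected), oob_margin(oob_pred, y, [0, 1, 2]))
```

In this toy example the greedy procedure keeps trees 0 and 2, whose mean out-of-bag margin is 0.75, compared with about 0.33 for all three trees. Because the same out-of-bag samples both drive the selection and score it, that figure is optimistic; judging the pruned ensemble fairly would still require held-out data or cross-validation.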

Language:	Slovenian
Keywords:	ensemble models, decision trees, ensemble selection, ensemble pruning, ensemble thinning, bagging, random forests, extremely randomized trees
Work type:	Undergraduate thesis
Organization:	FRI - Faculty of Computer and Information Science
Year:	2016
PID:	20.500.12556/RUL-85517
Publication date in RUL:	15.09.2016

Secondary language

Abstract:
Language:	English
Title:	Decision Tree Ensemble Selection
Ensemble models are well known in machine learning for their accuracy. Their main strength, convergence towards an asymptotic upper limit as the number of internal models increases, is, however, partly offset by their large size. Existing studies show that reducing the number of models in an ensemble after training can be done without hurting, and sometimes even while increasing, the accuracy of the ensemble. The thesis introduces two new approaches to ensemble selection that use the margin of the models on the so-called out-of-bag set. Using such a selection set is important for small training sets, where no data should be held out from learning if the ensemble is to maintain high generalization accuracy. Both methods are evaluated on 34 datasets for bagging, random forests and extremely randomized trees. Some of the comparisons show that the selection model outperforms the base ensemble method with statistical significance. The others confirm that the methods can reduce the size of ensembles while maintaining accuracy on average.
Keywords:	ensemble models, decision trees, ensemble selection, ensemble pruning, ensemble thinning, bagging, random forest, extremely randomized trees
