Multitask learning in classification and regression (Večciljno učenje v klasifikaciji in regresiji)

Čepin, Gregor

Čepin, Gregor (Author), Robnik Šikonja, Marko (Mentor)

PDF - Presentation file (453.51 KB)
MD5: A1333BD493C70C66589CB8C25E1F59AD
PID: 20.500.12556/rul/40d3fe64-d54f-472c-a7fc-6c1d0e23ad9e

Abstract

Multitask learning is a machine learning method in which the goal is for an algorithm to learn to solve several related problems at once. Instead of several separate models, the algorithm finds a shared model that is typically smaller than the combined size of the individual models, easier to understand, and less fitted to the training data. When predicting with the shared model, the algorithm predicts values for several problems simultaneously. The problems used must be related to one another so that learning one problem can help the others be learned better. Until now, the approach has been used successfully with tree-based methods to combine several classification or several regression problems; in this work we generalize it so that classification and regression tasks can be mixed. When building tree models, classification and regression use different methods for selecting attributes to split the examples in internal nodes. The values of these scores are not comparable with one another, so when building a tree we rank the attributes according to both methods and choose the attribute that is ranked best overall. In the thesis we implement a multitask decision and regression tree and the ensemble methods multitask bagging and multitask random forest. We compare them with the single-task variants of the algorithms, with ordinary multitask trees, and with multitask neural networks. We introduce a measure of relatedness between tasks that is based on attribute ranking. It allows us to find, within a dataset, the tasks that are most closely related and that are sensible to treat with a multitask approach. On one of the datasets, the implemented multitask random forest performs statistically significantly better than the single-task algorithms. On some datasets, however, the implemented algorithms perform worse than the single-task ones.
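
The rank-based attribute selection described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation: the abstract does not name the scoring measures, so information gain (classification) and variance reduction (regression) are assumed here, attributes are assumed to be categorical, tied scores are assumed to share the average rank, and the function names `average_ranks`, `split_score`, and `best_attribute` are hypothetical.

```python
from collections import Counter, defaultdict
from math import log2
from statistics import pvariance


def entropy(labels):
    """Shannon entropy of a list of class labels (assumed classification impurity)."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def split_score(values, target, impurity):
    """Impurity reduction obtained by splitting `target` on a categorical attribute."""
    groups = defaultdict(list)
    for v, t in zip(values, target):
        groups[v].append(t)
    n = len(target)
    weighted = sum(len(g) / n * impurity(g) for g in groups.values())
    return impurity(target) - weighted


def average_ranks(scores):
    """Rank 1 = highest score; tied scores share the average of their ranks (assumption)."""
    vals = list(scores.values())
    return {
        name: 1 + sum(v > s for v in vals) + (sum(v == s for v in vals) - 1) / 2
        for name, s in scores.items()
    }


def best_attribute(attributes, tasks):
    """Pick the attribute with the lowest rank sum over all tasks.

    attributes: {name: list of categorical values}
    tasks: list of (kind, targets), kind is "classification" or "regression"
    """
    total = dict.fromkeys(attributes, 0.0)
    for kind, target in tasks:
        # Scores from different impurity measures are not comparable,
        # so only their within-task ranks are added up.
        impurity = entropy if kind == "classification" else pvariance
        scores = {a: split_score(vals, target, impurity) for a, vals in attributes.items()}
        for a, r in average_ranks(scores).items():
            total[a] += r
    return min(total, key=total.get)


attributes = {
    "a": ["x", "x", "y", "y", "x", "y"],
    "b": ["p", "q", "p", "q", "p", "q"],
    "c": ["m", "m", "m", "n", "n", "n"],
}
tasks = [
    ("classification", ["+", "+", "-", "-", "+", "-"]),
    ("regression", [1.0, 1.1, 5.0, 5.2, 0.9, 5.1]),
]
chosen = best_attribute(attributes, tasks)
print(chosen)
```

In this toy data attribute `a` separates both targets cleanly, so it is ranked first for each task and has the lowest rank sum. Summing ranks is only one possible aggregation; the abstract says only that the attribute "ranked best overall" is chosen.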

Language:	Slovenian
Keywords:	machine learning, decision tree, multitask learning, random forest, bagging, classification, regression, ranking
Work type:	Master's thesis/paper
Organization:	FRI - Faculty of Computer and Information Science
Year:	2016
PID:	20.500.12556/RUL-81190
Publication date in RUL:	31.03.2016
Views:	2378
Downloads:	521

Secondary language

Abstract:
Language:	English
Title:	Multitask learning in classification and regression
Multitask learning is an approach to machine learning in which an algorithm learns to solve multiple related problems at once. It tries to find one common model instead of building multiple separate models. Such a model is usually smaller than the combined size of the separate models, easier to understand, and less likely to overfit the training data. In the prediction stage, the algorithm predicts values for several problems at the same time. Problems that are learned together must be related, so that learning one problem can improve the learning of the others. So far, this approach has been used with tree models for either multiple classification or multiple regression tasks. In this work we extend the approach to mixed classification and regression tasks. During tree construction, regression and classification use different attribute selection methods. The scores they return are not directly comparable, so we rank the attributes for each task and choose the attribute that is ranked best overall. We implement a multitask regression and classification tree, multitask bagging, and a multitask random forest, all based on attribute rankings. We compare these algorithms with their single-task variants, with a regular multitask tree, and with a multitask neural network. We propose a task relatedness measure based on attribute ranking. With it we can find related tasks in a dataset and use them together in a multitask approach. On one dataset the implemented multitask random forest performs statistically significantly better than the single-task version. On some datasets the implemented algorithms perform worse than the single-task versions.
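
The abstract does not define the proposed task relatedness measure beyond saying that it is based on attribute ranking. As a rough illustration only, the Python sketch below uses Spearman rank correlation between two tasks' attribute rankings as a stand-in; this is an assumption, not the measure from the thesis, and the names `average_ranks` and `rank_relatedness` as well as the example scores are hypothetical.

```python
from math import sqrt


def average_ranks(scores):
    """Rank 1 = highest score; tied scores share the average of their ranks."""
    vals = list(scores.values())
    return {
        name: 1 + sum(v > s for v in vals) + (sum(v == s for v in vals) - 1) / 2
        for name, s in scores.items()
    }


def rank_relatedness(scores_a, scores_b):
    """Spearman correlation of two tasks' attribute rankings (assumed stand-in measure).

    Both arguments map the same attribute names to per-task scores.
    Returns a value in [-1, 1]; undefined if one task scores all attributes equally.
    """
    ranks_a, ranks_b = average_ranks(scores_a), average_ranks(scores_b)
    names = list(scores_a)
    xs = [ranks_a[n] for n in names]
    ys = [ranks_b[n] for n in names]
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (norm_x * norm_y)


# Hypothetical per-task attribute scores; the scales differ, only the order matters.
gain_task = {"a": 0.9, "b": 0.5, "c": 0.1}
variance_task = {"a": 12.0, "b": 7.0, "c": 3.0}
opposite_task = {"a": 0.1, "b": 0.2, "c": 0.3}

same_order = rank_relatedness(gain_task, variance_task)
reverse_order = rank_relatedness(gain_task, opposite_task)
print(same_order, reverse_order)
```

Tasks whose attribute rankings agree score close to 1 and would be candidates for joint learning under this stand-in measure, while tasks with opposite rankings score close to -1.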
Keywords:	machine learning, decision tree, multitask tree, random forest, bagging, classification, regression, ranking
