
Učenje regresijskih modelov iz asimetričnih porazdelitev (Training of regression models from asymmetrical distributions): master's thesis
Vodišek, Matija (Author), Todorovski, Ljupčo (Mentor)

PDF - Presentation file (6.69 MB)
MD5: 7AAC300B4569E00AAF1B65A8E79AAC1C

Abstract
In the machine learning literature, the problem of an asymmetrically distributed target variable is well studied for classification tasks, where only a small fraction of the training examples belongs to one of the rarely observed classes. A similar but more complex problem arises in regression tasks, where a small number of training examples take outstanding, i.e. extreme, values of the numerical target variable. The resampling method SMOTER is one of the few approaches that address the problem of an asymmetrically distributed target variable in regression tasks. Like the classification methods, SMOTER uses biased resampling of the training examples to obtain a larger share of the rare extreme values of the target variable in the sample. The main aim of this master's thesis is to give an overview of resampling approaches to the problem of an asymmetrically distributed target variable, to present SMOTER, and to propose two new methods, one of which is a modification and the other a simplification of SMOTER. In addition, the thesis empirically evaluates the existing and the proposed methods in combination with the random forest learning algorithm on selected data sets with an asymmetrically distributed target variable. The evaluation results show that all of the proposed methods predict significantly better than the alternative of simply applying the learning algorithm to the original data sets. When the proposed methods are compared with one another, the modification and the simplification of SMOTER predict better than SMOTE for regression, but not significantly.
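The abstract describes SMOTER only at a high level. Below is a minimal, illustrative Python sketch of a SMOTER-style oversampling step, not the exact procedure used in the thesis: it assumes a NumPy feature matrix X, a numeric target y, and a precomputed boolean mask rare_mask marking the examples whose target values are extreme; the function name and all parameters are hypothetical. Synthetic examples are generated by interpolating, in both the features and the target, between a rare example and one of its nearest rare neighbours.

```python
# Illustrative SMOTER-style oversampling sketch (hypothetical names; not the
# thesis's exact method). Synthetic cases are created by interpolating between
# a rare example and one of its k nearest rare neighbours.
import numpy as np

def smoter_like_oversample(X, y, rare_mask, n_synthetic=100, k=5, seed=0):
    rng = np.random.default_rng(seed)
    X_rare, y_rare = X[rare_mask], y[rare_mask]
    new_X, new_y = [], []
    for _ in range(n_synthetic):
        i = rng.integers(len(X_rare))
        # Distances from the chosen rare example to the other rare examples.
        dists = np.linalg.norm(X_rare - X_rare[i], axis=1)
        dists[i] = np.inf                   # exclude the example itself
        neighbours = np.argsort(dists)[:k]  # indices of its k nearest rare neighbours
        j = rng.choice(neighbours)
        alpha = rng.random()                # interpolation coefficient in [0, 1]
        # Interpolate both the feature vector and the numeric target value.
        new_X.append(X_rare[i] + alpha * (X_rare[j] - X_rare[i]))
        new_y.append(y_rare[i] + alpha * (y_rare[j] - y_rare[i]))
    return np.vstack([X, np.array(new_X)]), np.concatenate([y, np.array(new_y)])
```

In the full SMOTER method, the rare (extreme) region is defined through a relevance function over the target variable, and oversampling of the rare cases is combined with undersampling of the frequent ones; the sketch above shows only the synthetic-example generation step.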

Language: Slovenian
Keywords: regression, biased resampling, machine learning, comparison of algorithms
Work type: Master's thesis/paper
Typology: 2.09 - Master's Thesis
Organization: FMF - Faculty of Mathematics and Physics
Year: 2018
PID: 20.500.12556/RUL-104461
UDC: 519.2
COBISS.SI-ID: 18461017
Publication date in RUL: 07.10.2018
Views: 1305
Downloads: 327

Secondary language

Language: English
Title: Training of regression models from asymmetrical distributions
Abstract:
In the machine learning literature, the problem of an asymmetrically distributed target variable is well researched for classification tasks, where only a very small fraction of the training examples belongs to one of the rarely observed classes. A similar but more complex problem can also arise in regression tasks, where a small number of training examples have outstanding, i.e. extreme, values of the numerical target variable. The resampling method SMOTER is one of the few approaches addressing the problem of an asymmetrically distributed target variable in regression tasks. Similarly to the classification methods, SMOTER employs biased resampling of the training examples to obtain a higher share of rare extreme values of the target variable in the sample. The main purpose of the master's thesis is to provide an overview of resampling approaches to the problem of an asymmetrically distributed target variable, to present SMOTER, and to propose two new methods, one of which is a modification and the other a simplification of SMOTER. The thesis also reports on an empirical evaluation of the existing and the proposed methods in combination with the random forest learning algorithm on selected regression data sets with an asymmetrically distributed target variable. The evaluation results show that all of the proposed methods achieve significantly better predictive performance than the alternative of simply applying the learning algorithm to the original data sets. When the proposed methods are compared with each other, the modification and the simplification of SMOTER have better predictive performance than SMOTE for regression, though not significantly.

Keywords: regression, biased resampling, machine learning, comparison of algorithms
