On the interpretability of machine learning models and experimental feature selection in case of multicollinear data

Drobnič, Franc; Kos, Andrej; Pustišek, Matevž

	PDF - Presentation file, download (1.52 MB) MD5: E803CFED6E182F33E8A85BAFDF1E72A6
	URL - Source URL, to access visit https://www.mdpi.com/2079-9292/9/5/761

Abstract

In machine learning, a considerable amount of research addresses the interpretability of models and their decisions. Interpretability tends to conflict with model quality. Random Forests are among the best-performing machine learning techniques, but they operate as a “black box”. Among the quantifiable approaches to model interpretation are measures of association between predictors and the response. For Random Forests, this approach usually consists of calculating the model’s feature importances. Known methods, including the built-in one, are less suitable in settings with strong multicollinearity among features. We therefore propose an experimental approach to the feature selection task: a greedy forward feature selection method with a least-trees-used criterion. It yields a set of the most informative features, which can be used in a machine learning (ML) training process with prediction quality similar to that of the original feature set. We verify the results of the proposed method on two known datasets, one with low feature multicollinearity and the other with high feature multicollinearity. The proposed method also allows a domain expert to help select among equally important features, which is known as the human-in-the-loop approach.
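
The abstract does not spell out the algorithm, so the following is only a minimal sketch of one possible reading of a greedy forward selection with a least-trees-used criterion. It assumes that each candidate feature subset can be scored by the smallest number of trees with which a forest reaches a target prediction quality, and that selection stops once an added feature no longer lowers that number. The names `greedy_forward_selection`, `trees_needed`, `choose`, and the toy scoring function are hypothetical and do not come from the paper; in practice `trees_needed` would train forests of increasing size on real data.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# Hypothetical scorer: smallest number of trees with which a forest trained on
# the given feature subset reaches the target quality, or None if it never does.
TreesNeeded = Callable[[Tuple[str, ...]], Optional[int]]
# Hypothetical tie-breaker, e.g. a domain expert choosing among equal candidates.
TieBreaker = Callable[[List[str]], str]


def greedy_forward_selection(
    features: Sequence[str],
    trees_needed: TreesNeeded,
    choose: TieBreaker = lambda tied: tied[0],
) -> List[str]:
    """Add one feature per round, preferring the subset that needs the fewest trees."""
    selected: List[str] = []
    best_so_far: Optional[int] = None
    while len(selected) < len(features):
        # Score every remaining candidate together with the features chosen so far.
        scores = {}
        for f in features:
            if f in selected:
                continue
            n = trees_needed(tuple(selected + [f]))
            if n is not None:
                scores[f] = n
        if not scores:
            break  # no candidate reaches the target quality
        best = min(scores.values())
        # Assumed stopping rule: quit when no candidate lowers the tree count.
        if best_so_far is not None and best >= best_so_far:
            break
        # Equally good candidates are handed to the tie-breaker (human in the loop).
        tied = [f for f in features if scores.get(f) == best]
        selected.append(choose(tied))
        best_so_far = best
    return selected


# Toy stand-in for forest training: "a" and "a_copy" are perfectly collinear
# (they carry the same signal), "b" carries a second signal, "noise" carries none.
SIGNAL = {"a": "s1", "a_copy": "s1", "b": "s2", "noise": None}


def toy_trees_needed(subset: Tuple[str, ...]) -> Optional[int]:
    covered = {SIGNAL[f] for f in subset} - {None}
    return 100 // len(covered) if covered else None


selected = greedy_forward_selection(list(SIGNAL), toy_trees_needed)
print(selected)  # ['a', 'b']
```

In this toy setting the collinear pair `a` and `a_copy` ties in the first round, which is the situation where the `choose` callback could defer to a domain expert instead of taking the first candidate.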

Language:	English
Keywords:	interpretable machine learning, feature multicollinearity, random forests, feature selection, feature importance, greedy feature selection
Type of material:	Journal article
Typology:	1.01 - Original scientific article
Organization:	FE - Faculty of Electrical Engineering
Publication status:	Published
Publication version:	Published version
Year of publication:	2020
Number of pages:	15 pp.
Numbering:	Vol. 9, iss. 5, art. 761
PID:	20.500.12556/RUL-133715
UDC:	004.8
ISSN on article:	2079-9292
DOI:	10.3390/electronics9050761
COBISS.SI-ID:	14438659
Date of publication in RUL:	10 December 2021
Number of views:	846
Number of downloads:	211

Part of journal

Title:	Electronics
Abbreviated title:	Electronics
Publisher:	MDPI
ISSN:	2079-9292
COBISS.SI-ID:	523068953

Licenses

License:	CC BY 4.0, Creative Commons Attribution 4.0 International

Link:	http://creativecommons.org/licenses/by/4.0/deed.sl
Description:	This is the standard Creative Commons license that gives users the most options for further use of the work, provided they credit the author.
Licensing start date:	6 May 2020

Secondary language

Language:	Slovenian
Keywords:	razložljivo strojno učenje, multikolinearnost značilk, naključni gozdovi, izbira značilk, pomembnost značilk, požrešna izbira značilk

Projects

Funder:	ARRS - Slovenian Research Agency
Project number:	P2-0246
Title:	ICT4QoL - Information and Communication Technologies for Quality of Life
