Identifying attribute interactions by interpreting model predictions: the case of grapevine yellows disease

Kokalj, Enja

Kokalj, Enja (Author), Robnik Šikonja, Marko (Mentor)

URL - Presentation file: http://pefprints.pef.uni-lj.si/6036/

Abstract

The master's thesis addresses the interpretation of non-transparent machine learning models, the determination of attributes' influence on generated predictions, and the search for interactions between attributes in data sets. We approach this through the case of a grapevine disease that causes crop loss and economic damage. Identifying suitable gene markers would contribute to timely detection and containment of the infection. The contribution of the thesis is a new algorithm for finding interactions between attributes in data sets, which combines SHAP, a method for explaining the predictions of machine learning models, with Apriori, an algorithm for mining association rules. On artificially generated data, the proposed algorithm successfully identified influential attributes and the interactions between them. The influential attributes (genes) found in the biological data set largely coincide with the results of a previous study that served as our reference point. We also discovered strong association rules that describe potential interactions between attributes (genes), but we cannot adequately evaluate them because of a lack of biological information on gene interactions in the grapevine defense response. An advantage of our algorithm is that it searches for association rules based on the influence of individual attributes on predictions rather than on actual gene expression values; this sidesteps differing levels of actual gene expression and treats all genes according to their influence in the machine learning model. It also allows the search for higher-order interactions, and the results (association rules) are interpretable. A weakness of the algorithm is that association rules are mined only on discretized attributes, so the results depend on the chosen discretization method.
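The idea of mining association rules over attribute contributions rather than raw attribute values can be illustrated with a small sketch. This is not the thesis's implementation: every name below (`shapley_values`, `discretize`, `apriori`, `association_rules`, the toy `model`, the 0.1 threshold) is an assumption made for the example. Exact Shapley values, computed by enumerating coalitions against a single all-zero background row, stand in for the SHAP method, and a bare level-wise miner stands in for a full Apriori implementation.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, background):
    """Exact Shapley values of model(x); absent features take the background value."""
    n = len(x)

    def value(subset):
        z = [x[i] if i in subset else background[i] for i in range(n)]
        return model(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for s in combinations(others, k):
                phi[i] += weight * (value(set(s) | {i}) - value(set(s)))
    return phi


def discretize(phi, threshold=0.1):
    """Map contributions to items like 'f0=pos'; near-zero contributions are dropped."""
    items = set()
    for i, p in enumerate(phi):
        if p > threshold:
            items.add(f"f{i}=pos")
        elif p < -threshold:
            items.add(f"f{i}=neg")
    return frozenset(items)


def apriori(transactions, min_support):
    """Level-wise frequent itemset mining; returns {itemset: support}."""
    n = len(transactions)
    level = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    while level:
        current = {c: sum(c <= t for t in transactions) / n for c in level}
        current = {c: s for c, s in current.items() if s >= min_support}
        frequent.update(current)
        keys = list(current)
        # Join frequent k-itemsets into (k+1)-candidates, then prune any
        # candidate that has an infrequent k-subset.
        level = {a | b for a in keys for b in keys if len(a | b) == len(a) + 1}
        level = {c for c in level
                 if all(frozenset(s) in current
                        for s in combinations(c, len(c) - 1))}
    return frequent


def association_rules(frequent, min_confidence):
    """Derive rules (antecedent, consequent, support, confidence) from frequent itemsets."""
    rules = []
    for itemset, supp in frequent.items():
        for k in range(1, len(itemset)):
            for ante in combinations(sorted(itemset), k):
                ante = frozenset(ante)
                conf = supp / frequent[ante]
                if conf >= min_confidence:
                    rules.append((ante, itemset - ante, supp, conf))
    return rules


def model(z):
    # Toy model (assumption): features 0 and 1 interact, feature 2 is irrelevant.
    return z[0] * z[1]


rows = [[a, b, c] for a in (0, 1) for b in (0, 1) for c in (0, 1)]
background = [0, 0, 0]
transactions = [discretize(shapley_values(model, x, background)) for x in rows]
frequent = apriori(transactions, min_support=0.2)
found_rules = association_rules(frequent, min_confidence=0.9)
for ante, cons, supp, conf in found_rules:
    print(sorted(ante), "->", sorted(cons), f"support={supp:.2f} confidence={conf:.2f}")
```

In this toy setting only features 0 and 1 end up in the rules, because feature 2 never changes the prediction and so never receives a non-zero contribution. Changing `threshold` in `discretize` changes which items exist at all, which is a small-scale view of the dependence on discretization noted above.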

Language:	Slovenian
Keywords:	machine learning model interpretation
Work type:	Master's thesis/paper
Typology:	2.09 - Master's Thesis
Organization:	PEF - Faculty of Education
Year:	2019
PID:	20.500.12556/RUL-111815
COBISS.SI-ID:	12628041
Publication date in RUL:	21.10.2019
Views:	1480
Downloads:	168

Secondary language

Language:	English
Title:	Identifying attribute interactions by interpreting model predictions: the case of grapevine yellows disease
Abstract:	The master's thesis deals with the problem of interpreting black-box machine learning models, explaining predictions through attribute contributions, and identifying attribute interactions in data sets. We address this in the case of a grapevine disease that causes crop losses and economic damage. Identifying useful gene markers would allow earlier detection and containment of the infection. The contribution of our work is a new algorithm for identifying attribute interactions that combines SHAP, a method for interpreting model predictions, with Apriori, an algorithm for association rule mining. The proposed algorithm successfully identified important attributes and their interactions on artificially generated data sets. The important attributes (genes) we found in the biological data set largely agree with the results of a previous study that served as a reference point. We also discovered strong association rules that describe possible attribute (gene) interactions, but could not evaluate them properly because of a lack of biological information about gene interactions in the grapevine defense response. The advantage of the algorithm is that it generates association rules from attribute contributions to predictions rather than from actual gene expression values, which avoids the problem of differing expression levels and treats every gene according to its contribution in the machine learning model. It can also be used to find higher-order interactions, and the results (association rules) are interpretable. Its disadvantage is that association rules are generated on discretized numerical attributes, so the results depend on the discretization method.
Keywords:	machine learning model interpretation
