Protection against adversarial attacks on perturbation based explanations of machine learning models

VREŠ, DOMEN

VREŠ, DOMEN (Author), Robnik Šikonja, Marko (Mentor)

PDF - Presentation file (2.24 MB)
MD5: 7A74FB5145021452D8A31D12165120A1

Abstract

Machine learning models are used in many fields. Besides the accuracy of predictive models, their comprehensibility is also important, because it makes it possible to trust them. By understanding a predictive model, we can identify its bias and the causes of its errors. Complex models such as random forests, neural networks, and support vector machines are not easy to understand and act as black boxes, so we explain them with post-hoc explanation methods that are model-independent and use perturbation sampling to explain an individual instance. The robustness of perturbation-based explanation methods has so far been rather poorly researched. A recent study by Slack et al. showed that, because of poor perturbation sampling, these methods can be manipulated so that they do not reveal a classifier's bias. In this thesis we propose better sampling, which prevents such manipulation of explanations of machine learning models. Instead of the usual perturbation sampling, we propose sampling based on modern data generators that better capture the distribution of the training set. In an experiment we show that the improved sampling increases the robustness of the LIME and SHAP explanation methods and speeds up the convergence of the IME explanation method.
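
The sampling problem described in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: the toy data, the function names, and the use of a fitted multivariate normal as a stand-in generator are all assumptions made for illustration (the thesis lists MCD-VAE, RBF, and random forest generators instead). The sketch compares how far naive, feature-independent perturbations and generator-based samples fall from the training distribution, using the mean squared Mahalanobis distance as a rough score.

```python
import numpy as np

def make_correlated_data(n, rng):
    """Toy training set (assumption): two strongly correlated features."""
    x1 = rng.normal(size=n)
    x2 = x1 + 0.1 * rng.normal(size=n)
    return np.column_stack([x1, x2])

def independent_perturbations(data, n, rng):
    """Simplified naive sampling: draw each feature from its own marginal,
    ignoring correlations between features."""
    cols = [rng.choice(data[:, j], size=n) for j in range(data.shape[1])]
    return np.column_stack(cols)

def gaussian_generator(data, n, rng):
    """Stand-in data generator (assumption): a multivariate normal
    fitted to the training data."""
    mean = data.mean(axis=0)
    cov = np.cov(data, rowvar=False)
    return rng.multivariate_normal(mean, cov, size=n)

def mean_mahalanobis_sq(samples, data):
    """Mean squared Mahalanobis distance of samples from the training data.
    Larger values mean the samples lie further off the data distribution."""
    mean = data.mean(axis=0)
    inv_cov = np.linalg.inv(np.cov(data, rowvar=False))
    diff = samples - mean
    return float(np.mean(np.einsum("ij,jk,ik->i", diff, inv_cov, diff)))

rng = np.random.default_rng(0)
train = make_correlated_data(2000, rng)
naive = independent_perturbations(train, 2000, rng)
generated = gaussian_generator(train, 2000, rng)

naive_score = mean_mahalanobis_sq(naive, train)
generated_score = mean_mahalanobis_sq(generated, train)
print(f"naive perturbations: {naive_score:.1f}")
print(f"generator samples:   {generated_score:.1f}")
```

In this toy setting the naive samples score far higher than the generator samples, which is the kind of gap a manipulated classifier could exploit to tell perturbed points from real ones. Samples drawn from a model of the training distribution shrink that gap.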

Language:	Slovenian
Keywords:	explainable AI, adversarial attacks, data generators, perturbations, LIME, SHAP, IME, MCD-VAE, RBF as data generator, random forest as data generator
Work type:	Bachelor thesis/paper
Typology:	2.11 - Undergraduate Thesis
Organization:	FRI - Faculty of Computer and Information Science; FMF - Faculty of Mathematics and Physics
Year:	2020
PID:	20.500.12556/RUL-119408
COBISS.SI-ID:	28861955
Publication date in RUL:	08.09.2020
Views:	1823
Downloads:	228

Secondary language

Abstract:
Language:	English
Title:	Protection against adversarial attacks on perturbation based explanations of machine learning models
Machine learning models are used in various areas. In addition to the accuracy of predictive models, their comprehensibility is also important. Understanding the machine learning model provides confidence in it. By understanding the predictive model, we can determine its bias and causes of errors. Complex models such as random forests, neural networks and support vector machines are not easy to understand and act as black box models; therefore, for their explanations we use post-hoc explanation methods that are model-independent and use perturbation sampling to explain each instance. The robustness of perturbation explanation methods has so far been poorly researched. Recent research has shown that due to poor perturbation sampling, these methods can be manipulated so that they do not recognize a biased classifier. In this work, we propose the use of better sampling, which prevents such manipulations. The proposed sampling uses data generators that better capture the training set distribution. We show that improved sampling increases the robustness of the LIME and SHAP explanation methods and speeds up the convergence of the IME explanation method.
Keywords:	explainable AI, adversarial attacks, data generators, perturbations, LIME, SHAP, IME, MCD-VAE, RBF as data generator, random forest as data generator
