
Obramba pred sovražnimi napadi na perturbacijske razlage modelov strojnega učenja (Protection against adversarial attacks on perturbation based explanations of machine learning models)
Vreš, Domen (Author), Robnik Šikonja, Marko (Mentor)

PDF - Presentation file (2.24 MB)
MD5: 7A74FB5145021452D8A31D12165120A1

Abstract
Machine learning models are used in many fields. Besides the accuracy of predictive models, their comprehensibility is also important, since it enables trust in them. Understanding a predictive model lets us identify its bias and the causes of its errors. Complex models such as random forests, neural networks, and support vector machines are not easily understood and behave as black boxes, so we explain them with post-hoc explanation methods that are model-independent and use perturbation sampling to explain an individual instance. The robustness of perturbation-based explanation methods has so far been rather poorly studied. A recent study by Slack et al. showed that, because of poor perturbation sampling, these methods can be manipulated so that they do not reveal a classifier's bias. In this thesis we propose better sampling, which prevents such manipulation of explanations of machine learning models. Instead of the usual perturbation sampling, we propose sampling based on modern data generators that better capture the distribution of the training set. In an experiment we show that the improved sampling increases the robustness of the LIME and SHAP explanation methods and speeds up the convergence of the IME explanation method.
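
The point about poor perturbation sampling can be illustrated with a small toy sketch. It is not the method from the thesis: the data are synthetic, a fitted multivariate Gaussian stands in for the data generators named in the keywords, and all function names are hypothetical. It only shows that perturbing features independently produces samples that lie far from the training distribution when features are correlated, which is what makes such samples easy to tell apart from real data.

```python
import numpy as np

def mean_sq_mahalanobis(samples, mean, cov):
    """Average squared Mahalanobis distance of samples to the training distribution."""
    diff = samples - mean
    inv_cov = np.linalg.inv(cov)
    return float(np.mean(np.einsum("ij,jk,ik->i", diff, inv_cov, diff)))

def independent_perturbations(train, n, rng):
    """Simplified perturbation sampling: each feature is drawn independently,
    so correlations between features are ignored (an assumption of this toy)."""
    return rng.normal(train.mean(axis=0), train.std(axis=0),
                      size=(n, train.shape[1]))

def generator_samples(train, n, rng):
    """Toy 'data generator': a multivariate Gaussian fitted to the training set.
    This is a stand-in, not one of the generators used in the thesis."""
    return rng.multivariate_normal(train.mean(axis=0),
                                   np.cov(train, rowvar=False), size=n)

rng = np.random.default_rng(0)

# Synthetic training set with two strongly correlated features.
true_cov = np.array([[1.0, 0.95], [0.95, 1.0]])
train = rng.multivariate_normal([0.0, 0.0], true_cov, size=2000)
mean, cov = train.mean(axis=0), np.cov(train, rowvar=False)

d_indep = mean_sq_mahalanobis(independent_perturbations(train, 2000, rng), mean, cov)
d_gen = mean_sq_mahalanobis(generator_samples(train, 2000, rng), mean, cov)

# A larger value means the samples sit further from the training distribution.
print(f"independent perturbations: {d_indep:.2f}")
print(f"generator-based samples:   {d_gen:.2f}")
```

In this toy setting the independently perturbed samples should score clearly higher than the generator-based ones, so a classifier that checks whether an input looks like real data could, in principle, behave differently on them.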

Language: Slovenian
Keywords: explainable AI, adversarial attacks, data generators, perturbations, LIME, SHAP, IME, MCD-VAE, RBF as data generator, random forest as data generator
Work type: Bachelor thesis/paper
Typology: 2.11 - Undergraduate Thesis
Organization: FRI - Faculty of Computer and Information Science; FMF - Faculty of Mathematics and Physics
Year: 2020
PID: 20.500.12556/RUL-119408
COBISS.SI-ID: 28861955
Publication date in RUL: 08.09.2020

Secondary language

Language: English
Title: Protection against adversarial attacks on perturbation based explanations of machine learning models
Abstract:
Machine learning models are used in various areas. In addition to the accuracy of predictive models, their comprehensibility is also important. Understanding the machine learning model provides confidence in it. By understanding the predictive model, we can determine its bias and causes of errors. Complex models such as random forests, neural networks and support vector machines are not easy to understand and act as black box models; therefore, for their explanations we use post-hoc explanation methods that are model-independent and use perturbation sampling to explain each instance. The robustness of perturbation explanation methods has so far been poorly researched. Recent research has shown that due to poor perturbation sampling, these methods can be manipulated so that they do not recognize a biased classifier. In this work, we propose the use of better sampling, which prevents such manipulations. The proposed sampling uses data generators that better capture the training set distribution. We show that improved sampling increases the robustness of the LIME and SHAP explanation methods and speeds up the convergence of the IME explanation method.

Keywords: explainable AI, adversarial attacks, data generators, perturbations, LIME, SHAP, IME, MCD-VAE, RBF as data generator, random forest as data generator
