
Adaptations of perturbation-based explanation methods for text classification with neural networks
Klemen, Matej (Author), Robnik Šikonja, Marko (Mentor)


Abstract
Deep neural networks are successfully used for text classification tasks. However, because their functioning is opaque to users, they may learn spurious patterns, so we need mechanisms to explain their predictions. Current machine learning explanation methods are designed for general prediction tasks and commonly assume tabular data. They mostly work by perturbing the inputs and assigning credit to the features that strongly impact the outputs. In our work, we propose modified versions of two popular explanation methods, IME (Interactions-based Method for Explanation) and LIME (Local Interpretable Model-agnostic Explanations), for explaining text classifiers. The modified methods generate input perturbations that take dependencies between inputs into account: they use language models as generators of more natural perturbations. We first perform a distribution-detection experiment, through which we empirically show that the generated perturbations are more natural than the perturbations used in the original IME and LIME. Then we evaluate the quality of the computed explanations using automated metrics and compare them to the explanations computed with the original methods. We find that their quality is generally worse, which we attribute to the generation strategy and to metrics that measure a different type of importance. As a second contribution, we propose computing IME and LIME explanations in terms of units longer than words, using the dependency structure to guide the grouping of words. These explanations are mostly less redundant and can serve as a diagnostic tool for model behaviour.
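As a rough illustration of the perturbation idea (a minimal sketch, not the thesis implementation), the snippet below regenerates a random subset of words with a masked language model, so the perturbed texts stay closer to natural language than texts with words simply dropped or replaced by placeholder values. The model name, the masking probability and the one-mask-per-word simplification are assumptions made for the example.

import random
from transformers import pipeline

# Masked language model used as a perturbation generator; the specific model
# and the hyperparameters below are illustrative choices, not the thesis setup.
fill_mask = pipeline("fill-mask", model="bert-base-uncased")

def lm_perturbation(text, p_mask=0.3):
    """Return one perturbation of `text`, with roughly p_mask of the words
    regenerated in context by the masked language model."""
    words = text.split()
    for i in range(len(words)):
        if random.random() < p_mask:
            masked = " ".join(words[:i] + [fill_mask.tokenizer.mask_token] + words[i + 1:])
            # Take the top prediction; sampling from the top k would give more
            # diverse perturbations for IME/LIME to average over.
            words[i] = fill_mask(masked, top_k=1)[0]["token_str"]
    return " ".join(words)

# Perturbations like this replace the "feature removal" step of LIME/IME; the
# surrogate model (LIME) or Shapley-style sampling (IME) itself stays unchanged.
print(lm_perturbation("The movie was surprisingly good"))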

Language: English
Keywords: perturbation-based explanation methods, dependency-based explanations, text generation, IME explanation, LIME explanation
Work type: Master's thesis/paper
Typology: 2.09 - Master's Thesis
Organization: FRI - Faculty of Computer and Information Science
Year: 2021
PID: 20.500.12556/RUL-130145
COBISS.SI-ID: 77140739
Publication date in RUL: 10.09.2021
Views: 845
Downloads: 118

Secondary language

Language: Slovenian
Title: Prilagoditve perturbacijskih razlagalnih metod za klasifikacijo besedil z nevronskimi mrežami
Abstract:
Deep neural networks can successfully classify texts. Their functioning is not transparent, which can lead to them learning spurious patterns, so we need methods to explain their predictions. Current explanation methods are general-purpose and often assume tabular data. They typically compute explanations by perturbing the input features and attributing importance to those features whose changes strongly affect the model's output predictions. In this work, we present adapted versions of the IME and LIME methods for explaining text classification models that take dependencies between input features into account. They do so by using language models to generate more natural perturbations of the input texts. We first show empirically that the generated perturbations are more natural than the perturbations used in the original IME and LIME methods. We then use automated metrics to assess the quality of the explanations produced from the more natural perturbations. We find that the explanations produced by the adapted methods are mostly worse than those produced by the original IME and LIME, which we attribute mainly to the perturbation generation strategy and to the metrics used, which measure a different kind of importance. We also present a way of computing explanations over units longer than individual words, based on the syntactic structure of the text. We evaluate these adapted explanations and find that they are above all less redundant than word-based explanations. We also show that the presented approach can help diagnose incorrect model predictions.
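As a minimal sketch of the dependency-guided grouping (an illustrative reconstruction, not the thesis code), the snippet below merges closed-class dependents such as determiners, auxiliaries and negation into the unit of their syntactic head, so that importances are attributed to short phrases instead of single words. The spaCy model and the relation set are assumptions made for the example.

import spacy

nlp = spacy.load("en_core_web_sm")

# Dependency relations whose tokens are merged into their head's unit; this set
# is an illustrative choice, not the grouping used in the thesis.
MERGE_RELS = {"det", "aux", "auxpass", "neg", "case", "mark"}

def dependency_units(text):
    """Group tokens into units longer than single words, guided by the
    dependency parse; each unit is then perturbed and attributed as a whole."""
    doc = nlp(text)
    owner = list(range(len(doc)))            # token index -> index it merges into
    for tok in doc:
        if tok.dep_ in MERGE_RELS and tok.head.i != tok.i:
            owner[tok.i] = tok.head.i

    def unit_root(i):                        # follow merge chains up the tree
        while owner[i] != i:
            i = owner[i]
        return i

    units = {}
    for tok in doc:
        units.setdefault(unit_root(tok.i), []).append(tok.text)
    return list(units.values())

# Example; the exact grouping depends on the parser and the relation set, e.g.
# [['The', 'plot'], ['did', 'not', 'convince'], ['me']]
print(dependency_units("The plot did not convince me"))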

Keywords: perturbation-based explanation methods, dependency-based explanations, text generation, IME explanations, LIME explanations
