
Assessing the risk of developing incel ideology from online forum posts using natural language processing : master's thesis
Guna, Lučka (Author), Komidar, Luka (Mentor)

PDF - Presentation file (1.59 MB)
MD5: 2E2032D898976C3037E3C8F045052E13

Abstract
The term incel denotes a person, in the vast majority of cases male, who considers himself to be involuntarily celibate and often expresses hostility towards sexually active individuals. In recent years, incel beliefs have spread from forums intended exclusively for incels (e.g., Incels.is) to popular social networks (e.g., Reddit). I therefore aimed to identify, among posts from a forum not primarily intended for incels, posts similar to those characteristic of incels, and to distinguish them from posts containing hate, toxic or offensive speech and from neutral speech. I did this using natural language processing and a deep neural network. The predictive model, a neural network incorporating the ELECTRA language model, was trained on posts from the Incels.is forum and on posts from freely available datasets containing hate, toxic or offensive speech and neutral speech. Using this model, I predicted the speech-type categories of posts from Reddit. To further assess the model's validity, I trained eight models with the same settings as the original on random subsamples of posts and compared their speech-type predictions for the Reddit posts. In both steps I analysed the reasonableness of the predictions and conducted a content analysis of the classified posts. The model trained on the full dataset correctly classified 64% of posts, with the largest share of correct classifications falling into the incel-speech category. The partial models were somewhat less accurate, and their predictions showed moderate agreement. Content analysis revealed that most classifications of incel speech and of hate, toxic or offensive speech were unreasonable, while most classifications of neutral speech were reasonable. Reasonably classified posts in the incel-speech category contained negative and stereotypical attitudes towards women, themes characteristic of incel speech, and expressions of the authors' mental health problems.
All of these themes are also extremely common on forums intended for incels, which points to a potential vulnerability of Reddit users to the development of incel ideology. Based on the findings of my research, I also proposed concrete preventive activities that could reduce the likelihood of men developing incel ideology.
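The core task described above is a three-way labelling of forum posts (incel speech vs. hate/toxic/offensive speech vs. neutral speech). The thesis solves it with a fine-tuned ELECTRA-based neural network; as a purely illustrative stand-in, not the author's model, a trivial keyword heuristic can sketch the input/output contract of such a classifier. All cue words and the label strings below are hypothetical.

```python
# Illustrative stand-in for the three-way speech classifier described above.
# The thesis uses a fine-tuned ELECTRA network; this keyword heuristic only
# sketches the contract: post text in, one of three speech labels out.
# The cue-word sets are invented examples, not taken from the thesis.
INCEL_CUES = {"blackpill", "foid", "chad", "looksmatch"}  # hypothetical
HATE_CUES = {"hate", "disgusting", "stupid"}              # hypothetical

def classify_post(text: str) -> str:
    """Assign a post to one of three speech-type categories."""
    tokens = set(text.lower().split())
    if tokens & INCEL_CUES:
        return "incel"
    if tokens & HATE_CUES:
        return "hate/toxic/offensive"
    return "neutral"
```

A real classifier would of course learn these distinctions from the Incels.is and public hate-speech datasets rather than from a fixed word list; the heuristic only fixes the interface that the evaluation steps below operate on.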

Language:Slovenian
Keywords:incels, natural language processing, machine learning, artificial neural networks, online forums
Work type:Master's thesis/paper
Typology:2.09 - Master's Thesis
Organization:FF - Faculty of Arts
Place of publishing:Ljubljana
Publisher:[L. Guna]
Year:2022
Number of pages:70 pp.
PID:20.500.12556/RUL-141791
UDC:159.9:316.77(043.2)
COBISS.SI-ID:131783939
Publication date in RUL:08.10.2022
Views:722
Downloads:131

Secondary language

Language:English
Title:Incel radicalization risk assessment using natural language processing of online forum posts
Abstract:
The term incel describes people, mostly men, who are involuntarily celibate and are often hateful towards sexually active individuals. In recent years their beliefs have migrated from forums exclusive to incels (e.g., Incels.is) towards popular social media (e.g., Reddit). I therefore used natural language processing to analyze posts from a forum not exclusive to incels, to identify posts similar to those of incels, and to distinguish them from hate, toxic or offensive speech and from neutral speech. I developed a predictive model using an artificial neural network that included the ELECTRA language model, and trained it on posts from the forum Incels.is and on publicly available posts containing hate, toxic or offensive speech and neutral speech. I used the model to predict the speech-type categories of posts from Reddit. To further assess the validity of the model, I set up eight partial models with settings identical to the first model, trained them on random subsamples of posts, and compared their predictions on the Reddit posts. In both steps I evaluated the reasonableness of the predictions and conducted a content analysis of the classified posts. The model trained on the entire dataset correctly classified 64% of the posts; the majority of correct classifications belonged to the incel speech category. The partial models were less accurate, and the agreement between their classifications was moderate. Content analysis showed that classifications of incel speech and of hate, toxic or offensive speech were mostly unreasonable, while classifications of neutral speech were mostly reasonable. Reasonably classified posts of incel speech contained negative attitudes towards women, themes typical of incel speech, and expressions of the authors' mental health issues.
These elements are also commonly present on incel forums, which implies that Reddit users whose posts contain them are vulnerable to the development of incel ideology. The findings of the study were used to propose prevention strategies that could be implemented to counter the development of incel ideology in men.
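The abstract reports moderate agreement between the eight partial models' predictions without naming the statistic used. A common choice for agreement among multiple raters assigning categorical labels is Fleiss' kappa, sketched below under that assumption; the `preds` data are invented for illustration, not the thesis data.

```python
from collections import Counter

def fleiss_kappa(ratings):
    """Fleiss' kappa for inter-rater agreement.

    ratings: one list per item, each containing one categorical label
    per rater (here, one predicted speech type per partial model).
    """
    n_items = len(ratings)
    n_raters = len(ratings[0])
    categories = sorted({label for item in ratings for label in item})
    counts = [Counter(item) for item in ratings]  # n_ij per item
    # Observed per-item agreement P_i, averaged over items
    p_i = [
        (sum(c[cat] ** 2 for cat in categories) - n_raters)
        / (n_raters * (n_raters - 1))
        for c in counts
    ]
    p_bar = sum(p_i) / n_items
    # Chance agreement from marginal category proportions
    p_j = [sum(c[cat] for c in counts) / (n_items * n_raters) for cat in categories]
    p_e = sum(p ** 2 for p in p_j)
    return (p_bar - p_e) / (1 - p_e)

# Hypothetical predictions from eight partial models on four Reddit posts
preds = [
    ["incel"] * 8,
    ["neutral"] * 5 + ["hate/toxic/offensive"] * 3,
    ["hate/toxic/offensive"] * 8,
    ["neutral"] * 7 + ["incel"],
]
print(f"Fleiss' kappa: {fleiss_kappa(preds):.2f}")
```

Values near 1 indicate near-perfect agreement and values near 0 chance-level agreement; "moderate" agreement conventionally corresponds to roughly 0.4-0.6 on this scale.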

Keywords:incels, natural language processing, artificial neural networks, online forums, machine learning
