Learning from textual data streams for detecting email spam (Učenje iz besedilnih podatkovnih tokov za zaznavanje neželene elektronske pošte)

PORENTA, JERNEJ

Repository of the University of Ljubljana

PORENTA, JERNEJ (Author), Bosnić, Zoran (Mentor), Ciglarič, Mojca (Co-mentor)

PDF - Presentation file, Download (1,76 MB)
MD5: 714851682C016E1FBC73578D097ADCE0
PID: 20.500.12556/rul/685dfed1-c9f4-4c7c-a0fe-0342304796c4

Abstract

This master's thesis presents a method for classifying messages as unwanted (spam) or wanted email by recasting the problem as incremental learning from time series. Widely used spam classification systems rely mainly on batch learning methods (the naive Bayes classifier), whereas this thesis presents classification using data stream analysis methods. For learning we therefore selected attributes that contain no personal data and for which no consent from the sender or recipient is required (attributes built from the email envelope). Using algorithms for learning from data streams (VFDT, cVFDT), we treated the sequence of email messages as a textual data stream. We compared the results with traditional methods for labeling spam and found that, in the problem domain of spam classification, incremental methods for learning from data streams achieve lower classification accuracy and are therefore less suitable for use.

Language:	Slovenian
Keywords:	email, machine learning, data stream analysis
Work type:	Master's thesis
Organization:	FRI - Faculty of Computer and Information Science
Year:	2016
PID:	20.500.12556/RUL-85521
Publication date in RUL:	15.09.2016
Views:	2591
Downloads:	664

Secondary language

Language:	English
Title:	Learning from textual data streams for detecting email spam
Abstract:	This master's thesis introduces a method for detecting email spam by recasting the problem as incremental learning from time series. Common spam detection systems mainly use batch learning methods (naive Bayesian classifier, decision trees), whereas this thesis presents classification using data stream mining methods. For learning, we chose attributes that contain no personal data and for which consent from the sender or recipient is not required (attributes taken from the envelope part of the email). Using algorithms for learning from data streams (VFDT, cVFDT), we treated the sequence of email messages as a textual data stream. The results were compared with traditional spam detection methods and show that the traditional methods achieve higher accuracy than the algorithms for learning from data streams, which are therefore less suitable for detecting email spam.
Keywords:	email, machine learning, stream mining
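
Note on VFDT: the abstract names VFDT and cVFDT but does not describe how they work. As general background, VFDT grows a decision tree from a stream by splitting a leaf only once the Hoeffding bound indicates that the best attribute is very likely better than the runner-up. The following is a minimal sketch of that split test, not code from the thesis. The function names, the default `delta`, and the sample gains are illustrative assumptions, and VFDT's tie-breaking threshold is left out.

```python
import math


def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    """Hoeffding bound: with probability 1 - delta, the true mean of a
    variable with the given range lies within this distance of the mean
    observed over n independent samples."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def should_split(best_gain: float, second_gain: float, n_seen: int,
                 n_classes: int = 2, delta: float = 1e-7) -> bool:
    """Split a leaf when the gap between the two best attributes exceeds
    the Hoeffding bound. The delta default is an assumed value, not one
    taken from the thesis."""
    # Information gain ranges over [0, log2(number of classes)].
    value_range = math.log2(n_classes)
    return (best_gain - second_gain) > hoeffding_bound(value_range, delta, n_seen)


# Hypothetical information gains for two envelope attributes at one leaf.
for n in (200, 1000):
    print(n, should_split(best_gain=0.30, second_gain=0.15, n_seen=n))
```

With these made-up gains the leaf is not split after 200 messages but is split after 1000, because the bound shrinks as more messages arrive.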
