Preventing unwanted comments to online news articles using natural language processing techniques

Čebular, Martin

Čebular, Martin (Author), Žitnik, Slavko (Mentor)

PDF - Presentation file, Download (3,43 MB)
MD5: 0F9BC3260CF27E9BFE7F88DBF0F470E7

Abstract

Completely Automated Public Turing test to tell Computers and Humans Apart (hereafter CAPTCHA) is a test whose goal is to distinguish a human user from a computer. On the web, a CAPTCHA test usually appears alongside a form, as protection against automated filling and submission of the form. The best-known form of CAPTCHA is a test in which the solver is given an image with distorted text and must recognize the letters or words in it. In this master's thesis we focus on CAPTCHA tests, or tasks, in textual form. We design and implement a CAPTCHA system whose tasks are based on natural language processing techniques. We present two types of such CAPTCHA tasks: tasks based on named entity recognition and tasks based on coreference resolution. We design the system to be extensible, so that new task types can be added easily. We also implement a CAPTCHA client, a user interface that can be embedded in a web form and lets solvers solve the tasks. We show the use of the CAPTCHA system together with the client in a sample integration built as part of the thesis. We also demonstrate its use by embedding the CAPTCHA client in the comment submission form on the RTVSLO.si web portal. The implemented system, together with the client, supports the complete procedure of verifying users' human interaction. We evaluate its efficiency and scalability, the accessibility of the CAPTCHA client to blind and visually impaired users, and the potential for building new training datasets from the data collected through use of the system.

Language:	Slovenian
Keywords:	CAPTCHA, human-interaction proof, named entity recognition, coreference resolution
Work type:	Master's thesis/paper
Typology:	2.09 - Master's Thesis
Organization:	FRI - Faculty of Computer and Information Science
Year:	2021
PID:	20.500.12556/RUL-133027
COBISS.SI-ID:	87294723
Publication date in RUL:	08.11.2021
Views:	916
Downloads:	99

Secondary language

Language:	English
Title:	Preventing unwanted comments to online news articles using natural language processing techniques
Abstract:	Completely Automated Public Turing test to tell Computers and Humans Apart (CAPTCHA) is a test that aims to tell a human user apart from a computer. On the web, a CAPTCHA usually appears next to a form, as protection against automated form filling and submission. The best-known form of CAPTCHA is a test in which the solver is given an image with distorted text and must identify the characters or words in the image. In this thesis, we focus on CAPTCHAs in purely textual form. We design and implement a CAPTCHA system whose tasks are based on natural language processing techniques. We present two types of such CAPTCHA tasks: tasks based on named entity recognition and tasks based on coreference resolution. We design the CAPTCHA system to be extensible, making it easy to introduce new types of tasks. We also implement a CAPTCHA client, a user interface that can be embedded in a web form and allows solvers to solve the tasks. We illustrate the use of the CAPTCHA system together with the client in a sample integration developed as part of the thesis. We also demonstrate its use by placing the CAPTCHA client in the comment form on the RTVSLO.si news portal. The implemented system, together with the client, allows end-to-end verification of human interaction. We evaluate its efficiency and scalability, the accessibility of the CAPTCHA client to blind and visually impaired users, and the potential for building new training datasets from the data produced by using the system.
Keywords:	CAPTCHA, human-interaction proof, named entity recognition, coreference resolution
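
This record does not describe how the thesis implements its tasks, so the following is only a minimal sketch of what the two task types named in the abstract could look like. All names (`NerTask`, `CorefTask`, `TASK_TYPES`), the label scheme, and the example sentences are hypothetical, and the gold annotations are assumed to come from an already labelled corpus.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Tuple


@dataclass(frozen=True)
class NerTask:
    """Hypothetical NER task: the solver marks every token of one entity type."""
    tokens: Tuple[str, ...]
    labels: Tuple[str, ...]  # one gold label per token; "O" means no entity
    target: str              # entity type the solver is asked to mark

    def expected(self) -> FrozenSet[int]:
        # Indices of all tokens whose gold label equals the target type.
        return frozenset(i for i, lab in enumerate(self.labels) if lab == self.target)

    def verify(self, selected: Iterable[int]) -> bool:
        # Accept only an exact match with the gold token set.
        return frozenset(selected) == self.expected()


@dataclass(frozen=True)
class CorefTask:
    """Hypothetical coreference task: the solver picks what a mention refers to."""
    text: str
    mention: str
    candidates: Tuple[str, ...]
    antecedent: int          # index of the correct candidate

    def verify(self, choice: int) -> bool:
        return choice == self.antecedent


# Assumed extension point: a new task type is added by registering its class.
TASK_TYPES = {"ner": NerTask, "coref": CorefTask}

ner = NerTask(
    tokens=("Ana", "Novak", "visited", "Ljubljana", "yesterday"),
    labels=("PER", "PER", "O", "LOC", "O"),
    target="PER",
)
coref = CorefTask(
    text="Ana met Maja before she left for Ljubljana with Ana's brother.",
    mention="she",
    candidates=("Ana", "Maja"),
    antecedent=1,
)
```

Under these assumptions, `ner.verify([0, 1])` accepts the answer that marks both tokens of the person name, while `ner.verify([0])` rejects a partial selection. Exact-match checking is the simplest choice; a real system might tolerate small errors or combine several tasks per challenge.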
