
Preprečevanje neželenih komentarjev za spletne novice s pomočjo tehnik za procesiranje naravnega jezika
Čebular, Martin (Author), Žitnik, Slavko (Mentor)

PDF - Presentation file (3.43 MB)
MD5: 0F9BC3260CF27E9BFE7F88DBF0F470E7

Abstract
Completely Automated Public Turing test to tell Computers and Humans Apart (v nadaljevanju CAPTCHA) je test, katerega cilj je ločiti človeškega uporabnika od računalnika. Na spletu se test CAPTCHA navadno pojavi ob obrazcu, kot zaščita pred samodejnim izpolnjevanjem in oddajanjem obrazca. Kot najbolj znano obliko testa CAPTCHA omenimo test, v okviru katerega je reševalcu podana slika s popačenim besedilom, reševalčeva naloga pa je razpoznati črke ali besede z dane slike. V magistrskem delu se posvetimo testom oziroma nalogam CAPTCHA v tekstovni obliki. Zasnujemo in implementiramo sistem CAPTCHA, katerega naloge temeljijo na tehnikah obdelave naravnega jezika. Predstavimo dva tipa tovrstnih nalog CAPTCHA: naloge na podlagi prepoznavanja imenskih entitet in naloge na podlagi razreševanja koreferenčnosti. Sistem CAPTCHA zasnujemo razširljivo, kar omogoča enostavno vpeljavo novih tipov nalog vanj. Implementiramo tudi odjemalca CAPTCHA, uporabniški vmesnik, ki ga lahko umestimo v spletni obrazec in reševalcem omogoča reševanje nalog. Uporabo sistema CAPTCHA skupaj z odjemalcem prikažemo na primeru integracije, izdelanem v okviru magistrskega dela. Uporabo demonstriramo tudi z umestitvijo odjemalca CAPTCHA v obrazec za oddajo komentarja na spletnem portalu RTVSLO.si. Implementirani sistem skupaj z odjemalcem omogoča celostno izvedbo postopka verifikacije človeške interakcije uporabnikov. Evalviramo njegovo učinkovitost in skalabilnost, dostopnost odjemalca CAPTCHA slepim in slabovidnim uporabnikom, ter potencialne možnosti za gradnjo novih učnih množic iz zbranih podatkov, ki nastanejo z uporabo sistema.

Language: Slovenian
Keywords: CAPTCHA, dokaz o človeški interakciji, prepoznavanje imenskih entitet, odkrivanje koreferenčnosti
Work type: Master's thesis/paper
Typology: 2.09 - Master's Thesis
Organization: FRI - Faculty of Computer and Information Science
Year: 2021
PID: 20.500.12556/RUL-133027
COBISS.SI-ID: 87294723
Publication date in RUL: 08.11.2021
Views: 574
Downloads: 84

Secondary language

Language: English
Title: Preventing unwanted comments to online news articles using natural language processing techniques
Abstract:
Completely Automated Public Turing Test to tell Computers and Humans Apart (CAPTCHA) is a test that aims to tell a human user apart from a computer. On the web, a CAPTCHA usually appears next to a form as protection against automated form filling and submission. The best-known form of CAPTCHA is a test in which the solver is given an image with distorted text and must identify the characters or words in the image. In this thesis, we focus on CAPTCHAs in purely textual form. We design and implement a CAPTCHA system whose tasks are based on natural language processing techniques. We present two types of such CAPTCHA tasks: tasks based on named entity recognition and tasks based on coreference resolution. We design the CAPTCHA system to be extensible, so that new types of tasks can be introduced easily. We also implement a CAPTCHA client, a user interface that can be embedded in a web form and allows solvers to solve the tasks. We illustrate the use of the CAPTCHA system together with the client in a sample integration developed as part of the thesis. We also demonstrate its use by placing the CAPTCHA client in the comment form on the RTVSLO.si news portal. The implemented system, together with the client, allows for end-to-end execution of the human-interaction verification process. We evaluate its efficiency and scalability, the accessibility of the CAPTCHA client to blind and visually impaired users, and the potential for building new training datasets from the data produced by using the system.

Keywords: CAPTCHA, human-interaction proof, named entity recognition, coreference resolution
