Completely Automated Public Turing Test to tell Computers and Humans Apart (CAPTCHA) is a test that aims to tell a human user apart from a computer. On the web, a CAPTCHA usually appears next to a form, as protection against automated form filling and submission. The best-known form of CAPTCHA is a test in which the solver is given an image of distorted text and must identify the characters or words in the image.
In this thesis, we focus on CAPTCHAs in purely textual form. We design and implement a CAPTCHA system whose tasks are based on natural language processing techniques. We present two types of such CAPTCHA tasks: tasks based on named entity recognition and tasks based on coreference resolution. We design the CAPTCHA system to be extensible, so that new types of tasks are easy to add. We also implement a CAPTCHA client, a user interface that can be embedded in a web form and allows solvers to solve the tasks.
We illustrate the use of the CAPTCHA system together with the client in an integration sample developed as part of the thesis. We also demonstrate its use by placing the CAPTCHA client in the comment form on the RTVSLO.si news portal. The implemented system, together with the client, supports end-to-end execution of the process of verifying human user interaction. We evaluate its efficiency and scalability, the accessibility of the CAPTCHA client to blind and visually impaired users, and the potential for building new learning datasets from the data produced by using the system.