In this thesis we describe the Never Ending Language Learner (NELL), a system that reads the web to build a knowledge base of concepts connected by relations. Some relations depend on time, meaning that their value may differ at two different moments. We call them temporal relations. These are further divided into relations that happen and relations that start and end, or equivalently, relations with one critical moment in time and relations with two critical moments. A critical moment is a moment at which the value of the relation changes. The change may be a beginning, which is a transition from 0 to 1; an ending, which is a transition from 1 to 0; or an event, which changes the value of the relation from 0 to 1 and then quickly back from 1 to 0. Relations with two critical moments have a beginning and an ending, whereas relations with one critical moment only have a happening.
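To make the definitions of beginning, ending and event concrete, the sketch below labels the critical moments of a relation whose value is sampled at discrete time steps as a list of 0s and 1s. The sampling scheme, the function name `critical_moments`, and the `event_max_len` threshold (how short a run of 1s must be to count as an event rather than a beginning followed by an ending) are illustrative assumptions, not part of NELL.

```python
from typing import List, Tuple


def critical_moments(values: List[int], event_max_len: int = 1) -> List[Tuple[int, str]]:
    """Label the critical moments of a relation sampled at discrete time steps.

    values[t] is 1 if the relation holds at step t, else 0 (an assumed encoding).
    A run of 1s no longer than event_max_len that is bounded by 0s on both
    sides is reported as a single 'event'; otherwise its 0->1 transition is a
    'beginning' and its 1->0 transition (if observed) is an 'ending'.
    """
    moments: List[Tuple[int, str]] = []
    n = len(values)
    t = 1
    while t < n:
        if values[t - 1] == 0 and values[t] == 1:
            # Find the first step after this run of 1s.
            end = t
            while end < n and values[end] == 1:
                end += 1
            run_len = end - t
            if end < n and run_len <= event_max_len:
                moments.append((t, "event"))
            else:
                moments.append((t, "beginning"))
                if end < n:
                    moments.append((end, "ending"))
            t = end  # the 1->0 transition at `end` is already handled
        elif values[t - 1] == 1 and values[t] == 0:
            # The relation held from the start of the series; only its ending is visible.
            moments.append((t, "ending"))
        t += 1
    return moments
```

For example, `critical_moments([0, 1, 1, 1, 0, 0, 1, 0])` yields a beginning at step 1, an ending at step 4, and a one-step event at step 6.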
NELL has trouble recognizing such critical moments: it does not know when a relation began or ended, or, for relations with one critical moment, when it happened. The general problem of temporal relations asks how to obtain temporal metadata for a relation: when it happened, for relations with one critical moment, and when it began and ended, for relations with two.
In this thesis we address a specific subproblem of the problem of temporal relations: how to find text that contains information about critical moments. We describe EventRegistry, a system that collects newspaper articles from various sources and groups them into events, each of which represents a significant real-world happening. Some of these events contain information about critical moments.
We propose a general system for detecting events that contain information about the critical moments of relations with two critical moments. The system is based on classification algorithms, which separate the events that contain such information from the rest. Because classification algorithms require labeled data, and labeling is extremely costly and slow, we extend the proposed system with active learning strategies, which aim to reduce the cost of labeling. We simulate and analyze the proposed system for the relation HasSpouse(x,y) and report its performance. For this relation the problem turns out to be highly solvable: the classifier achieves an area under the ROC curve (AUC) of nearly 0.90.
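One widely used active learning strategy is uncertainty sampling: in each round, a classifier trained on the labeled events scores the unlabeled ones, and the events whose predicted probability of containing a critical moment is closest to 0.5 are sent to a human labeler. The sketch below shows this pool-based loop in plain Python. The choice of uncertainty sampling and all names (`uncertainty_sample`, `active_learning_loop`, and the caller-supplied `fit_predict` and `oracle` functions) are assumptions for illustration, not necessarily the strategies evaluated in this thesis.

```python
from typing import Callable, Dict, List, Sequence


def uncertainty_sample(probs: Dict[int, float], batch_size: int) -> List[int]:
    """Return ids whose predicted P(positive) is closest to 0.5 (ties broken by id)."""
    ranked = sorted(probs, key=lambda i: (abs(probs[i] - 0.5), i))
    return ranked[:batch_size]


def active_learning_loop(
    pool: Sequence[int],
    seed_labels: Dict[int, int],
    fit_predict: Callable[[Dict[int, int], List[int]], Dict[int, float]],
    oracle: Callable[[int], int],
    batch_size: int,
    rounds: int,
) -> Dict[int, int]:
    """Pool-based active learning with uncertainty sampling (illustrative sketch).

    fit_predict: hypothetical callback that trains on the labeled ids and returns
                 P(positive) for each unlabeled id.
    oracle:      hypothetical callback standing in for the human labeler.
    """
    labels = dict(seed_labels)
    for _ in range(rounds):
        unlabeled = [i for i in pool if i not in labels]
        if not unlabeled:
            break
        probs = fit_predict(labels, unlabeled)
        # Ask the labeler only about the events the current model is least sure of.
        for i in uncertainty_sample(probs, batch_size):
            labels[i] = oracle(i)
    return labels
```

In a simulation such as the one described above, `oracle` can simply look up an existing gold label, so the labeling cost is measured as the number of oracle calls needed to reach a given AUC.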
Because the data is labeled in a way that also makes it simple to detect which type of critical moment an event contains, we report results for this subproblem as well, again for the relation HasSpouse(x,y). This problem also turns out to be highly solvable by classification, with an AUC again near 0.90.
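AUC can be read as the probability that a randomly chosen positive example receives a higher classifier score than a randomly chosen negative one, with ties counting one half. The sketch below computes it directly from that definition for a binary labeling; the function name `roc_auc` is hypothetical, and the quadratic pairwise loop is chosen for clarity rather than speed.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

For instance, `roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])` is 0.75, since three of the four positive/negative pairs are ordered correctly. An AUC near 0.90 therefore means that about nine in ten such pairs are ranked correctly.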