The need to improve data clustering methods has called for more interaction with domain experts, which led to the development of constrained clustering algorithms. These algorithms use domain knowledge in the form of positive must-link and negative cannot-link constraints to improve the quality of the detected groups. One of the most overlooked issues in this field is the effectiveness of constraint elicitation. Although constraint elicitation can be tedious, it can have a significant impact on the quality of clustering.
In this thesis we designed and developed a method named Argument-based k-means (AB k-means), which aims at more efficient clustering and is based on the paradigm of argument-based machine learning (ABML).
The knowledge refinement loop of ABML enables the domain expert to articulate their domain knowledge by providing arguments for automatically chosen problematic cases, while the method uses counterexamples to highlight any shortcomings in the expert's arguments. We adapted the knowledge refinement loop to the needs of clustering by exposing badly and well clustered cases when eliciting the constraints that are crucial for improving the clustering. At the same time, the obtained constraints lead to clusters that are consistent with the expert's knowledge of the chosen domain.
To make the new method easier to use, we also developed an interactive application. We empirically tested the effectiveness of our approach on three experimental domains, where it compared favourably with an ordinary constrained clustering algorithm.