Data clustering is a well-researched topic, but there is still room for
improvement. One way to improve clustering methods is to supply them with
knowledge from a domain expert. Such knowledge can be expressed in the form
of positive must-link and negative cannot-link pairwise constraints.
Constraints of this type improve the quality of the detected groups. In
real-world applications, however, eliciting knowledge in the form of positive
and negative constraints is a challenging and time-consuming task for an expert.
In this thesis we address the problem of extracting relevant domain knowledge
from an expert and develop a method called Argument-based Hierarchical
Clustering (ABHC), which is based on hierarchical clustering and built on
the argument-based machine learning (ABML) paradigm. The method automatically
selects cases that are considered problematic, that is, cases that are likely
to have been assigned to the wrong cluster, and presents them to the expert.
The expert then articulates their domain knowledge in the form of arguments
and constraints explaining why the problematic case should or should not be in
the cluster it was assigned to. The method in turn uses counter-examples to
expose any shortcomings or inconsistencies in the expert's arguments. The
counter-examples allow the expert to refine the arguments, which yields more
efficient constraints. These constraints are the key to improving the
clustering results and are also more consistent with the knowledge of the
expert.
We have also developed an interactive application that uses the aforementioned
method to test the effectiveness of our approach. The method was tested on
three experimental domains using domain expert knowledge. We compared the
results with two other algorithms: Constrained Agglomerative (CA), a
hierarchical clustering algorithm with constraints, and Argument-based k-means
(AB k-means), which is also based on argument-based machine learning but uses
the k-means algorithm as its clustering method. The results look promising.