In my thesis I combined several techniques for recognizing duplicate person records: rule-based methods for finding similar persons, pair classification, cluster building, and machine learning. I additionally compared profile pictures, because this is one of the first things humans look at when comparing profiles. I also applied entity resolution techniques across multiple databases. Finally, I measured the time complexity and the success of the deduplication.
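
To illustrate how rule-based pair classification and cluster building can fit together, the following is a minimal sketch and not the implementation from the thesis. The record fields (`id`, `name`, `email`), the sample data, the rule set, and the `0.85` threshold are all assumptions chosen for illustration. Profile picture comparison and machine learning are left out.

```python
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical person records; field names and values are illustrative only.
persons = [
    {"id": 1, "name": "John Smith", "email": "john.smith@example.com"},
    {"id": 2, "name": "Jon Smith", "email": "john.smith@example.com"},
    {"id": 3, "name": "Alice Brown", "email": "alice@example.org"},
    {"id": 4, "name": "Alice  Brown", "email": "a.brown@example.org"},
    {"id": 5, "name": "Bob Stone", "email": "bob@example.net"},
]


def name_similarity(a, b):
    """Return a similarity ratio in [0, 1] after normalizing case and whitespace."""
    def norm(s):
        return " ".join(s.lower().split())
    return SequenceMatcher(None, norm(a), norm(b)).ratio()


def is_duplicate(p, q, threshold=0.85):
    """Rule-based pair classification (assumed rules): equal email or similar name."""
    if p["email"] == q["email"]:
        return True
    return name_similarity(p["name"], q["name"]) >= threshold


def cluster(records):
    """Build duplicate clusters with union-find over all pairs classified as duplicates."""
    parent = {r["id"]: r["id"] for r in records}

    def find(x):
        # Follow parent links to the cluster root, halving the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Comparing every pair is quadratic in the number of records.
    for p, q in combinations(records, 2):
        if is_duplicate(p, q):
            parent[find(p["id"])] = find(q["id"])

    groups = {}
    for r in records:
        groups.setdefault(find(r["id"]), []).append(r["id"])
    return sorted(sorted(g) for g in groups.values())


clusters = cluster(persons)
print(clusters)
```

Comparing all pairs is the simplest choice, but its quadratic cost is one reason the time complexity of deduplication is worth measuring. Larger datasets would typically need some way to limit which pairs are compared.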