Agglomerative Constrained Clustering Through Similarity and Distance Recalculation

Title: Agglomerative Constrained Clustering Through Similarity and Distance Recalculation
Publication Type: Conference Paper
Year of Publication: 2020
Authors: González-Almagro, Germán; Suarez, Juan Luis; Luengo, Julián; Cano, J. R.; García, Salvador
Conference Name: International Conference on Hybrid Artificial Intelligence Systems
Keywords: Agglomerative clustering, Constrained clustering, Semi-supervised learning, Similarity recalculation

Constrained clustering has become a topic of considerable interest in machine learning, as it has been shown to produce promising results in domains where only partial information about how to solve the problem is available. Constrained clustering can be viewed as a semi-supervised generalization of clustering, which is traditionally unsupervised. It leverages an additional type of information, encoded as constraints, to guide the clustering process. In particular, this study focuses on instance-level must-link and cannot-link constraints. We propose an agglomerative constrained clustering algorithm that combines distance-based and clustering-engine-adapting methods to incorporate constraints into the partitioning process. It first computes a similarity measure from the distances (in the dataset) and the constraints (in the constraint set), and then applies an agglomerative clustering method whose clustering engine has been adapted to consider both constraints and raw distances. We demonstrate its capability to produce quality results for the constrained clustering problem by comparing its performance with that of previous proposals on several datasets with increasing levels of constraint-based information.
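The abstract does not give the similarity formula or the merge rule, so the following is only a minimal sketch of the general two-stage idea, not the authors' algorithm. It assumes a deliberately simple recalculation (must-link pairs get distance zero, cannot-link pairs get a distance larger than any observed one), average linkage, and a merge step that refuses to join clusters containing a cannot-link pair. The function names `constrained_distances` and `constrained_agglomerative` are hypothetical.

```python
import math
from itertools import combinations


def constrained_distances(points, must_link, cannot_link):
    """Euclidean distances overwritten by constraints (illustrative rule only)."""
    n = len(points)
    dist = [[math.dist(points[i], points[j]) for j in range(n)] for i in range(n)]
    # Assumption: any value above the largest observed distance marks "far apart".
    far = 2.0 * max(max(row) for row in dist) + 1.0
    for i, j in must_link:
        dist[i][j] = dist[j][i] = 0.0
    for i, j in cannot_link:
        dist[i][j] = dist[j][i] = far
    return dist


def constrained_agglomerative(points, must_link, cannot_link, n_clusters):
    """Average-linkage merging that never joins a cannot-link pair."""
    dist = constrained_distances(points, must_link, cannot_link)
    forbidden = {frozenset(pair) for pair in cannot_link}
    clusters = [[i] for i in range(len(points))]

    def violates(a, b):
        return any(frozenset((i, j)) in forbidden for i in a for j in b)

    def linkage(a, b):
        return sum(dist[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > n_clusters:
        candidates = [
            (linkage(a, b), ia, ib)
            for (ia, a), (ib, b) in combinations(enumerate(clusters), 2)
            if not violates(a, b)
        ]
        if not candidates:
            break  # every remaining merge would break a cannot-link constraint
        _, ia, ib = min(candidates)
        clusters[ia] = clusters[ia] + clusters[ib]
        del clusters[ib]  # safe because ia < ib

    labels = [0] * len(points)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


points = [(0.0, 0.0), (0.1, 0.0), (5.0, 0.0), (5.1, 0.0)]
plain = constrained_agglomerative(points, [], [], n_clusters=2)
guided = constrained_agglomerative(points, [(0, 2)], [(0, 1)], n_clusters=2)
```

In this toy run, `plain` groups the two nearby pairs, while `guided` keeps points 0 and 2 together and separates points 0 and 1, which shows how a small number of constraints can override raw geometry. If the constraints make the requested number of clusters unreachable, the loop stops early and returns more clusters than requested.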