Determination of similarity threshold in clustering problems for large data sets

A new automatic method, based on an intra-cluster criterion, is proposed for obtaining a similarity threshold that generates a well-defined clustering (or one close to it) for large data sets. The method uses the connected component criterion, and it neither calculates nor stores the similarity matrix of the objects in main memory. It is intended for the unsupervised Logical Combinatorial Pattern Recognition approach. Experiments with the new method on large data sets are also presented. © Springer-Verlag Berlin Heidelberg 2003.
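
For readers unfamiliar with the connected component criterion, the following is a minimal sketch of how objects can be grouped at a given similarity threshold without holding a similarity matrix in memory. It is not the threshold-selection method of the paper, which the abstract does not detail. The `similarity` function (fraction of matching features), the name `connected_components`, and the toy data are assumptions made for illustration only. Each pairwise similarity is computed on the fly and discarded, and a union-find structure keeps track of the components.

```python
from itertools import combinations


def similarity(a, b):
    """Hypothetical similarity in [0, 1]: fraction of matching features."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def connected_components(objects, threshold):
    """Cluster objects as connected components of the graph that links
    two objects when their similarity is >= threshold.

    Only a parent array of size n is stored; no n x n matrix is built.
    """
    parent = list(range(len(objects)))

    def find(i):
        # Follow parent links to the root, halving the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(objects)), 2):
        # The similarity is computed when needed and never stored.
        if similarity(objects[i], objects[j]) >= threshold:
            ri, rj = find(i), find(j)
            if ri != rj:
                parent[rj] = ri

    clusters = {}
    for i in range(len(objects)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


# Toy data (assumed): four objects described by four binary features.
objects = [(1, 1, 1, 1), (1, 1, 1, 0), (0, 0, 0, 0), (0, 0, 0, 1)]
print(connected_components(objects, 0.75))
```

In this sketch, raising the threshold splits the data into more, smaller components, and lowering it merges them, which is why the choice of threshold determines the clustering obtained.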

Artificial intelligence; Computers; Automatic method; Clustering problems; Connected component; Intra-cluster; Large datasets; Main memory; Similarity matrix; Similarity threshold; Pattern recognition
