Beyond classical consensus clustering: the Least Squares approach to multiple solutions

Abstract

Clustering is one of the most important unsupervised learning problems; it consists of finding a common structure in a collection of unlabeled data. However, because the problem is ill-posed, different runs of the same clustering algorithm on the same data set usually produce different solutions. In this scenario, choosing a single solution is quite arbitrary. On the other hand, in many applications the problem of multiple solutions becomes intractable, so it is often more desirable to provide a limited group of "good" clusterings rather than a single solution. In the present paper we propose least squares consensus clustering. This technique extracts a small number of different clustering solutions from an initial (large) ensemble obtained by applying any clustering algorithm to a given data set. We also define a measure of quality and present a graphical visualization of each consensus clustering, so that the strength of the consensus is immediately interpretable. We carried out several numerical experiments on both synthetic and real data sets to illustrate the proposed methodology.

Year

2011

IAC Authors

ITALIA DE FEIS

CLAUDIA ANGELINI

Publication type

Journal article

Other Authors

Murino L., Angelini C., De Feis I., Raiconi G., Tagliaferri R.

Publisher

North-Holland

Journal