Abstract
Clustering is one of the most important unsupervised learning problems and it consists of finding a common
structure in a collection of unlabeled data. However, due to the ill-posed nature of the problem, different
runs of the same clustering algorithm applied to the same data-set usually produce different
solutions. In this scenario, choosing a single solution is quite arbitrary. On the other hand, in many applications
the problem of multiple solutions becomes intractable, hence it is often more desirable to provide
a limited group of "good" clusterings rather than a single solution. In the present paper we propose the
least squares consensus clustering. This technique allows one to extrapolate a small number of different clustering
solutions from an initial (large) ensemble obtained by applying any clustering algorithm to a given
data-set. We also define a measure of quality and present a graphical visualization of each consensus
clustering to make the strength of the consensus immediately interpretable. We have carried out several
numerical experiments on both synthetic and real data-sets to illustrate the proposed methodology.
Year
2011
IAC Authors
Publication type
Other Authors
Murino L., Angelini C., De Feis I., Raiconi G., Tagliaferri R.
Publisher
North-Holland
Journal
Pattern Recognition Letters