Title: How many groups? Visual tools for assessing cluster structure in data.
Abstract: Cluster analysis is a set of methods for exploring and finding unknown group structure in (multivariate) data. A wide variety of methods is available, and applying them can yield many different proposed cluster structures. Particularly for the more heuristic methods (e.g. hierarchical clustering), it can be difficult to decide objectively which method or number of clusters gives the “best” answer. One alternative that claims to address this flaw is model-based clustering: the application of mixture models, often Gaussian, usually with some likelihood-based criterion for selecting the best model and number of mixture components. An issue with this approach can be its tendency to overestimate the number of groups when each mixture component is associated with a cluster (estimated group). This talk presents novel applications of an old-fashioned clustering tool, the dendrogram, to visually assess either the combination of mixture components into clusters or the similarity and grouping of clustering solutions obtained from a variety of methods applied to the same data. The dendrogram’s tree-diagram presentation is particularly useful because it is easily understood by laypeople and is a visually appealing tool for exploratory analysis of group structure. It also provides a good summary of data of arbitrary dimension.
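As a rough illustration of combining mixture components into clusters, the sketch below computes the merge history that a dendrogram would draw. It rests on assumptions the abstract does not make: the component means are invented values standing in for a fitted Gaussian mixture, Euclidean distance between means with single linkage is used as the merging criterion purely for brevity (it ignores covariances and mixing weights), and the function names `single_linkage_merges` and `clusters_at` are hypothetical.

```python
import math

# Hypothetical component means from an already-fitted Gaussian mixture.
# These are illustrative values, not results from the talk.
component_means = {
    "A": (0.0, 0.0),
    "B": (0.5, 0.2),
    "C": (5.0, 5.0),
    "D": (5.3, 4.8),
    "E": (10.0, 0.0),
}


def single_linkage_merges(points):
    """Return the merge history as (left, right, height) triples.

    Each triple records which two groups of components were joined and at
    what distance; this is the information a dendrogram displays.
    """
    clusters = [(name,) for name in sorted(points)]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: distance between the closest pair of members.
                d = min(
                    math.dist(points[p], points[q])
                    for p in clusters[i]
                    for q in clusters[j]
                )
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return merges


def clusters_at(points, merges, height):
    """Cut the tree: apply only the merges that occur at or below `height`."""
    groups = [(name,) for name in sorted(points)]
    for left, right, d in merges:
        if d <= height:
            groups = [g for g in groups if g not in (left, right)]
            groups.append(left + right)
    return groups


merges = single_linkage_merges(component_means)
for left, right, d in merges:
    print(f"merge {left} + {right} at height {d:.3f}")

# Cutting at a chosen height turns five components into fewer clusters.
print(clusters_at(component_means, merges, height=1.0))
```

A large gap between successive merge heights suggests a natural place to cut the tree, which is the kind of visual judgement a dendrogram supports. In practice the same merge history could be computed and drawn with `scipy.cluster.hierarchy.linkage` and `scipy.cluster.hierarchy.dendrogram`, using whatever between-component dissimilarity suits the fitted model.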