ClustanGraphics now includes a unique k-means cluster model tree, which shows the structure and relationships of an optimized k-means solution. So far as we know, nothing comparable is available in any other clustering software; and, as you would expect from Clustan, it's very fast.

Imagine that you have completed a k-means analysis on your data and obtained a cluster model solution, with or without deletion of outliers. You would like to see how the cases have been clustered.

In the following example, we obtained a k-means solution for our Mammals Milk case study at 5 clusters, optimizing the Euclidean Sum of Squares with outlier deletion. Two outliers were deleted (Elephant and Rabbit), and a k-means cluster model tree was obtained (left). This shows the 5 clusters, highlighted in green, and how they combine hierarchically by Increase in Sum of Squares (Ward's Method) down to one cluster. We also obtain a mini-tree for each cluster, showing how the cases combine to form it, again by Ward's Method. The order of the cases has been serialized, so that similar cases and clusters are adjacent, and the two outliers are added as singletons at the base of the tree.

For a small data set, this is an excellent way of representing the results of a k-means cluster model. Sadly, it's not available for k-means solutions produced by other software, because the computations are an integral part of our unique k-means algorithm. So if you're locked into another clustering product, you'll just have to soldier on checking and listing cases.

What's more, you can use any proximity coefficient and hierarchical clustering method provided in ClustanGraphics, as illustrated in the options window shown below. So, for example, it is possible to cluster the main body of your data into a tessellation of spherical clusters with outliers deleted, and then search for natural contours using single linkage. This approach is discussed in more detail here, and a paper on this subject presented at the International Statistical Institute can be downloaded here.

But what if I have thousands of cases, you ask? A large tree can be very unwieldy. For large applications we offer the option to output the tree for the final clusters only, so that the sub-trees showing their membership are not plotted.
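To make the idea concrete, here is a minimal sketch of the two stages the tree combines: k-means on the cases (with a simple outlier trim) followed by Ward's method over the resulting cluster means. It assumes NumPy and SciPy are available, and it only approximates the approach described above; it is not Clustan's integrated algorithm, in which these computations are performed jointly.

```python
# Illustrative sketch (NumPy/SciPy assumed), NOT Clustan's algorithm:
# (1) k-means on the cases, with a simple outlier trim, then
# (2) Ward's method (increase in sum of squares) over the cluster means.
import numpy as np
from scipy.cluster.vq import kmeans2
from scipy.cluster.hierarchy import linkage

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))      # stand-in data: 200 cases, 5 variables
X[:3] += 10                        # plant a few obvious outliers

k = 5
means, labels = kmeans2(X, k, minit='++', seed=0)

# Trim outliers: drop cases unusually far from their own cluster mean,
# then re-fit the k-means solution without them.
dist = np.linalg.norm(X - means[labels], axis=1)
keep = dist < dist.mean() + 3 * dist.std()
means, labels = kmeans2(X[keep], k, minit='++', seed=0)

# Combine the k final clusters hierarchically by Ward's method,
# down to one cluster: k-1 merges in the linkage matrix.
Z = linkage(means, method='ward')
print(Z.shape)   # (k-1, 4)
```

SciPy's `linkage` also supports `optimal_ordering=True`, which serializes the leaf order so that similar clusters sit adjacent in the plotted tree, loosely mirroring the serialization described above.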
The k-means tree (right) was produced for 40,000 cases by 5 variables, with the Euclidean Sum of Squares optimized at 80 clusters. ClustanGraphics converged in 76 seconds after 66 iterations. Recall that other programs usually cannot converge with large datasets, because they don't optimize the criterion function. We then obtained the corresponding k-means cluster model tree, which shows how the 80 final clusters combine hierarchically down to 1. In our diagram, the 7-cluster super-partition is shaded green.

If the labels appear difficult to read, this is because we halved the tree's size to conserve your download time and display area. The first integer in each label is the cluster's exemplar, and it may be of interest to note that the output tree can optionally be truncated to either the cluster means or the cluster exemplars. The value in parentheses gives the cluster's size, and of course these default labels can easily be changed.

To find out more and obtain a k-means tree with your data,
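The scale argument can be sketched in the same illustrative style (NumPy/SciPy assumed, not Clustan's code): building the tree over 80 cluster means takes 79 merges instead of the 39,999 a full case-level tree would need, and an exemplar per cluster (the case nearest its mean) can serve as a label, as in the truncated output described above.

```python
# Sketch: for large data, build the tree over cluster summaries
# rather than over all 40,000 cases. Illustrative only.
import numpy as np
from scipy.cluster.vq import kmeans2
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(1)
X = rng.normal(size=(40_000, 5))          # 40,000 cases x 5 variables
k = 80
means, labels = kmeans2(X, k, minit='++', seed=1)

# One exemplar per cluster: the case nearest its cluster mean.
exemplars = np.array([
    np.flatnonzero(labels == c)[
        np.linalg.norm(X[labels == c] - means[c], axis=1).argmin()
    ]
    for c in np.unique(labels)
])

# Tree over the 80 means only: 79 merges instead of 39,999.
Z = linkage(means, method='ward')

# Cut the tree into a 7-cluster super-partition of the 80 clusters,
# analogous to the green shading in the diagram.
super_part = fcluster(Z, t=7, criterion='maxclust')
print(len(set(super_part)))
```

Here `fcluster` with `criterion='maxclust'` cuts the cluster-means tree to recover a coarse partition, which is one way to read off a super-partition like the shaded one.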