Optimal Tree Partition 

A question that frequently arises in hierarchical cluster analysis is: how many clusters are best?  ClustanGraphics offers two tests for the best number of clusters, the upper tail rule and the moving average quality control rule.

In an example using the upper tail rule with the Mammals Case Study, Best Cut indicates that the 2- and 3-cluster partitions are significant departures from the distribution of fusion values.  When you click OK, the tree is shaded for the largest significant number of clusters, in this case the 3-cluster partition.

Upper Tail Rule
The upper tail rule treats the fusion values as a series and computes their mean and standard deviation.  It then computes a t-statistic for each fusion value, namely its standardised deviation from the mean on this (assumed normal) distribution, and selects the first fusion value whose t-value exceeds the 5% significance level as "significant".  The null hypothesis is therefore that the kth fusion value belongs to the normal distribution of fusion values.
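The upper tail rule described above can be sketched in a few lines of Python.  This is a minimal illustration, not ClustanGraphics code: the function name `upper_tail_rule` is hypothetical, the sample standard deviation is assumed, and the 5% level is assumed to be the one-tailed standard normal critical value (about 1.645), which may differ from the value Best Cut actually uses.  The statistic for fusion k is t_k = (a_k - mean) / sd, where a_k is the kth fusion value.  Cutting the tree just before fusion k of a hierarchy over n cases leaves n - k + 1 clusters.

```python
from statistics import NormalDist, mean, stdev


def upper_tail_rule(fusion_values, alpha=0.05):
    """Sketch of the upper tail rule (hypothetical helper, not Clustan code).

    fusion_values: fusion values in the order the fusions occur, so a
    hierarchy over n cases supplies n - 1 values.
    Returns the cluster counts flagged as significant, largest first.
    """
    if len(fusion_values) < 2:
        return []
    n_fusions = len(fusion_values)
    m = mean(fusion_values)
    s = stdev(fusion_values)  # sample standard deviation (n - 1 divisor)
    if s == 0:
        return []  # all fusion values equal: nothing stands out
    # Assumption: one-tailed 5% critical value from the standard normal.
    critical = NormalDist().inv_cdf(1 - alpha)  # about 1.645 when alpha = 0.05
    significant = []
    for k, value in enumerate(fusion_values, start=1):
        t = (value - m) / s  # standardised deviation of the kth fusion value
        if t > critical:
            # Cutting just before fusion k leaves n - (k - 1) clusters,
            # where n = n_fusions + 1 cases.
            significant.append(n_fusions - k + 2)
    return significant


# Made-up fusion values: ten small fusions followed by two large jumps.
print(upper_tail_rule([1.0] * 10 + [9.0, 10.0]))  # prints [3, 2]
```

Both large fusions are flagged here, so the sketch reports the 3- and 2-cluster partitions.  Taking the first entry of the list mirrors Best Cut shading the tree for the largest significant number of clusters.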

Moving Average Quality Control Rule
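For each fusion k, the moving average quality control rule fits a linear trend to the first k-1 fusion values and then uses the trend to compute an expected value for the kth fusion value.  Fusion k is flagged as significant when its actual value lies well above this expected value, in the manner of an upper control limit in quality control.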
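A trend-based check of this kind can be sketched as follows.  The fitting step follows the description above literally: a least-squares line through all k - 1 earlier fusion values rather than a moving window.  How "well above" is judged is an assumption made for this sketch only: the excess over the forecast is divided by the residual standard deviation of the fit and compared with the one-tailed 5% normal critical value.  The helper names `fit_line` and `moving_average_qc_rule` are hypothetical, and this is not the ClustanGraphics implementation.

```python
import math
from statistics import NormalDist


def fit_line(ys):
    """Least-squares line through (1, ys[0]), (2, ys[1]), ...; returns (slope, intercept)."""
    n = len(ys)
    x_bar = (n + 1) / 2
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in range(1, n + 1))
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in enumerate(ys, start=1))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def moving_average_qc_rule(fusion_values, alpha=0.05):
    """Sketch of a trend-based quality control rule (hypothetical helper).

    For each fusion k, fit a line to the first k - 1 fusion values, forecast
    the kth value, and flag fusion k if it lies far above the forecast.
    Returns the flagged cluster counts, largest first.
    """
    n_fusions = len(fusion_values)
    # Assumption: one-tailed 5% critical value from the standard normal.
    critical = NormalDist().inv_cdf(1 - alpha)
    significant = []
    # A residual spread needs at least 3 earlier values (n - 2 degrees of freedom),
    # so the first fusion tested is k = 4.
    for k in range(4, n_fusions + 1):
        history = fusion_values[:k - 1]
        slope, intercept = fit_line(history)
        expected = slope * k + intercept  # trend forecast for fusion k
        residuals = [y - (slope * x + intercept) for x, y in enumerate(history, start=1)]
        s = math.sqrt(sum(r * r for r in residuals) / (len(history) - 2))
        if s == 0:
            continue  # perfectly linear history: no spread to scale by
        if (fusion_values[k - 1] - expected) / s > critical:
            significant.append(n_fusions - k + 2)  # clusters left before fusion k
    return significant


# Made-up fusion values: a gently rising trend, then one large jump.
print(moving_average_qc_rule([1.0, 1.2, 1.1, 1.3, 1.4, 1.3, 1.5, 4.0]))  # prints [2]
```

Only the final fusion departs from the fitted trend in this made-up series, so the sketch reports the 2-cluster partition.  Because each fusion is compared with a forecast from its own history rather than with the overall mean, a steadily rising sequence of fusion values is not flagged merely for being large.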

Evaluation
The original paper containing these rules was published by Prof. Dick Mojena in the Computer Journal in 1977.  Mojena and Wishart completed further tests, which they presented at COMPSTAT 1980.  Their main conclusion was that the upper tail rule and the moving average quality control rule performed creditably in tests, whereas a third rule, double exponential smoothing, did not perform as well.
Although the paper presented at COMPSTAT 1980 reported tests using Ward's Method, in ClustanGraphics the tests are available for any hierarchical clustering fusion sequence.

References

Mojena, R (1977) Hierarchical grouping methods and stopping rules: an evaluation, Computer Journal, v. 20, 353-363.

Mojena, R, and Wishart, D (1980) Stopping rules for Ward's clustering method, COMPSTAT 1980 Proceedings, Physica-Verlag, 426-432.

A discussion also appears in: Wishart, D (1987) Clustan User Manual, section on Rules, pp 156-159.

Clustan - A Class Act © 1998 Clustan Ltd