Cluster Procedure 

Clustan's procedure Cluster produces a hierarchical cluster analysis directly from a data matrix. Unlike most other packages, Cluster does not need to calculate a proximity matrix as an intermediate step. All it requires is enough memory to hold the data matrix, which is feasible even for large surveys provided the number of variables is reasonably small.
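To see why avoiding the proximity matrix matters, it helps to count the values each approach has to hold in memory. The sketch below is only an illustration: the function name storage_entries is hypothetical, the figure of 10 variables is an assumption, and the 16,000 cases are taken from the banking study mentioned on this page.

```python
def storage_entries(n_cases, n_vars):
    """Count the values held by a data matrix and by a full proximity matrix."""
    data_matrix = n_cases * n_vars
    # One proximity per unordered pair of cases (lower triangle, no diagonal).
    proximity_matrix = n_cases * (n_cases - 1) // 2
    return data_matrix, proximity_matrix

# 16,000 cases as in the banking study; 10 variables is an assumed figure.
data, prox = storage_entries(16_000, 10)
print(f"data matrix:      {data:>12,} values")
print(f"proximity matrix: {prox:>12,} values")
```

The data matrix grows linearly with the number of cases, whereas the proximity matrix grows with its square, which is why the number of variables, rather than the number of cases, is the main constraint.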

If you're using decision trees for segmentation, you might like to have a look at our critique Clustering versus Decision Trees, where we show how the decision tree approach can produce very simplistic segmentation compared with hierarchical cluster analysis.

Cluster can be used to complete hierarchical cluster analyses with very large surveys. For example, our banking study involved 16,000 cases, and we anticipate that even larger studies are feasible. Because cluster centres and statistics are computed directly from the data matrix, other unique advantages follow:

  • Variables and cases can be assigned weights
  • Missing values can be treated properly, without bias
  • Mixed data types can be treated properly, without bias
  • The tree can be used for identification of new cases
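As a rough illustration of how weights and missing values can be handled when centres are computed from the data itself, here is a minimal sketch. It is not Clustan's implementation: the functions weighted_centre and distance, and the choice of a weighted mean squared difference over the variables that are present, are assumptions made for the example.

```python
import numpy as np

def weighted_centre(X, case_w):
    """Weighted mean of each variable, skipping missing (NaN) entries."""
    present = ~np.isnan(X)
    # A case contributes its weight only to variables it actually has.
    w = case_w[:, None] * present
    sums = (np.where(present, X, 0.0) * w).sum(axis=0)
    totals = w.sum(axis=0)
    centre = np.full(X.shape[1], np.nan)
    # Variables with no observed values keep a missing (NaN) centre.
    np.divide(sums, totals, out=centre, where=totals > 0)
    return centre

def distance(x, centre, var_w):
    """Weighted mean squared difference over variables present in both."""
    both = ~np.isnan(x) & ~np.isnan(centre)
    if not both.any():
        return np.nan
    sq = var_w[both] * (x[both] - centre[both]) ** 2
    # Dividing by the weights actually used avoids penalising missing values.
    return sq.sum() / var_w[both].sum()

X = np.array([[1.0, np.nan],
              [3.0, 4.0]])
case_w = np.array([1.0, 3.0])   # assumed case weights
var_w = np.array([1.0, 2.0])    # assumed variable weights
centre = weighted_centre(X, case_w)
d = distance(X[0], centre, var_w)
```

Because each comparison is averaged over only the variables that are present, a case with missing values is neither pulled towards nor pushed away from a cluster by the values it lacks.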

Our procedure Classify also links with Cluster to identify new cases by reference to the resulting tree. Our retail banking case study describes a real application of Cluster and Classify in which we formed a tree with 16,000 cases taken from a corporate database and then classified all 4 million of the bank's customers.
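The idea of identifying new cases against an existing classification can be sketched as a nearest-centre assignment. This is an assumption about the general approach, not a description of how Classify works internally: the function classify is hypothetical, the centres are taken to be those of the clusters at one chosen level of the tree, and every new case is assumed to have at least one non-missing value.

```python
import numpy as np

def classify(cases, centres):
    """Return, for each case, the index of its nearest cluster centre."""
    # diff has shape (n_cases, n_clusters, n_vars).
    diff = cases[:, None, :] - centres[None, :, :]
    # Mean squared difference over the variables each case actually has.
    d2 = np.nanmean(diff ** 2, axis=2)
    return np.argmin(d2, axis=1)

# Two assumed cluster centres and three new cases, two with missing values.
centres = np.array([[0.0, 0.0],
                    [10.0, 10.0]])
new_cases = np.array([[1.0, np.nan],
                      [9.0, 8.0],
                      [np.nan, 6.0]])
labels = classify(new_cases, centres)
```

Since each new case is compared only with a handful of centres, a large customer file can be processed in batches without ever being clustered itself.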

For further details, please see our Cluster Tutorial and the Clustan User Manual.