Weighting Cases 


Most clustering packages do not allow you to assign differential weights to your cases prior to clustering them.  However, this can be important if your cases are themselves aggregated units of some kind.

For example, suppose you are clustering customers, but you want to give your best customers more weight than the occasional or single-purchase customer.  The ideal would be to weight each customer by the volume or value of their business with you.  This is easy with ClustanGraphics - just use Edit|Weights to enter or paste your case weights.  Case weights can also be read alongside a data matrix or proximity matrix.

Suppose that the weights of five cases are as follows:

Case 1    1
Case 2    3
Case 3    2
Case 4    3
Case 5    1

The effect of these weights when clustering proximities is equivalent to a cluster analysis on 10 cases, of which 3 are identical to case 2, 2 are identical to case 3, and 3 are identical to case 4.  When the Euclidean Sum of Squares is computed, the points corresponding to cases 2, 3 and 4 count as 3, 2 and 3 cases respectively, while cases 1 and 5 are treated as singletons.
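This equivalence can be checked with a short sketch.  The coordinates below are made up for illustration (the text specifies only the weights), and the plain function here is just the arithmetic described above, not ClustanGraphics' internals: the weighted Euclidean Sum of Squares on five cases with weights 1, 3, 2, 3, 1 matches the ordinary ESS on the 10-case dataset in which each case is replicated by its weight.

```python
# Hypothetical 2-D coordinates for the five cases; weights are those from the table.
points  = [(1.0, 2.0), (2.0, 0.5), (4.0, 3.0), (5.0, 1.0), (0.0, 4.0)]
weights = [1, 3, 2, 3, 1]

def ess(pts, wts):
    """Euclidean Sum of Squares about the (weighted) centroid."""
    total = sum(wts)
    dims = range(len(pts[0]))
    centroid = tuple(sum(w * p[d] for p, w in zip(pts, wts)) / total for d in dims)
    return sum(w * sum((p[d] - centroid[d]) ** 2 for d in dims)
               for p, w in zip(pts, wts))

weighted = ess(points, weights)

# Replicate each case by its integer weight and use unit weights instead.
replicated = [p for p, w in zip(points, weights) for _ in range(w)]
unweighted = ess(replicated, [1] * len(replicated))

assert len(replicated) == 10
assert abs(weighted - unweighted) < 1e-9
```

The replication view only works directly for integer weights, but the weighted ESS formula itself accepts any positive weights.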

In terms of the customer study, the weights might correspond to the number of transactions by each customer; thus we are, in effect, clustering the individual transactions.  A customer with a large number of transactions will have a heavier centroid in the analysis than one of the lighter customers.  The resulting classification should therefore give greater emphasis to the most frequent shoppers.
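The pull that a heavily weighted customer exerts on a centroid is easy to see numerically.  The figures below are invented for illustration (three customers summarised by a single spend value), but the weighted mean is the same calculation ClustanGraphics' case weights imply:

```python
# Made-up example: one spend figure per customer, weighted by transaction count.
profiles     = {"A": 10.0, "B": 50.0, "C": 90.0}  # e.g. average basket value
transactions = {"A": 1,    "B": 2,    "C": 7}     # case weights

total_weight = sum(transactions.values())
weighted_centroid = sum(profiles[c] * transactions[c] for c in profiles) / total_weight
unweighted_mean   = sum(profiles.values()) / len(profiles)

# The weighted centroid (74.0) sits far closer to frequent shopper C's
# profile than the unweighted mean (50.0) does.
```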

Another type of study can involve clustering administrative districts weighted by their populations.  This could be important in, say, identifying suitable districts in which to locate new stores or supermarkets.  The weight of each resulting cluster would correspond to the population of the districts in the cluster, and hence act as a surrogate for the cluster's potential total customer catchment.

A third example, which is quite important when clustering large datasets, is when the cases are themselves clusters.  This happens when you truncate a hierarchical cluster analysis to the last few fusions to obtain a summary tree.  The act of truncation creates a cluster model in which the base units are clusters.  Each cluster unit is represented jointly by a cluster mean and a weight, where the weight is the number of cases in the cluster; or, if the cases were assigned differential weights, the sum of the weights in the cluster.  If your cluster model does not reflect the different weights of each cluster, a summary tree obtained from it will not correspond to the latter stages of the larger tree from which it was obtained by truncation.
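The (mean, weight) representation described above can be sketched directly.  The cluster contents here are hypothetical, but the fusion rule is the standard weight-proportional mean: pooling two weighted cluster units reproduces exactly the centroid of all their underlying cases, which is why a truncated model must carry the weights.

```python
def fuse(mean_a, w_a, mean_b, w_b):
    """Combine two weighted cluster units (mean, weight) into one."""
    w = w_a + w_b
    mean = tuple((w_a * a + w_b * b) / w for a, b in zip(mean_a, mean_b))
    return mean, w

def centroid(pts):
    """Ordinary centroid of a list of points."""
    n = len(pts)
    return tuple(sum(p[d] for p in pts) / n for d in range(len(pts[0])))

# Hypothetical cases, already grouped into two truncated clusters of sizes 3 and 2.
cluster_a = [(1.0, 1.0), (2.0, 0.0), (3.0, 2.0)]
cluster_b = [(8.0, 6.0), (10.0, 8.0)]

fused_mean, fused_w = fuse(centroid(cluster_a), len(cluster_a),
                           centroid(cluster_b), len(cluster_b))

# The fused unit matches the centroid of all five underlying cases.
assert fused_w == 5
assert all(abs(f - c) < 1e-9
           for f, c in zip(fused_mean, centroid(cluster_a + cluster_b)))
```

Dropping the weights and averaging the two cluster means equally would instead give a point midway between them, which is the distortion the final paragraph warns about.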