Cluster Model Results 

Hierarchical cluster analysis, k-means, outlier analysis and the classification of cases are all possible with mixed data types and missing values.  On completion of a cluster analysis, the cluster model should be saved.  It can then be viewed in View/Current Model on the View dialogue.

The Cluster Model dialogue gives abbreviated information about the model, and enables Cluster Members or Cluster Model Statistics to be selected.

 

The Model Statistics dialogue shows the range of statistics that can be displayed for the current cluster model, and allows the format of means and percentages to be controlled.  Continuous and Ordinal variables can be displayed in their original or transformed scales.

Cluster Model Statistics dialogue, for mixed data types

Having selected the cluster model statistics that are required, simply click "Compute Statistics" and the results will be displayed.  They can then be copied to a document or spreadsheet, for further reporting or analysis.  Alternatively, you can write the results directly to a file, which can then be read by Word or Excel, or analyzed further by other software.  An extract from the table is shown below:

Most of the results are self-explanatory, e.g. means, standard deviations, minima, maxima, frequencies and percentages.  It might be helpful to describe the following diagnostic statistics:

P-ratio (binary variables): the percentage occurrence within a cluster divided by the percentage occurrence overall.  In random sampling, a P-ratio of 1 is expected; hence a P-ratio greater than 1 indicates that the attribute is more prevalent within the cluster than overall, and conversely, a P-ratio of less than 1 indicates that the attribute is less prevalent within the cluster than overall.
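
The P-ratio definition above can be illustrated with a minimal Python sketch.  This is not ClustanGraphics code: the function name p_ratio, the data and the cluster membership flags are all hypothetical, and missing values are assumed to have been excluded beforehand.

```python
def p_ratio(values, in_cluster):
    """P-ratio of a binary variable for one cluster.

    values     -- 0/1 flags, one per case (hypothetical input layout)
    in_cluster -- booleans, True where the case belongs to the cluster
    """
    cluster_values = [v for v, member in zip(values, in_cluster) if member]
    if not cluster_values:
        raise ValueError("cluster is empty")
    overall_pct = 100.0 * sum(values) / len(values)
    if overall_pct == 0:
        raise ValueError("attribute never occurs overall; P-ratio is undefined")
    cluster_pct = 100.0 * sum(cluster_values) / len(cluster_values)
    return cluster_pct / overall_pct


# Hypothetical data: 8 cases, of which the first 4 form the cluster.
owns_car = [1, 1, 1, 0, 0, 1, 0, 0]
members = [True, True, True, True, False, False, False, False]

ratio = p_ratio(owns_car, members)
print(ratio)  # 75% within the cluster against 50% overall
```

Here the attribute occurs in 75% of the cluster and 50% of all cases, so the P-ratio exceeds 1 and the attribute is more prevalent within the cluster than overall.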

t-value (ordinal and continuous variables): a t-statistic that compares the cluster mean with the overall mean.  If Xcj is the mean of variable j within cluster c, Xj is the overall mean for variable j, and Sj is the overall standard deviation for variable j, then t-value = (Xcj - Xj)/Sj   The expected value of the t-value in random sampling is 0; that is to say that a cluster mean would not be expected to deviate significantly from the overall mean if the cluster values have been randomly chosen.  Any large positive (or negative) t-value indicates that the cluster mean is substantially higher (or lower) than the overall mean for that variable.
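
As a rough illustration of the t-value formula, the following Python sketch computes it for one variable and one cluster.  The function name t_value and the data are hypothetical.  The text does not say whether Sj is a population or a sample standard deviation; the population form (statistics.pstdev) is assumed here.

```python
from statistics import mean, pstdev


def t_value(values, in_cluster):
    """(cluster mean - overall mean) / overall standard deviation.

    Assumption: the overall standard deviation is the population form;
    the source text does not specify which form is used.
    """
    cluster_values = [v for v, member in zip(values, in_cluster) if member]
    if not cluster_values:
        raise ValueError("cluster is empty")
    overall_sd = pstdev(values)
    if overall_sd == 0:
        raise ValueError("variable is constant; t-value is undefined")
    return (mean(cluster_values) - mean(values)) / overall_sd


# Hypothetical data: overall mean 5, overall standard deviation 2.
income = [2, 4, 4, 4, 5, 5, 7, 9]
members = [False, False, False, False, False, True, True, True]

t = t_value(income, members)
print(t)  # cluster mean 7 lies one overall standard deviation above 5
```

With these figures the cluster mean is one overall standard deviation above the overall mean, giving a positive t-value.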

F-ratio (ordinal and continuous variables): the variance within a cluster divided by the variance overall.  If Vcj is the variance of variable j within cluster c, and Vj is the overall variance for variable j, then F-ratio = Vcj/Vj   Note that the variance is simply the square of the standard deviation.  The expected value of the F-ratio in random sampling is 1; that is to say that a cluster variance would not be expected to deviate significantly from the overall variance if the cluster values have been randomly chosen.  Any small F-ratio indicates that the variable has comparatively low variation within the cluster and is therefore a good diagnostic variable for the cluster.
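
The F-ratio can be sketched in the same way.  Again the function name f_ratio and the data are hypothetical, and the population variance (statistics.pvariance) is an assumption, since the text does not state which form of the variance is used.

```python
from statistics import pvariance


def f_ratio(values, in_cluster):
    """Within-cluster variance divided by overall variance.

    Assumption: both variances are population variances; the source
    text does not specify which form is used.
    """
    cluster_values = [v for v, member in zip(values, in_cluster) if member]
    if not cluster_values:
        raise ValueError("cluster is empty")
    overall_var = pvariance(values)
    if overall_var == 0:
        raise ValueError("variable is constant; F-ratio is undefined")
    return pvariance(cluster_values) / overall_var


# Hypothetical data: the cluster holds the tightly grouped middle cases.
income = [2, 4, 4, 4, 5, 5, 7, 9]
members = [False, True, True, True, True, True, False, False]

f = f_ratio(income, members)
print(f)  # well below 1: the variable varies little inside the cluster
```

The cluster values (4, 4, 4, 5, 5) vary far less than the full set of cases, so the F-ratio is close to 0 and the variable would count as a good diagnostic for this cluster.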

An example of a statistics table for a cluster model obtained by k-means analysis with mixed data types is shown below:

Cluster Model Statistics table for mixed data types

Clustan - A Class Act © 1998 Clustan Ltd