The Cluster Model dialogue gives abbreviated information about the model and allows either Cluster Members or Cluster Model Statistics to be selected.
The Model Statistics dialogue shows the range of statistics that can be displayed for the current cluster model, and allows the format of means and percentages to be controlled. Continuous and ordinal variables can be displayed on either their original or their transformed scales.
Having selected the required cluster model statistics, click "Compute Statistics" and the results will be displayed. They can then be copied to a document or spreadsheet for further reporting or analysis. Alternatively, you can write the results directly to a file, which can then be read by Word or Excel or analyzed further by other software. An extract from the table is shown below:

Most of the results are self-explanatory, e.g. means, standard deviations, minima, maxima, frequencies and percentages. The following diagnostic statistics need some explanation.

P-ratio (binary variables): the percentage occurrence of the attribute within a cluster divided by its percentage occurrence overall. In random sampling a P-ratio of 1 is expected; hence a P-ratio greater than 1 indicates that the attribute is more prevalent within the cluster than overall, and, conversely, a P-ratio less than 1 indicates that the attribute is less prevalent within the cluster than overall.

t-value (ordinal and continuous variables): a t-statistic that compares the cluster mean with the overall mean. If X_{cj} is the mean of variable j within cluster c, X_{j} is the overall mean of variable j, and S_{j} is the overall standard deviation of variable j, then

t-value = (X_{cj} - X_{j}) / S_{j}

The expected value of the t-value in random sampling is 0; that is, a cluster mean would not be expected to deviate significantly from the overall mean if the cluster members had been chosen at random. A large positive (or negative) t-value indicates that the cluster mean is substantially higher (or lower) than the overall mean for that variable.

F-ratio (ordinal and continuous variables): the variance within a cluster divided by the variance overall. If V_{cj} is the variance of variable j within cluster c, and V_{j} is the overall variance of variable j, then

F-ratio = V_{cj} / V_{j}

Note that the variance is simply the square of the standard deviation.
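The three diagnostic statistics can be illustrated with a minimal Python sketch. It assumes cluster membership is supplied as a list of labels (one per case), binary variables are coded 0/1, and the population (divide-by-n) standard deviation and variance are used for S_{j}, V_{cj} and V_{j}; the text above does not say which convention the program itself uses. The function names and example data are hypothetical and are not part of the software.

```python
from statistics import mean, pstdev, pvariance


def cluster_values(values, labels, cluster):
    """Return the values of one variable for the members of `cluster`."""
    members = [v for v, c in zip(values, labels) if c == cluster]
    if not members:
        raise ValueError(f"cluster {cluster!r} has no members")
    return members


def p_ratio(values, labels, cluster):
    """Binary (0/1) variable: % occurrence in the cluster / % occurrence overall."""
    members = cluster_values(values, labels, cluster)
    pct_cluster = 100.0 * sum(members) / len(members)
    pct_overall = 100.0 * sum(values) / len(values)
    if pct_overall == 0:
        raise ValueError("attribute never occurs, so the P-ratio is undefined")
    return pct_cluster / pct_overall


def t_value(values, labels, cluster):
    """(X_cj - X_j) / S_j, with S_j taken as the population standard deviation (assumed)."""
    members = cluster_values(values, labels, cluster)
    return (mean(members) - mean(values)) / pstdev(values)


def f_ratio(values, labels, cluster):
    """V_cj / V_j, with both taken as population variances (assumed)."""
    members = cluster_values(values, labels, cluster)
    return pvariance(members) / pvariance(values)


if __name__ == "__main__":
    labels = [0, 0, 0, 0, 1, 1, 1, 1]                   # hypothetical two-cluster partition
    x_binary = [1, 1, 1, 0, 0, 0, 0, 1]                 # hypothetical binary variable
    x_cont = [2.0, 4.0, 3.0, 3.0, 8.0, 9.0, 7.0, 8.0]   # hypothetical continuous variable
    print(p_ratio(x_binary, labels, 0))  # 1.5: attribute more prevalent in cluster 0
    print(t_value(x_cont, labels, 0))    # negative: cluster 0 mean is below the overall mean
    print(f_ratio(x_cont, labels, 0))    # well below 1: low variation within cluster 0
```

In this example cluster 0 has 75% occurrence of the binary attribute against 50% overall, giving a P-ratio of 1.5, while its continuous values are both lower and much less spread out than the data as a whole.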
The expected value of the F-ratio in random sampling is 1; that is, a cluster variance would not be expected to deviate significantly from the overall variance if the cluster members had been chosen at random. A small F-ratio indicates that the variable has comparatively low variation within the cluster and is therefore a good diagnostic variable for that cluster.

An example of a statistics table for a cluster model obtained by k-means analysis with mixed data types is shown below: