Classify New Cases tolerates missing values and can be used with mixed data types: continuous, binary, ordinal or nominal variables. Such variables are typical of complex survey questionnaires, and it is generally not necessary to carry out any tedious pre-processing to transform them by hand. ClustanGraphics takes care of these transformations internally, leaving you more time to focus on the post-clustering analysis.
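To make the idea of comparing cases on mixed variables with missing values concrete, here is a minimal sketch of a Gower-style dissimilarity. This is an assumption chosen for illustration: the text does not say which transformation ClustanGraphics applies internally, and the function name `gower_distance`, the `kinds` and `ranges` parameters and the sample values are all hypothetical.

```python
import math

def gower_distance(a, b, kinds, ranges):
    """Gower-style dissimilarity between two cases with mixed variable types.

    kinds[i] is "continuous", "ordinal", "binary" or "nominal".
    ranges[i] is the observed range of variable i; it is only used for
    continuous and ordinal variables. Missing values are passed as None,
    and a variable missing in either case is skipped.
    """
    total, used = 0.0, 0
    for x, y, kind, rng in zip(a, b, kinds, ranges):
        if x is None or y is None:
            continue  # missing value: leave this variable out of the average
        if kind in ("continuous", "ordinal"):
            d = abs(x - y) / rng if rng else 0.0  # range-scaled difference
        else:
            d = 0.0 if x == y else 1.0  # binary or nominal: simple mismatch
        total += d
        used += 1
    # No variable could be compared: the dissimilarity is undefined.
    return total / used if used else math.nan

# Hypothetical survey-style cases: age, owns_home, satisfaction (1-5), region.
kinds = ["continuous", "binary", "ordinal", "nominal"]
ranges = [50.0, None, 4.0, None]
case_a = [30.0, 1, 2, "north"]
case_b = [40.0, 0, None, "north"]  # satisfaction is missing
print(gower_distance(case_a, case_b, kinds, ranges))
```

Averaging over only the variables present in both cases is one common way to keep cases with missing values comparable without imputing them first.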
In the example shown below, we are classifying new cases at the 6-cluster level in a tree generated by Cluster Data for 500 cases. There were 4 variables in the original dataset, so we need to specify 4 values for each new case. We could enter the new cases interactively as directed, but since we have quite a few to enter we put them in an input file, which we have already selected by clicking "Input File".
We also specified an output file, and selected the model's results to be saved for further analysis. The Classify screen shows the data for case 11 from our input file, and its proximities to each of the last 6 clusters in our tree. There's a neat bar chart which helps quickly confirm that cluster 2 is the best fit for case 11 at a proximity of 3.24, and we can also see that the case is next closest to cluster 4 at a distance of 9.38.
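The comparison described above, finding the best-fitting cluster and the next closest one, can be sketched in a few lines. This is an illustration under stated assumptions, not the ClustanGraphics implementation: we assume each cluster is summarised by its mean vector and that proximity is squared Euclidean distance, neither of which is stated in the text. The function `classify_case` and the cluster means are hypothetical.

```python
def classify_case(case, centers):
    """Rank clusters by their distance to one new case.

    centers maps a cluster label to that cluster's mean vector.
    Returns (ranked_labels, distances), nearest cluster first.
    Squared Euclidean distance is an assumption made for this sketch.
    """
    distances = {
        label: sum((x - m) ** 2 for x, m in zip(case, mean))
        for label, mean in centers.items()
    }
    ranked = sorted(distances, key=distances.get)
    return ranked, distances

# Hypothetical cluster means on 4 variables (not taken from the example above).
centers = {
    1: [0.0, 0.0, 0.0, 0.0],
    2: [1.0, 1.0, 1.0, 2.0],
    3: [5.0, 5.0, 5.0, 5.0],
}
ranked, distances = classify_case([1.0, 1.0, 1.0, 1.0], centers)
best, runner_up = ranked[0], ranked[1]
print(f"best fit: cluster {best} at {distances[best]:.2f}, "
      f"next closest: cluster {runner_up} at {distances[runner_up]:.2f}")
```

Keeping the full set of distances, rather than only the winner, is what makes it possible to judge how clear-cut the assignment is.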
If we have a lot of data in our input file, we can click "Run Model". This takes the model out of interactive mode and simply runs it for the rest of the input file, writing the classification results to the output file. The resulting output file looks like this:
Case Cluster Distances
We're showing only 5 rows here from what is actually a much longer output file. Case 11 is seen to be nearest to cluster 2 at proximity 3.24. In addition, we chose to write all the proximities between the cases and the 6 clusters to the output file. The contents of the output file for Classify New Cases can be specified flexibly, as shown below:
Once the output file has been created, it can be easily copied into a spreadsheet for further analysis, as below:
We opened the Classify output file "ClassifyResults.txt" directly in Excel and then sorted the cases on Cluster and Distance. To save space, we have truncated the table to 5 of the 50 rows, including case 11, which is shown as classified in cluster 2 at a distance of 3.24. You're now set to do any further analysis you want on the results.
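For readers who prefer a script to a spreadsheet, the batch run and the sort on Cluster and Distance can be sketched as follows. This is a rough illustration only: the tab-delimited layout, the column names `D1`, `D2`, the functions `run_batch` and `sort_results`, the cluster means and the input values are all assumptions, and squared Euclidean distance to cluster means stands in for whatever proximity ClustanGraphics actually computes.

```python
import csv
import io

def squared_distance(a, b):
    """Squared Euclidean distance (an assumption for this sketch)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def run_batch(infile, outfile, centers):
    """Classify every non-blank row of infile; write one result line per case.

    Each output line holds the case number, the nearest cluster, its
    distance, and then the distance to every cluster.
    Returns the number of cases classified.
    """
    labels = sorted(centers)
    header = ["Case", "Cluster", "Distance"] + [f"D{label}" for label in labels]
    outfile.write("\t".join(header) + "\n")
    case_no = 0
    for line in infile:
        fields = line.split()
        if not fields:
            continue  # ignore blank lines
        case_no += 1
        case = [float(v) for v in fields]
        dists = {label: squared_distance(case, centers[label]) for label in labels}
        best = min(labels, key=dists.get)
        row = [str(case_no), str(best), f"{dists[best]:.2f}"]
        row += [f"{dists[label]:.2f}" for label in labels]
        outfile.write("\t".join(row) + "\n")
    return case_no

def sort_results(text):
    """Parse the tab-delimited output and sort on Cluster, then Distance."""
    rows = list(csv.DictReader(io.StringIO(text), delimiter="\t"))
    return sorted(rows, key=lambda r: (int(r["Cluster"]), float(r["Distance"])))

# Hypothetical cluster means and input rows, held in memory instead of files.
centers = {1: [0.0, 0.0], 2: [10.0, 10.0]}
new_cases = io.StringIO("9 9\n1 1\n0 1\n")
results = io.StringIO()
run_batch(new_cases, results, centers)
for row in sort_results(results.getvalue()):
    print(row["Case"], row["Cluster"], row["Distance"])
```

Sorting on the pair (Cluster, Distance) groups the cases by cluster and lists the best-fitting cases in each cluster first, which mirrors the spreadsheet sort described above.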
Classify Cases can be run on any size of dataset comprising either standard continuous data or mixed data types. See banking for an example of a customer segmentation study on 4.2m cases.