Classify uses a tree traversal method to find the cluster of best fit at each branch of the tree. It starts from the top of the tree (the extreme right when viewed in ClustanGraphics) and compares a new case with the two largest clusters. Having identified the better fit of these two, it steps down to the next fusion level and compares the new case with the two clusters that were fused there, and so on until it reaches the base of the tree, where it identifies a possible nearest neighbour. This is the single case in the original tree to which the new case is most similar.

In the example shown above, Classify traces the route through the tree coloured red. The clusters examined in the course of this search are marked by dots; those coloured blue were found to be the best fit at each fusion level examined. The single red dot at the base of the tree designates the nearest neighbour. In this example, 10 comparisons were needed to find the nearest neighbour in a tree containing 36 cases.

Classify can run either interactively or in batch mode. When it is used interactively, the user can supply partial information on a new case to obtain a preliminary identification (or diagnosis) which, in turn, can suggest what further data should be collected.

Our retail banking case study describes a real application of Cluster and Classify, in which we formed a tree from 16 000 cases taken from a corporate database and then used Classify to classify all 4 million of the bank's customers. A tree containing 16 000 cases requires about 25-30 comparisons to identify each new case.

For further details, please see the Clustan User Manual.
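
The tree traversal described above can be illustrated with a minimal Python sketch. It is not Clustan's implementation: the `Node` class, the `distance` and `classify` functions, the use of cluster centroids, and the mean squared difference as the measure of fit are all assumptions made for illustration. Missing values (`None`) are simply skipped, as one possible way to handle the partial information mentioned for interactive use.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass
class Node:
    """A cluster in a binary tree; a leaf is a single original case (hypothetical type)."""
    centroid: Sequence[float]
    label: str = ""
    left: Optional["Node"] = None
    right: Optional["Node"] = None

    @property
    def is_leaf(self) -> bool:
        return self.left is None and self.right is None


def distance(case: Sequence[Optional[float]], centroid: Sequence[float]) -> float:
    """Mean squared difference over the variables supplied (None means missing).

    Assumed measure of fit; the criterion Classify actually uses is not stated here.
    """
    pairs = [(x, c) for x, c in zip(case, centroid) if x is not None]
    if not pairs:
        raise ValueError("the case has no known values")
    return sum((x - c) ** 2 for x, c in pairs) / len(pairs)


def classify(root: Node, case: Sequence[Optional[float]]) -> Tuple[Node, int]:
    """Descend from the top of the tree, following the better-fitting cluster each time.

    Returns the leaf reached and the number of clusters compared with the case.
    """
    node, comparisons = root, 0
    while not node.is_leaf:
        d_left = distance(case, node.left.centroid)
        d_right = distance(case, node.right.centroid)
        comparisons += 2  # two clusters are examined at each fusion level
        node = node.left if d_left <= d_right else node.right
    return node, comparisons


# A tiny hand-built tree of four cases: ((A, B), (C, D)).
a = Node((0.0, 0.0), "A")
b = Node((1.0, 0.0), "B")
c = Node((10.0, 10.0), "C")
d = Node((11.0, 10.0), "D")
ab = Node((0.5, 0.0), left=a, right=b)
cd = Node((10.5, 10.0), left=c, right=d)
root = Node((5.5, 5.0), left=ab, right=cd)

nearest, n = classify(root, (9.0, 9.5))
print(nearest.label, n)  # prints "C 4"

# Partial information: only the first variable is known.
partial, _ = classify(root, (0.9, None))
print(partial.label)  # prints "B"
```

Because the search follows only the better-fitting branch at each level, the leaf it reaches is a possible nearest neighbour rather than a guaranteed one. If two comparisons are counted per fusion level, a roughly balanced tree of n cases needs about 2 log2(n) comparisons, which is consistent with the figures quoted above: about 10 for 36 cases and about 28 for 16 000 cases.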