ClustanGraphics allows you to run very powerful clustering algorithms on different data
types, with or without missing values, and with differential case or variable weighting. After reading your data, either specify your variable types using an Auto Script, or select Edit/Data Types and specify them interactively using the following dialogue:
The example shown here illustrates the four types of variables allowed in ClustanGraphics (binary, nominal, ordinal
and continuous) and the two data transformations (range and z-scores). These apply as follows:
Binary: Two codes other than missing, the higher code signifying "yes" or "present", the lower code signifying "no" or "absent" (e.g. CreditAllowed, meaning whether the client has credit terms).
Nominal: Integer codes having no logical numerical order (e.g. AccountType or ClientSector).
Ordinal: Integer codes having a logical numerical order (e.g. VolumeLevel, by band).
Continuous: Wide range of numerical values on a continuous or semi-continuous scale (e.g. InvoiceValue, or the actual value of the current contract).
When you have completed a cluster analysis with mixed data types, the results are easily and flexibly presented in our cluster model dialogue, shown here.

Data Types

On first entry, ClustanGraphics examines your data and tries to interpret the type of each variable according to whether its values are integers and how frequently each value occurs. The result may be correct; for example, if all your variables are binary, each has only two possible values and is interpreted as binary. Nominal and ordinal variables, however, are both interpreted as nominal, so you should change the type of any variable that is actually ordinal. To do this, click on the type cell and select from the drop-down list (right).

Variable Transformations

ClustanGraphics allows you to transform ordinal or continuous variables. The transformation options are none, range or z-scores. Range divides each value by the range of valid values, so that the transformed values range between
zero and 1. The z-scores option transforms the values so that they have a mean of zero and a standard deviation of 1. To specify the transformation of any variable, click on the variable transform cell and select from the drop-down list (right). More details of data transformations are here.

Transformations are not available for binary or nominal variables. A binary variable is stored as a
present/absent score for each case (e.g. CreditAllowed is either true or false). Likewise, a nominal variable is stored as a present/absent score for each category represented by an integer code (e.g. ClientSector=5 is held as true for sector 5 and false for all other sector codes).

Variable Weights

With ClustanGraphics you can give each variable a different weight. The standard default is a weight of 1, so that all variables have equal weight. If you want to give some variables more emphasis than others you can specify differential variable weights. To do this, click the variable weight cell and type a new weight value (right).
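
The transformations and the present/absent coding described under Variable Transformations can be illustrated with a short Python sketch. This is not ClustanGraphics code: the function names and the use of None for a missing value are hypothetical, the range transformation assumes the minimum is subtracted before dividing by the range (so that results fall between zero and 1), and the z-scores assume a population standard deviation.

```python
from statistics import mean, pstdev

MISSING = None  # hypothetical marker for a missing value


def range_transform(values):
    """Rescale valid values to 0..1 (assumes the minimum is subtracted first)."""
    valid = [v for v in values if v is not MISSING]
    lo, span = min(valid), max(valid) - min(valid)
    if span == 0:  # a constant variable carries no information
        return [MISSING if v is MISSING else 0.0 for v in values]
    return [MISSING if v is MISSING else (v - lo) / span for v in values]


def z_scores(values):
    """Rescale valid values to mean 0 and standard deviation 1 (population SD assumed)."""
    valid = [v for v in values if v is not MISSING]
    mu, sd = mean(valid), pstdev(valid)
    if sd == 0:
        return [MISSING if v is MISSING else 0.0 for v in values]
    return [MISSING if v is MISSING else (v - mu) / sd for v in values]


def binary_coding(codes):
    """Present/absent score per case: the higher of the two codes means 'present'."""
    valid = sorted({c for c in codes if c is not MISSING})
    if len(valid) != 2:
        raise ValueError("a binary variable needs exactly two codes")
    return [MISSING if c is MISSING else c == valid[1] for c in codes]


def indicator_coding(codes):
    """One present/absent column per category code of a nominal variable."""
    categories = sorted({c for c in codes if c is not MISSING})
    return {cat: [MISSING if c is MISSING else c == cat for c in codes]
            for cat in categories}


# Made-up values for the example variables named in the text.
print(range_transform([1, 2, 3, 5]))        # VolumeLevel bands
print(z_scores([100.0, 200.0, 300.0]))      # InvoiceValue
print(binary_coding([1, 2, 2, None]))       # CreditAllowed
print(indicator_coding([5, 2, 5, None]))    # ClientSector
```

Missing values are passed through unchanged in every function, so a later similarity calculation can decide how to handle them.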
Your current choice of weights can also be reviewed and changed in the Edit/Weights dialogue, on the Edit menu.

Masking Variables

If you specify a weight of zero, the variable will be masked from the cluster analysis. In this case, the Edit/Data Types dialogue will show the variable as masked, and its entries will be grayed (right). This is helpful if you want to carry background variables that are "inactive", that is, not used for clustering but still available for interpretation in cluster profiling.

Variable Names

The Edit/Data Types dialogue allows you to change the names of variables.
Simply click on a variable's name and edit it in situ (right). Your current choice of variable names can also be reviewed and changed in the Edit/Labels dialogue, on the Edit menu.

Variable Summaries
If you point the cursor at any variable and click the right mouse button, a summary of the current parameters for that variable will be displayed. This
helps you check that you have selected the correct type and transformation for the variable (right).
You can display a summary table for all your variables by clicking the Summary button. An abbreviated table of Data Types specifications can be printed by clicking the Print button.

Confirming Data Types
When you click OK in the Edit/Data Types dialogue, you will be asked whether you wish the changed specifications to be confirmed. At this point you can, if you wish, revert to the type settings previously recorded; or you
can update to the new settings entered into the dialogue. Don't forget to save your ClustanGraphics file so that your changes will be correctly reproduced when you next open your file.
You are now ready to run a cluster analysis on mixed data types. The current options are hierarchical cluster analysis using Compute Proximities, Nearest Neighbours, k-Means Analysis and Classify Cases. For further details, please refer to the file DataTypes.doc, which accompanies ClustanGraphics, or view a worked example of Gower's Similarity Coefficient with mixed data types here.
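
To show how variable types, weights, masking and missing values can combine in a single similarity measure, here is a minimal Python sketch of a Gower-style coefficient. It is an illustration under stated assumptions, not the ClustanGraphics implementation: the function name, the spec dictionaries and the data are hypothetical; binary variables are scored by simple matching (Gower's original definition excludes joint absences for dichotomous variables); and ordinal codes are treated like continuous values scaled by their range.

```python
def gower_similarity(case_a, case_b, specs):
    """Weighted average of per-variable similarity scores between two cases.

    specs holds one dict per variable with a 'type', an optional 'weight'
    (default 1) and, for ordinal or continuous variables, a positive 'range'.
    """
    numerator = 0.0
    denominator = 0.0
    for a, b, spec in zip(case_a, case_b, specs):
        weight = spec.get("weight", 1.0)
        if weight == 0 or a is None or b is None:
            continue  # masked variable or missing value: comparison is skipped
        if spec["type"] in ("binary", "nominal"):
            score = 1.0 if a == b else 0.0            # simple matching (assumption)
        else:
            score = 1.0 - abs(a - b) / spec["range"]  # range-scaled difference
        numerator += weight * score
        denominator += weight
    return numerator / denominator if denominator else None


# Hypothetical specifications for the example variables named in the text.
specs = [
    {"name": "CreditAllowed", "type": "binary"},
    {"name": "ClientSector", "type": "nominal"},
    {"name": "VolumeLevel", "type": "ordinal", "range": 4},
    {"name": "InvoiceValue", "type": "continuous", "range": 1000.0},
]
client_a = [1, 5, 2, 250.0]
client_b = [1, 3, 4, 750.0]

print(gower_similarity(client_a, client_b, specs))  # 0.5
```

In this sketch a weight of zero removes a variable from both the numerator and the denominator, which mirrors the masking behaviour described above, and a missing value removes only the comparisons in which it occurs.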
© 1998 Clustan Ltd.