Data Type Specifications 

ClustanGraphics allows you to run clustering algorithms on different data types, with or without missing values, and with differential weighting of cases or variables.  Having read your data, either specify your variable types using an Auto Script, or select Edit/Data Types and specify them interactively using the following dialogue:

ClustanGraphics Data Types Dialogue

The example shown here illustrates the four types of variables allowed in ClustanGraphics (binary, nominal, ordinal and continuous) and the two data transformations (range and z-scores).  The variable types apply as follows:

    Binary  Two codes other than missing, the higher code signifying "yes" or "present", the lower code signifying "no" or "absent" (e.g. CreditAllowed, meaning whether the client has credit terms).

    Nominal   Integer codes having no logical numerical order (e.g. AccountType or ClientSector).

    Ordinal Integer codes having a logical numerical order (e.g. VolumeLevel, by band).

    Continuous Wide range of numerical values on a continuous or semi-continuous scale (e.g. InvoiceValue, or the actual value of the current contract).
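As a rough illustration, the four variable types, together with the transformation and weight settings covered later on this page, could be held in a small specification record. This is a minimal Python sketch; the names `VarType`, `Transform` and `VarSpec` are hypothetical and do not describe how ClustanGraphics stores its settings internally.

```python
from dataclasses import dataclass
from enum import Enum


class VarType(Enum):
    BINARY = "binary"
    NOMINAL = "nominal"
    ORDINAL = "ordinal"
    CONTINUOUS = "continuous"


class Transform(Enum):
    NONE = "none"
    RANGE = "range"
    ZSCORE = "z-scores"


@dataclass
class VarSpec:
    """Hypothetical record for one row of the Data Types dialogue."""
    name: str
    vtype: VarType
    transform: Transform = Transform.NONE
    weight: float = 1.0  # default weight 1: all variables equally emphasised

    def __post_init__(self):
        # Only ordinal and continuous variables may be transformed.
        if self.transform is not Transform.NONE and self.vtype in (
            VarType.BINARY,
            VarType.NOMINAL,
        ):
            raise ValueError(
                f"{self.name}: {self.vtype.value} variables cannot be transformed"
            )

    @property
    def masked(self) -> bool:
        # A weight of zero masks the variable from the cluster analysis.
        return self.weight == 0


specs = [
    VarSpec("CreditAllowed", VarType.BINARY),
    VarSpec("ClientSector", VarType.NOMINAL),
    VarSpec("VolumeLevel", VarType.ORDINAL, Transform.RANGE),
    VarSpec("InvoiceValue", VarType.CONTINUOUS, Transform.ZSCORE),
]
```

Rejecting a transformation on binary or nominal variables at construction time mirrors the rule, stated below, that only ordinal and continuous variables can be transformed.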

When you have completed a cluster analysis with mixed data types, the results can be presented easily and flexibly in our cluster model dialogue, shown here.

Data Types
On first entry, ClustanGraphics examines your data and tries to infer the type of each variable from whether its values are integers and how often each value occurs.  This guess may well be correct; for example, a variable with only two distinct values will be interpreted as binary.  However, integer-coded nominal and ordinal variables look alike, so both are interpreted as nominal; you should therefore change the type of any such variable that is ordinal.  To do this, click on the type cell and select from the drop-down list (right).
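A heuristic of this kind might look like the sketch below. It is an assumption, not ClustanGraphics's actual rule: missing values are passed as `None`, and the `max_categories` cut-off that separates many-valued integer variables (treated as continuous) from coded categories is a hypothetical parameter.

```python
def guess_type(values, max_categories=10):
    """Guess a variable type from its observed values (hypothetical heuristic).

    Missing values are given as None and ignored. max_categories is an
    assumed cut-off, not a documented ClustanGraphics setting.
    """
    valid = [v for v in values if v is not None]
    if not valid:
        raise ValueError("no valid values")
    all_integer = all(float(v).is_integer() for v in valid)
    distinct = len(set(valid))
    if not all_integer:
        return "continuous"
    if distinct == 2:
        return "binary"
    if distinct <= max_categories:
        # Nominal and ordinal codes look identical, so default to nominal;
        # ordinal variables must be re-typed by the user.
        return "nominal"
    return "continuous"
```

Defaulting integer codes to nominal is the cautious choice: treating a nominal variable as ordinal would impose an order that does not exist, whereas the reverse only loses information the user can restore.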

Variable Transformations
ClustanGraphics allows you to transform ordinal or continuous variables.  The transformation options are none, range or z-scores.  Range divides each value by the range of valid values (maximum minus minimum), so that the transformed values span an interval of width 1 and differences between cases lie between zero and 1.  z-scores transforms the values so that they have a mean of zero and a standard deviation of 1.  To specify the transformation of any variable, click on the variable transform cell and select from the drop-down list (right).  More details of data transformations are here.
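The two transformations can be sketched in Python as follows, with missing values passed as `None` and left untouched. The range version follows the description literally (divide by the range), so the results span an interval of width 1; subtracting the minimum first would place them exactly between 0 and 1. Using the population standard deviation for z-scores is an assumption, and the function names are hypothetical.

```python
from statistics import mean, pstdev


def range_transform(values):
    """Divide each valid value by the range (max - min) of the valid values."""
    valid = [v for v in values if v is not None]
    spread = max(valid) - min(valid)
    if spread == 0:
        raise ValueError("constant variable: range is zero")
    return [None if v is None else v / spread for v in values]


def zscore_transform(values):
    """Centre on the mean and scale to unit (population) standard deviation."""
    valid = [v for v in values if v is not None]
    mu, sd = mean(valid), pstdev(valid)
    if sd == 0:
        raise ValueError("constant variable: standard deviation is zero")
    return [None if v is None else (v - mu) / sd for v in values]
```

For example, `range_transform([2.0, 4.0, None, 6.0])` gives `[0.5, 1.0, None, 1.5]`: the valid values now differ by at most 1.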

Transformations are not available for binary or nominal variables.  A binary variable is stored as a present/absent score for each case (e.g. CreditAllowed is either true or false).  Likewise, a nominal variable is stored as a present/absent score for each category represented by an integer code (e.g. ClientSector=5 is held as true for sector 5 and false for all other sector codes).
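The present/absent storage described here can be sketched as below. For a binary variable the higher code is taken as "present", as in the definition of binary variables above; a nominal variable becomes one present/absent column per integer code. Missing values (`None`) stay missing, and the function names are hypothetical.

```python
def encode_binary(values):
    """Higher code -> True (present), lower code -> False (absent)."""
    codes = sorted({v for v in values if v is not None})
    if len(codes) != 2:
        raise ValueError("a binary variable needs exactly two non-missing codes")
    high = codes[1]
    return [None if v is None else v == high for v in values]


def encode_nominal(values):
    """One present/absent column per integer code, e.g. ClientSector=5."""
    codes = sorted({v for v in values if v is not None})
    return {
        code: [None if v is None else v == code for v in values]
        for code in codes
    }
```

For instance, `encode_nominal([5, 2, 5, None])[5]` is `[True, False, True, None]`: true for sector 5, false for the other sector, missing where the code is missing.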

Variable Weights
With ClustanGraphics you can have different weights for each variable.  The standard default is a weight of 1, so that all variables have equal weight.  If you want to give some variables more emphasis than others you can specify differential variable weights.  To do this, click the variable weight cell and type a new weight value (right).
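This page does not give the formula by which weights enter the analysis. One common approach, assumed here, is a weighted mean of per-variable dissimilarities, so that a variable with weight 3 counts three times as much as one with weight 1. The function name `weighted_dissimilarity` is hypothetical.

```python
def weighted_dissimilarity(scores, weights):
    """Combine per-variable dissimilarities (each in [0, 1]) as a weighted mean.

    scores and weights are parallel lists; a zero weight contributes nothing.
    """
    if len(scores) != len(weights):
        raise ValueError("scores and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("at least one variable must have a non-zero weight")
    return sum(s * w for s, w in zip(scores, weights)) / total_weight
```

With per-variable dissimilarities `[0.0, 1.0, 0.5]`, equal weights give 0.5; raising the second weight to 3 gives (0 + 3 + 0.5) / 5 = 0.7, pulling the result towards the emphasised variable.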

Your current choice of weights can also be reviewed and changed in the Edit/Weights dialogue, on the Edit menu.

Masking Variables
If you specify a weight of zero, the variable will be masked from the cluster analysis.  In this case, the Edit/Data Types dialogue will show the variable as masked, and its entries will be grayed (right).  This is helpful if you want to carry background variables that are "inactive": they are not used for clustering, but they can still be interpreted when profiling the clusters.
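The idea of inactive background variables can be sketched as follows: variables with weight zero are set aside before clustering, then brought back when cluster profiles are computed. The data, the variable `YearsAsClient` and the cluster labels are all hypothetical; the labels stand in for the output of a clustering run on the active variables only.

```python
from statistics import mean


def split_active(weights):
    """Separate active variables (weight > 0) from masked ones (weight == 0)."""
    active = [name for name, w in weights.items() if w > 0]
    masked = [name for name, w in weights.items() if w == 0]
    return active, masked


def profile(cases, labels, variables):
    """Mean of each variable within each cluster; None values are ignored."""
    result = {}
    for cluster in sorted(set(labels)):
        members = [case for case, lab in zip(cases, labels) if lab == cluster]
        result[cluster] = {}
        for var in variables:
            vals = [c[var] for c in members if c[var] is not None]
            result[cluster][var] = mean(vals) if vals else None
    return result


# Hypothetical data: YearsAsClient is a background variable with weight 0.
weights = {"InvoiceValue": 1.0, "VolumeLevel": 1.0, "YearsAsClient": 0.0}
cases = [
    {"InvoiceValue": 100, "VolumeLevel": 1, "YearsAsClient": 2},
    {"InvoiceValue": 120, "VolumeLevel": 1, "YearsAsClient": 4},
    {"InvoiceValue": 900, "VolumeLevel": 3, "YearsAsClient": 10},
]
labels = [1, 1, 2]  # assumed output of clustering on the active variables only

active, masked = split_active(weights)
profiles = profile(cases, labels, active + masked)
```

Here `YearsAsClient` plays no part in forming the clusters, yet `profiles[1]["YearsAsClient"]` (3) can still be compared with `profiles[2]["YearsAsClient"]` (10) when interpreting them.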

Variable Names
The Edit/Data Types dialogue allows you to change the names of variables.  Simply click on a variable's name and edit it in situ (right).

Your current choice of variable names can also be reviewed and changed in the Edit/Labels dialogue, on the Edit menu.

Variable Summaries
If you point the cursor at any variable and click the right mouse button, a summary of the current parameters for that variable will be displayed.  This helps you check that you have selected the correct type and transformation for the variable (right).

You can display a summary table for all your variables by clicking the Summary button.  An abbreviated table of Data Types specifications can be printed by clicking the Print button.

Confirming Data Types
When you click OK in the Edit/Data Types dialogue, you will be asked whether you wish the changed specifications to be confirmed.  At this point you can, if you wish, revert to the type settings previously recorded; or you can update to the new settings entered into the dialogue.  Don't forget to save your ClustanGraphics file so that your changes will be correctly reproduced when you next open your file.

You are now ready to run a cluster analysis on mixed data types.  The current options are hierarchical cluster analysis using Compute Proximities, Nearest Neighbours, k-Means Analysis and Classify Cases.  For further details, please refer to the file DataTypes.doc which accompanies ClustanGraphics, or view a worked example of Gower's Similarity Coefficient with mixed data types here.
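The worked example mentioned in this paragraph is not reproduced here, but Gower's general similarity coefficient can be sketched in Python under stated assumptions. Each variable contributes a score between 0 and 1 and a weight; missing comparisons, masked variables and joint absences on binary variables contribute nothing, following Gower's original definition. Binary variables are assumed to be coded 0/1 (absent/present), ordinal codes are treated as quantitative and scaled by their range, and `gower_similarity` is a hypothetical name. The result may differ in detail from ClustanGraphics's own computation.

```python
def gower_similarity(a, b, types, ranges, weights=None):
    """Gower's similarity between two cases with mixed variable types.

    a, b    : lists of values (None = missing)
    types   : "binary", "nominal", "ordinal" or "continuous" per variable
    ranges  : range (max - min) of each variable over all cases; used only
              for ordinal and continuous variables
    weights : optional per-variable weights (default 1)
    """
    if weights is None:
        weights = [1.0] * len(a)
    num = den = 0.0
    for x, y, t, r, w in zip(a, b, types, ranges, weights):
        if x is None or y is None or w == 0:
            continue  # missing comparison or masked variable: no contribution
        if t == "binary":
            if not x and not y:
                continue  # joint absence is not counted (Gower's definition)
            s = 1.0 if (x and y) else 0.0
        elif t == "nominal":
            s = 1.0 if x == y else 0.0
        else:  # ordinal or continuous: assumed scaled by range
            s = (1.0 - abs(x - y) / r) if r > 0 else 1.0
        num += w * s
        den += w
    return num / den if den > 0 else None


types = ["binary", "nominal", "ordinal", "continuous"]
ranges = [None, None, 4, 1000.0]  # hypothetical ranges over the whole data set
similarity = gower_similarity([1, 5, 2, 250.0], [1, 3, 4, None], types, ranges)
```

In this example the binary variable matches on presence (score 1), the nominal codes differ (score 0), the ordinal codes differ by half their range (score 0.5) and the continuous value is missing for the second case, so the similarity is (1 + 0 + 0.5) / 3 = 0.5.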

Clustan - A Class Act © 1998 Clustan Ltd.