Frequently Asked Questions about ClustanGraphics:

General FAQs 

Q. 

I'm more interested in cluster analysis functionality than nice
graphics. Can I use ClustanGraphics for serious clustering applications? 
A. 

Yes. ClustanGraphics offers 11 methods of hierarchical
cluster analysis, plus k-means analysis, FocalPoint clustering and the classification of new cases. You can use it for data mining, and you don't need any other cluster analysis software.

Q. 

Does ClustanGraphics include features to help me interpret a cluster analysis? 
A. 

Yes. Reorder tree provides a case order which should
generally make sense. Best Cut tells you which tree sections are significant. Right-click on any cluster to display its profile, or pivot to display any variable's profile across all
the clusters. Scatterplots show how your clusters are distributed on any two explanatory variables. Cluster exemplars identifies the most typical cases within each cluster.
Navigate Tree displays t-tests on cluster means to identify the most discriminating variables. You can also copy cluster profile and membership tables to other applications such as
Excel or Lotus for further evaluation.
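The exact statistics Navigate Tree reports are not described here. As a rough illustration of the idea, the sketch below computes a Welch two-sample t statistic for each variable, comparing the cases in one cluster with all other cases, and ranks the variables by the size of t. The choice of Welch's test and the function name `discriminating_variables` are assumptions for illustration, not ClustanGraphics features.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic for the difference between the means of a and b.

    Each sample needs at least two values, and at least one must vary.
    """
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se


def discriminating_variables(data, labels, cluster):
    """Rank variables by |t| when members of `cluster` are compared with all other cases.

    data: list of rows (lists of floats); labels: cluster id for each row.
    Hypothetical helper for illustration only.
    """
    scores = []
    for j in range(len(data[0])):
        inside = [row[j] for row, lab in zip(data, labels) if lab == cluster]
        outside = [row[j] for row, lab in zip(data, labels) if lab != cluster]
        scores.append((j, welch_t(inside, outside)))
    # Largest absolute t first: these variables best separate the cluster from the rest.
    return sorted(scores, key=lambda s: abs(s[1]), reverse=True)


data = [[1.0, 5.0], [1.2, 5.1], [0.9, 4.8], [3.0, 5.0], [3.1, 4.9], [2.9, 5.2]]
labels = [0, 0, 0, 1, 1, 1]
print(discriminating_variables(data, labels, cluster=0))
```

Here variable 0 separates the two clusters clearly, while variable 1 barely differs between them, so variable 0 is ranked first.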

Q. 

Can I mix binary, nominal, ordinal and continuous variables in ClustanGraphics? 
A. 

Yes, in hierarchical cluster analysis and k-means. The Data Types dialogue in ClustanGraphics allows you to specify the types of variables and how they are to be transformed, weighted or masked. These unique features provide much more flexibility when analyzing databases or surveys, where binary, nominal and ordinal variables are as common as continuous variables.
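How ClustanGraphics combines the different types internally isn't stated here. One widely used way to mix binary, nominal, ordinal and continuous variables in a single dissimilarity is Gower's general coefficient. The sketch below is a simplified version of it, offered only as an illustration: binary variables are treated like nominal ones, ordinal variables are given as ranks scaled by their range, each variable can be weighted, and a "masked" type excludes a variable. The type codes and the name `gower_distance` are hypothetical.

```python
def gower_distance(x, y, types, ranges, weights=None):
    """Simplified Gower dissimilarity between two cases x and y (0 = identical).

    types[j]: "binary", "nominal", "ordinal", "continuous" or "masked".
    ranges[j]: observed range of variable j (used for ordinal and continuous).
    weights[j]: relative weight of variable j (default 1.0).
    Missing values (None) and masked variables are skipped.
    """
    if weights is None:
        weights = [1.0] * len(x)
    total, weight_sum = 0.0, 0.0
    for xj, yj, t, r, w in zip(x, y, types, ranges, weights):
        if t == "masked" or xj is None or yj is None:
            continue
        if t in ("binary", "nominal"):
            d = 0.0 if xj == yj else 1.0  # simple match / mismatch
        else:
            # Ordinal ranks or continuous values: range-normalised absolute difference.
            d = abs(xj - yj) / r if r > 0 else 0.0
        total += w * d
        weight_sum += w
    return total / weight_sum if weight_sum > 0 else float("nan")


types = ["binary", "nominal", "ordinal", "continuous"]
ranges = [1, 1, 4, 50.0]  # e.g. ordinal ranks 1..5 have range 4
a = [1, "red", 2, 30.0]
b = [0, "red", 4, 40.0]
print(gower_distance(a, b, types, ranges))
```

Because every per-variable difference lies between 0 and 1, no single type dominates the result unless you weight it to do so.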



Q. 

Will ClustanGraphics handle missing values? 
A. 

Yes, in most procedures. You can read a data matrix
that contains missing values and compute proximities from it, or read a proximity matrix that contains missing values. You can then obtain a hierarchical cluster analysis from
a proximity matrix containing missing values, truncate the resulting tree, and obtain cluster profiles. K-means analysis, outlier analysis, and
truncation of cluster models all allow for missing values. When you classify new cases, you can present incomplete data and ClustanGraphics will find the cluster that best fits the
values given. You can also use this interactively to refine the classification of a new case progressively as more data becomes available. Missing values have not yet been
implemented in FocalPoint clustering.
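The FAQ doesn't say how ClustanGraphics scores an incomplete case against its clusters. A simple approach, assumed here purely for illustration, is to compare the case with each cluster centroid using only the variables that are present, averaging the squared differences so that cases with different numbers of known values remain comparable. The function `best_cluster` and the centroid list are hypothetical.

```python
def best_cluster(case, centroids):
    """Return (index, score) of the centroid that best fits `case`.

    `case` may contain None for missing values; only observed variables are
    compared, and squared differences are averaged over those variables.
    """
    observed = [j for j, v in enumerate(case) if v is not None]
    if not observed:
        raise ValueError("case has no observed values")
    best = None
    for k, centre in enumerate(centroids):
        score = sum((case[j] - centre[j]) ** 2 for j in observed) / len(observed)
        if best is None or score < best[1]:
            best = (k, score)
    return best


centroids = [[1.0, 1.0, 1.0], [5.0, 5.0, 5.0], [1.0, 5.0, 9.0]]
# Refine the classification as more values for the new case become known.
print(best_cluster([1.2, None, None], centroids))
print(best_cluster([1.2, 4.8, None], centroids))
print(best_cluster([1.2, 4.8, 8.5], centroids))
```

With only the first value known, clusters 0 and 2 fit equally well; the second value settles the case in cluster 2, and the third confirms it.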

Q. 


How do I input my data into ClustanGraphics? 
A. 


You can read data from a text file or a spreadsheet, or paste it in. All of these methods are fast: for example, reading an
Excel file of 40,000 rows takes 3 seconds. If you are working on very large data mining applications, ClustanGraphics can also read binary data.


Q. 

How do I transfer a ClustanGraphics tree or scatterplot into my report,
presentation or webpage? 
A. 

Just copy and paste. Before you copy your tree, it's best to adjust its size using the Modify palette
or Display Tree dialogue so that it fits neatly into your document. You can change the dimensions, labelling font, size and colours to match your document's
style exactly. There's a neat example here.

Q. 

How can I check whether I have an optimum cluster analysis? 
A. 

Look for small values of the Euclidean Sum of Squares from k-means or
hierarchical cluster analysis. ClustanGraphics also provides a significance test for the best number of clusters. Most other programs don't report the criterion values
correctly, or are not guaranteed to converge to a stable minimum.
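The Euclidean Sum of Squares (ESS) is taken here to be the usual within-cluster criterion: the total, over all clusters, of the squared Euclidean distances from each case to its cluster mean. This minimal sketch computes it for any partition so that alternative solutions can be compared; the function name is illustrative only.

```python
def euclidean_sum_of_squares(data, labels):
    """Within-cluster Euclidean Sum of Squares for a partition.

    data: list of rows of floats; labels: cluster id for each row.
    Lower values indicate tighter, more homogeneous clusters.
    """
    clusters = {}
    for row, lab in zip(data, labels):
        clusters.setdefault(lab, []).append(row)
    ess = 0.0
    for rows in clusters.values():
        n = len(rows)
        centre = [sum(col) / n for col in zip(*rows)]  # cluster mean
        ess += sum(sum((x - c) ** 2 for x, c in zip(row, centre)) for row in rows)
    return ess


data = [[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]]
print(euclidean_sum_of_squares(data, [0, 0, 1, 1]))  # prints 4.0
print(euclidean_sum_of_squares(data, [0, 1, 0, 1]))  # prints 100.0
```

Compare ESS values only between solutions with the same number of clusters, since ESS always falls as more clusters are added.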

Q. 

Can I transform or weight my observations? 
A. 

Yes. With ClustanGraphics you can assign differential weights to cases and/or
variables. Case weighting is useful where the cases are aggregations, such as geographical areas of varying population density or customers with differing transaction
volumes. Variable weights can be used to promote the most discriminating variables. Transformations are also available in ClustanGraphics, for example z-scores, standardizing
variable range and rescaling the data matrix.
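To show roughly what these options do (this is not ClustanGraphics's own code), the sketch below computes z-scores, rescales a variable to the range 0 to 1, applies variable weights inside a squared Euclidean distance, and uses case weights when computing a cluster mean. All function names are hypothetical.

```python
from statistics import mean, stdev


def z_scores(column):
    """Standardize to mean 0 and (sample) standard deviation 1."""
    m, s = mean(column), stdev(column)
    return [(v - m) / s for v in column]


def range_standardize(column):
    """Rescale to 0..1 using the observed minimum and maximum."""
    lo, hi = min(column), max(column)
    return [(v - lo) / (hi - lo) for v in column]


def weighted_sq_distance(x, y, var_weights):
    """Squared Euclidean distance in which each variable has its own weight."""
    return sum(w * (a - b) ** 2 for a, b, w in zip(x, y, var_weights))


def weighted_centroid(rows, case_weights):
    """Cluster mean in which each case counts in proportion to its weight,
    e.g. an area's population or a customer's transaction volume."""
    total = sum(case_weights)
    return [sum(w * r[j] for r, w in zip(rows, case_weights)) / total
            for j in range(len(rows[0]))]


print(z_scores([2.0, 4.0, 6.0]))
print(range_standardize([10.0, 20.0, 30.0]))
print(weighted_sq_distance([0.0, 0.0], [1.0, 2.0], [2.0, 0.5]))
print(weighted_centroid([[0.0], [10.0]], [1, 3]))
```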

Q. 

Can I try ClustanGraphics before purchasing it? 
A. 

Yes. For corporate and academic users, we offer a 30-day
trial. Simply complete and submit our online order form, and we will send you ClustanGraphics on CD with a copy of the ClustanGraphics Primer and an invoice. If you
return it within 30 days, your invoice will be cancelled.




System Requirements 

Q. 

Can I run ClustanGraphics on my PC? 
A. 

ClustanGraphics runs under Microsoft Windows or NT. It will
therefore run on any PC with Windows 95, 98, 2000, NT, ME or XP. ClustanGraphics also runs on Macintosh computers equipped with a PC emulator. 

Q. 

Does my PC need any special hardware or software? 
A. 

No. A standard 486 or Pentium PC is fine for small to
medium-sized applications. But for very
large clustering applications it helps if your PC has extra RAM (e.g. 64MB or 128MB).





Data Mining 

Q. 

What's the largest hierarchical cluster analysis possible using ClustanGraphics? 
A. 

If your data matrix contains up to 120,000 rows (cases),
ClustanGraphics can complete Ward's Method or Average Linkage (UPGMA) with Euclidean distance. It is not necessary to calculate a proximity matrix, because clustering is
performed directly on the data matrix using our unique, proprietary method.
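Clustan's proprietary method isn't described here, but a short sketch helps show why Ward's Method does not, in principle, need a stored proximity matrix. The increase in the Euclidean Sum of Squares caused by merging two clusters can be computed from their sizes and centroids alone. The code below is a naive illustration of that idea, assumed for exposition only: it keeps just sizes and centroids, but it rescans every pair of clusters at each step, so it is far too slow for 120,000 cases and is not Clustan's algorithm.

```python
def ward_merge_cost(n_a, c_a, n_b, c_b):
    """Increase in Euclidean Sum of Squares if clusters A and B are merged:
    n_a * n_b / (n_a + n_b) * squared distance between their centroids."""
    sq = sum((x - y) ** 2 for x, y in zip(c_a, c_b))
    return n_a * n_b / (n_a + n_b) * sq


def ward_from_data(data):
    """Naive agglomerative Ward clustering working from cluster centroids.

    Returns the merges as (cluster_a, cluster_b, cost) in the order they occur.
    New clusters are numbered len(data), len(data) + 1, ...
    """
    clusters = {i: (1, list(row)) for i, row in enumerate(data)}  # id -> (size, centroid)
    next_id = len(data)
    merges = []
    while len(clusters) > 1:
        ids = list(clusters)
        best = None
        for i in range(len(ids)):
            for j in range(i + 1, len(ids)):
                a, b = ids[i], ids[j]
                cost = ward_merge_cost(*clusters[a], *clusters[b])
                if best is None or cost < best[2]:
                    best = (a, b, cost)
        a, b, cost = best
        (n_a, c_a), (n_b, c_b) = clusters.pop(a), clusters.pop(b)
        n = n_a + n_b
        # The merged centroid is the size-weighted mean of the two centroids.
        clusters[next_id] = (n, [(n_a * x + n_b * y) / n for x, y in zip(c_a, c_b)])
        merges.append((a, b, cost))
        next_id += 1
    return merges


data = [[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]]
for merge in ward_from_data(data):
    print(merge)
```

The merge costs add up to the Euclidean Sum of Squares of the whole data set treated as one cluster.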

Q. 

What if I need to compute a proximity matrix? 
A. 

Then the maximum that ClustanGraphics can handle is
about 10,000 cases. This is because a data matrix of 10,000 rows produces a proximity matrix of n² cells (10,000 × 10,000, or 100 million), which at 4 bytes per cell requires 400MB of storage. It's just
possible with a fast Pentium PC and plenty of RAM and disk space. More practically, a data matrix of 5,000 rows requires a 100MB proximity matrix, which is quite feasible using
ClustanGraphics.
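The storage figures above follow from n × n cells at 4 bytes per cell, the cell size implied by 400MB for 100 million cells. This small calculator lets you check other sizes; the 4-byte default is that assumption, and MB here means 10^6 bytes.

```python
def proximity_matrix_mb(n_cases, bytes_per_cell=4):
    """Storage in MB (10**6 bytes) for a full n-by-n proximity matrix."""
    return n_cases ** 2 * bytes_per_cell / 10 ** 6


for n in (5_000, 10_000):
    print(n, proximity_matrix_mb(n))
```

Because storage grows with the square of the number of cases, doubling the cases from 5,000 to 10,000 quadruples the requirement from 100MB to 400MB.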

Q. 

How should I tackle a large data mining application using ClustanGraphics? 
A. 

You can cluster a million cases
using k-means, if your PC has enough RAM. For example, a data matrix of 1 million rows and 10 columns requires 40MB of storage, so it's quite feasible to use ClustanGraphics with a Pentium PC with 96MB of RAM or more.
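ClustanGraphics's own k-means implementation isn't described here. For orientation, this is a plain sketch of Lloyd's algorithm, the standard k-means iteration: assign each case to its nearest centre, move each centre to the mean of its cases, and stop when no case changes cluster. Only the data matrix and k centres are held, which is why the data matrix (1 million × 10 values × 4 bytes = 40MB) is the main memory cost. The function name and the caller-supplied starting centres are assumptions.

```python
def kmeans(data, initial_centres, max_iter=100):
    """Lloyd's k-means. Returns (labels, centres).

    data: list of rows of floats; initial_centres: k starting rows.
    """
    centres = [list(c) for c in initial_centres]
    n_vars = len(data[0])
    labels = None
    for _ in range(max_iter):
        # Assignment step: each case joins its nearest centre.
        new_labels = []
        for row in data:
            dists = [sum((x - c) ** 2 for x, c in zip(row, centre)) for centre in centres]
            new_labels.append(dists.index(min(dists)))
        if new_labels == labels:
            break  # converged: no case changed cluster
        labels = new_labels
        # Update step: accumulate sums in one pass, then move each centre to its mean.
        sums = [[0.0] * n_vars for _ in centres]
        counts = [0] * len(centres)
        for row, lab in zip(data, labels):
            counts[lab] += 1
            for j, x in enumerate(row):
                sums[lab][j] += x
        for k, count in enumerate(counts):
            if count:  # an empty cluster keeps its previous centre
                centres[k] = [s / count for s in sums[k]]
    return labels, centres


data = [[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]]
labels, centres = kmeans(data, initial_centres=[[0.0, 0.0], [10.0, 0.0]])
print(labels, centres)
```

Different starting centres can lead to different final partitions, so it is worth comparing the Euclidean Sum of Squares of several runs.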


Q. 

Is there another way to cluster a very large data set? 
A. 

Yes. We recommend that you select a sample from your
data and obtain a cluster analysis using hierarchical cluster analysis or k-means. From your cluster profiles, select a cluster level which seems actionable and save it as a
ClustanGraphics model. You can then use the model to classify new cases. This involves reading new case data from a sequential file and finding the cluster of best fit in
the model for each new case. As this involves a single pass through the new data, it is possible to classify any number of cases. There's no limit.
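Because each new case is classified independently, a saved model can be applied to a file one record at a time, so memory use does not grow with the file size. This sketch assumes, for illustration only, that the model is simply a list of cluster centroids and that new cases arrive as comma-separated lines. ClustanGraphics's model file format is not described here, and `classify_stream` is a hypothetical name.

```python
import csv
import io


def classify_stream(lines, centroids):
    """Yield (row_number, cluster_index) for each new case in a single pass.

    `lines` can be any iterable of CSV text lines, such as an open file,
    so only one case is held in memory at a time.
    """
    for row_number, fields in enumerate(csv.reader(lines), start=1):
        if not fields:
            continue  # skip blank lines
        case = [float(v) for v in fields]
        dists = [sum((x - c) ** 2 for x, c in zip(case, centre)) for centre in centroids]
        yield row_number, dists.index(min(dists))  # nearest centroid = cluster of best fit


centroids = [[0.0, 1.0], [10.0, 1.0]]
new_cases = io.StringIO("1.0,0.5\n9.0,2.0\n0.2,1.8\n")
for row_number, cluster in classify_stream(new_cases, centroids):
    print(row_number, cluster)
```

In practice you would pass an open file instead of the in-memory `io.StringIO` sample used here.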

