Frequently Asked Questions 


General FAQs

 

Q.

 

I'm more interested in cluster analysis functionality than nice graphics.  Can I  use ClustanGraphics for serious clustering applications?

A.

 

Yes. ClustanGraphics offers 11 methods of hierarchical cluster analysis, plus k-means analysis, FocalPoint clustering and classification of new cases. You can use it in data mining, and you don't need any other cluster analysis software.

 

Q.

 

Does ClustanGraphics include features to help me interpret a cluster analysis?

A.

 

Yes. Re-order tree provides a case order which should generally make sense. Best Cut tells you which tree sections are significant. Right-click on any cluster to display its profile, or pivot to display any variable's profile across all the clusters. Scatterplots shows how your clusters are distributed on any two explanatory variables. Cluster exemplars identifies the most typical cases within all the clusters. Navigate Tree displays t-tests on cluster means, to identify the most discriminating variables. And you can copy cluster profile and membership tables to other applications such as Excel or Lotus for further evaluation.

 

Q.

 

Can I mix binary, nominal, ordinal and continuous variables in ClustanGraphics?

A.

 

Yes, in hierarchical cluster analysis and k-means.  The Data Types dialogue in ClustanGraphics allows you to specify the types of variables and how they are to be transformed, weighted or masked.  These unique features provide much more flexibility when analyzing databases or surveys, where binary, nominal and ordinal variables are as common as continuous variables.

 

 

Q.

 

Will ClustanGraphics handle missing values?

A.

 

Yes, in most procedures.  You can read a data matrix which contains missing values and compute proximities from it, or read a proximity matrix which contains missing values.  You can then obtain a hierarchical cluster analysis from a proximity matrix containing missing values, truncate the resulting tree, and obtain cluster profiles.  k-means analysis, outlier analysis, and truncation of cluster models all allow for missing values.  When you classify new cases, you can present incomplete data and ClustanGraphics will find the best cluster to fit the values given.  You can also use this interactively to refine the classification of a new case progressively as more data become available.  Missing values have not yet been implemented in FocalPoint clustering.
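
To illustrate what computing a proximity from incomplete data can look like, here is a minimal Python sketch.  It is not ClustanGraphics code, and this FAQ does not state which rule the program uses.  The sketch assumes one common convention: average the squared differences over only those variables present in both cases.  The function name and the use of None to mark a missing value are hypothetical.

```python
def distance_with_missing(a, b):
    """Mean squared difference over variables present in both cases.

    Assumption: missing values are marked with None. Returns None when
    the two cases share no observed variables, so the proximity itself
    is missing.
    """
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not pairs:
        return None
    return sum((x - y) ** 2 for x, y in pairs) / len(pairs)


case_1 = [1.0, None, 3.0]
case_2 = [2.0, 5.0, 5.0]
d = distance_with_missing(case_1, case_2)   # uses variables 1 and 3 only
```

Averaging over the variables actually compared keeps distances on a comparable scale when cases have different numbers of missing values.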

 

Q.

   

How do I input my data into ClustanGraphics?

A.

   

Text file, spreadsheet or paste.  All of these methods are fast - for example, reading an Excel file of 40,000 rows takes 3 seconds.  However, if you are working with very large data mining applications, we can also read binary data.

 

Q.

 

How do I transfer a ClustanGraphics tree or scatterplot into my report, presentation or web-page?

A.

 

Just copy and paste.  It's best to adjust the size of your tree using the Modify palette or Display Tree dialogue so that it fits neatly into your document before you copy it.  You can change the dimensions, labelling font, size and colours to match your document's style exactly.  There's a neat example here.

 

Q.

 

How can I check whether I have an optimum cluster analysis?

A.

 

Look for small values for the Euclidean Sum of Squares from k-means or hierarchical cluster analysis.  ClustanGraphics also provides a significance test for the best number of clusters.  Most other programs don't report the criterion values correctly, or are not guaranteed to converge to a stable minimum.
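
As a rough illustration of the criterion, the Euclidean Sum of Squares can be computed as the sum of squared distances from each case to the mean of its cluster.  The Python sketch below is not ClustanGraphics code: the function name and toy data are hypothetical, and it ignores any weights or normalisation the program may apply.

```python
def euclidean_sum_of_squares(data, labels):
    """Sum of squared Euclidean distances from each case to its cluster mean."""
    clusters = {}
    for row, label in zip(data, labels):
        clusters.setdefault(label, []).append(row)

    total = 0.0
    for rows in clusters.values():
        n_vars = len(rows[0])
        mean = [sum(r[j] for r in rows) / len(rows) for j in range(n_vars)]
        total += sum((r[j] - mean[j]) ** 2 for r in rows for j in range(n_vars))
    return total


data = [[1.0, 1.0], [1.0, 3.0], [9.0, 9.0], [9.0, 11.0]]
good = euclidean_sum_of_squares(data, [0, 0, 1, 1])   # tight clusters: 4.0
poor = euclidean_sum_of_squares(data, [0, 1, 0, 1])   # mixed clusters: 128.0
```

Compare values only between solutions with the same number of clusters, because the sum of squares normally falls as the number of clusters rises.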

 

Q.

 

Can I transform or weight my observations?

A.

 

Yes.  With ClustanGraphics you can assign differential weights to cases and/or variables.  Case weighting is useful where the cases are aggregations, such as geographical areas of varying population density or customers with differing transaction volumes.  Variable weights can be used to promote the most discriminating variables. Transformations are also available in ClustanGraphics; for example, z-scores, standardize variable range and rescale data matrix.
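
For readers unfamiliar with these transformations, the Python sketch below shows z-scores and range standardization for a single variable.  It is an illustration only, not ClustanGraphics code.  It assumes the population standard deviation for z-scores and a 0 to 1 target range; the program's own definitions may differ.  The function names are hypothetical.

```python
from statistics import mean, pstdev


def z_scores(column):
    """Centre on the mean and divide by the (population) standard deviation."""
    m, s = mean(column), pstdev(column)
    if s == 0:
        return [0.0 for _ in column]      # constant variable: no spread
    return [(x - m) / s for x in column]


def range_standardize(column):
    """Rescale so the smallest value maps to 0 and the largest to 1."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]      # constant variable: no spread
    return [(x - lo) / (hi - lo) for x in column]


income = [2.0, 4.0, 6.0]
z = z_scores(income)
r = range_standardize(income)             # [0.0, 0.5, 1.0]
```

Either transformation puts variables measured in different units on a comparable scale before distances are computed.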

 

Q.

 

Can I try ClustanGraphics before purchasing it?

A.

 

Yes.  For corporate and academic users, we offer a 30-day trial.  Simply complete and submit our on-line order form, and we will send you ClustanGraphics on CD with a copy of the ClustanGraphics Primer and an invoice.  If you return it within 30 days, your invoice will be cancelled.

 


 

 

  

System Requirements

 

Q.

 

Can I run ClustanGraphics on my PC?

A.

 

ClustanGraphics runs under Microsoft Windows or NT. It will therefore run on any PC with Windows 95, 98, 2000, NT, ME or XP.  ClustanGraphics also runs on Macintosh computers equipped with a PC emulator.

 

Q.

 

Does my PC need any special hardware or software?

A.

 

No.  A standard 486 or Pentium PC is fine for small to medium-sized applications.  But for very large clustering applications it helps if your PC has extra RAM (e.g. 64MB or 128MB).

 


 

 

  

Data Mining

 

Q.

 

What's the largest hierarchical cluster analysis possible using ClustanGraphics?

A.

 

If your data matrix contains up to 120,000 rows (cases) then ClustanGraphics can complete Ward's Method or Average Linkage (UPGMA) with Euclidean distance.  It is not necessary to calculate a proximity matrix, because clustering is performed directly on the data matrix using our unique, proprietary method.

 

Q.

 

What if I need to compute a proximity matrix?

A.

 

Then the maximum which can be handled by ClustanGraphics is about 10,000 cases.  This is because a data matrix of n = 10,000 rows produces a proximity matrix of n², or 100 million, cells, which requires 400MB of storage.  It's just possible with a fast Pentium PC, plenty of RAM and disk space.  More practically, a data matrix of 5,000 rows requires a 100MB proximity matrix, which is quite feasible using ClustanGraphics.
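
The arithmetic behind these figures can be checked with a few lines of Python.  The 4 bytes per cell is an assumption inferred from the numbers quoted above (100 million cells in 400MB); the function name is hypothetical.

```python
BYTES_PER_CELL = 4  # assumption: implied by 100 million cells -> 400MB


def proximity_matrix_mb(n_cases, bytes_per_cell=BYTES_PER_CELL):
    """Storage in MB (10**6 bytes) for a full n x n proximity matrix."""
    return n_cases * n_cases * bytes_per_cell / 1_000_000


print(proximity_matrix_mb(10_000))   # 400.0
print(proximity_matrix_mb(5_000))    # 100.0
```

Because storage grows with the square of the number of cases, halving the number of rows cuts the requirement to a quarter.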

 

Q.

 

How should I tackle a large data mining application using ClustanGraphics?

A.

 

You can cluster a million cases using k-means, if your PC has enough RAM.  For example, a data matrix of 1 million rows and 10 columns requires 40MB of storage, so it's quite feasible to use ClustanGraphics on a Pentium PC with 96MB of RAM or more.

 

Q.

 

Is there another way to cluster a very large data set?

A.

 

Yes.  We recommend that you select a sample from your data and obtain a cluster analysis using hierarchical cluster analysis or k-means.  From your cluster profiles, select a cluster level which seems actionable and save it as a ClustanGraphics model.  You can now use the model to classify new cases.  This involves reading new case data from a sequential file and finding the cluster-of-best-fit in the model for each new case.  As this involves a single pass through the new data, it is possible to classify any number of cases.  There's no limit.
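
The single-pass idea can be sketched in Python as follows.  This is not the ClustanGraphics implementation or file format.  It assumes the saved model reduces to one mean vector per cluster, that best fit means the smallest squared Euclidean distance, and that new cases arrive as comma-separated rows with an identifier in the first column.  All names are hypothetical.

```python
import csv
import io


def nearest_cluster(case, centroids):
    """Return the label of the cluster mean closest to the case."""
    best_label, best_dist = None, float("inf")
    for label, centre in centroids.items():
        dist = sum((x - c) ** 2 for x, c in zip(case, centre))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def classify_stream(lines, centroids):
    """Yield (case_id, cluster) one row at a time, never holding the whole file."""
    for row in csv.reader(lines):
        if not row:
            continue                       # skip blank lines
        case_id, values = row[0], [float(v) for v in row[1:]]
        yield case_id, nearest_cluster(values, centroids)


# Hypothetical two-cluster model and a small in-memory "file" of new cases.
model = {"A": [1.0, 2.0], "B": [9.0, 10.0]}
new_cases = io.StringIO("c1,0.5,2.5\nc2,8.0,11.0\nc3,2.0,1.0\n")
results = list(classify_stream(new_cases, model))
```

Only the model and the current row are held in memory, so the size of the new data file does not affect memory use.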

 
