Auto Script Feature

We started in 1968 with a batch program running on an IBM 1620, and progressed in 1997 to a Windows version with a full graphical user interface.  Now we have come full circle, and will shortly provide an Auto Script feature for ClustanGraphics that runs a sequence of clustering steps as an automated batch program.  The difference is that you can set up your script simply by running ClustanGraphics normally, then edit it to run automatically whenever new data become available, which meets an essential need among our data mining clients.  Details here.

New Release - ClustanGraphics5

ClustanGraphics5, our new release for Windows 95, 98, 2000, ME, XP and NT, was published in 2001. It was previewed at the 25th conference of the German Classification Society in Munich, March 2001, and has been reviewed by several testers to whom we wish to express our grateful thanks.

To preview some of its exciting new features, go to What's New, or check out the full list of ClustanGraphics Features.

If you have an interest in data mining or large survey analysis, spare the time to visit Clustering Large Datasets.  You'll discover how it's possible to cluster many thousands of cases hierarchically by Ward's Method or Average Linkage in seconds.  And in Classify Cases we show you how you can apply your hierarchical tree model to classify an unlimited number of new cases, including allowance for missing values.
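As an illustration only, the idea of classifying new cases against an existing cluster model, with allowance for missing values, can be sketched as follows.  This is not the ClustanGraphics procedure: it assumes each cluster is summarised by its mean, that missing values are coded as None, and that distance is the mean squared difference over the variables actually observed.  The name classify_case is hypothetical.

```python
from typing import Optional, Sequence


def classify_case(case: Sequence[Optional[float]],
                  centroids: Sequence[Sequence[float]]) -> int:
    """Return the index of the nearest cluster mean, skipping missing values.

    Assumption: missing values are None and are simply left out of the
    distance, which is averaged over the variables that are present.
    """
    best_index, best_dist = -1, float("inf")
    for index, centroid in enumerate(centroids):
        # Keep only the variables observed in the new case.
        pairs = [(x, c) for x, c in zip(case, centroid) if x is not None]
        if not pairs:
            raise ValueError("case has no observed values")
        # Mean squared difference keeps cases with different numbers of
        # observed variables comparable.
        dist = sum((x - c) ** 2 for x, c in pairs) / len(pairs)
        if dist < best_dist:
            best_index, best_dist = index, dist
    return best_index


# Two hypothetical cluster means on three variables.
centroids = [[0.0, 0.0, 0.0], [5.0, 5.0, 5.0]]
print(classify_case([4.5, None, 5.2], centroids))
```

Because each case is compared only with the cluster means, the cost per new case does not depend on how many cases were used to build the tree.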

To order ClustanGraphics5 on-line click ORDER now!

ClustanGraphics Primer: A Guide to Cluster Analysis

A 60-page ClustanGraphics Primer has been published to accompany ClustanGraphics5.  Its purpose is to introduce cluster analysis to beginners and to serve as a user manual for ClustanGraphics5.  A copy is supplied free with ClustanGraphics5, and further copies can be ordered for use with network and site licenses.

Recent Publication - Interface '98

A paper presented at Interface '98, hosted by the University of Minnesota, has been published in Computing Science and Statistics, 30, 257-264.  It describes our fast procedure for clustering large datasets, suitable for data mining and large survey applications.  The title and abstract are as follows:

Efficient hierarchical cluster analysis for data mining and knowledge discovery

David Wishart

Abstract: The paper compares hierarchical cluster analysis with decision trees for data mining and knowledge discovery applications. It is argued that "top-down" binary decision trees can force orthogonal partitions on to data whose shape due to correlated variables might indicate that a non-orthogonal partition is more appropriate, whereas "bottom-up" hierarchical cluster analysis is better at recovering the true shape. A fast algorithm is described for Ward's method, capable of constructing clustering trees for thousands of observations and therefore suitable for KDD applications. A hybrid clustering method is proposed which combines the best features of Ward's method and single linkage (nearest neighbor) to resolve the shape of clusters having non-zero covariance. The use of an agglomerative tree for identification is discussed, and the methods are illustrated by reference to the H-R diagram of visual stars. Finally, implementation for Windows is described.

For a summary of the paper go to Clustering versus Decision Trees.
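For readers unfamiliar with Ward's method, the sketch below shows the criterion in its simplest form.  It is a naive illustration, not the fast algorithm described in the paper: at each step it scans every pair of clusters and merges the pair whose union gives the smallest increase in the within-cluster sum of squares, which for clusters A and B is |A||B|/(|A|+|B|) times the squared distance between their means.  The function name ward_merges and the output layout are assumptions made for this example.

```python
from typing import Dict, List, Sequence, Tuple


def ward_merges(points: List[Sequence[float]]) -> List[Tuple[int, int, float]]:
    """Naive Ward clustering; returns (cluster_a, cluster_b, increase) per merge.

    Original points have ids 0..n-1; each merged cluster gets the next id.
    """
    # Each active cluster is stored as id -> (size, mean vector).
    clusters: Dict[int, Tuple[int, List[float]]] = {
        i: (1, list(p)) for i, p in enumerate(points)
    }
    next_id = len(points)
    merges: List[Tuple[int, int, float]] = []
    while len(clusters) > 1:
        best = None
        ids = sorted(clusters)
        for pos, a in enumerate(ids):
            for b in ids[pos + 1:]:
                (na, ma), (nb, mb) = clusters[a], clusters[b]
                sq_dist = sum((x - y) ** 2 for x, y in zip(ma, mb))
                # Increase in the error sum of squares if a and b are merged.
                cost = na * nb / (na + nb) * sq_dist
                if best is None or cost < best[2]:
                    best = (a, b, cost)
        a, b, cost = best
        (na, ma), (nb, mb) = clusters.pop(a), clusters.pop(b)
        # The mean of the union is the size-weighted mean of the two parts.
        mean = [(na * x + nb * y) / (na + nb) for x, y in zip(ma, mb)]
        clusters[next_id] = (na + nb, mean)
        merges.append((a, b, cost))
        next_id += 1
    return merges


# Two well-separated pairs of points.
for step in ward_merges([(0, 0), (0, 1), (10, 0), (10, 1)]):
    print(step)
```

This exhaustive search does work of the order of the cube of the number of cases, so it is only practical for small examples; it is shown to make the merging criterion concrete.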

Recent Publication - Springer '99

Our paper presented at the German Classification Society, TU-Dresden in March 1998 (GfKl '98) has been published in: Studies in Classification, Data Analysis and Knowledge Organization, Gaul, W., Locarek-Junge, H. (Eds), Classification in the Information Age, Springer, 1999, pp 268-275.  The title and abstract are as follows:

ClustanGraphics3:  Interactive Graphics for Cluster Analysis

David Wishart

Abstract: ClustanGraphics3 is a new interactive program for hierarchical cluster analysis.  It can display shaded representations of proximity matrices, dendrograms and scatterplots for 11 clustering methods, with an intuitive user interface and new optimization features.  Algorithms are proposed which optimize the rank correlation of the proximity matrix by seriation, compute cluster exemplars and truncate a large dendrogram and proximity matrix.  ClustanGraphics3 is illustrated by a market segmentation study for automobiles and a taxonomy of 20 species based on the amino acids in their protein cytochrome-c molecules.  The paper concludes with an overview.

The methodology developed for optimally re-ordering a tree and proximity matrix and for cluster description is illustrated in the pages on Reorder Tree and Cluster Exemplars.  Some examples of the Proteins case study used in this paper also appear in the Cluster Proximity Matrix and Display Proximity Matrix pages, and an optimally ordered tree appears in ClustanGraphics Preview.   Clustering Large Datasets illustrates the truncation of a hierarchical cluster analysis of 40,000 cases to 50 clusters.
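One simple way to think about a cluster exemplar is as the member that is most typical of its cluster.  The sketch below is an assumption for illustration, not the definition used in the paper: it takes a dissimilarity matrix and picks the member with the smallest total dissimilarity to the other members of the same cluster (often called a medoid).  The function name exemplar is hypothetical.

```python
from typing import List, Sequence


def exemplar(members: List[int],
             proximity: Sequence[Sequence[float]]) -> int:
    """Return the member with the smallest summed dissimilarity to the rest.

    Assumption: proximity[i][j] is a dissimilarity (larger = less alike)
    and proximity[i][i] is zero.
    """
    if not members:
        raise ValueError("cluster has no members")
    return min(members, key=lambda i: sum(proximity[i][j] for j in members))


# A small hypothetical dissimilarity matrix for three cases.
proximity = [
    [0.0, 1.0, 4.0],
    [1.0, 0.0, 2.0],
    [4.0, 2.0, 0.0],
]
print(exemplar([0, 1, 2], proximity))
```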

Invited Paper - ISI '99

A paper on ClustanGraphics was recently presented at the 52nd Session of the International Statistical Institute, held in Helsinki, Finland, August 10-18, 1999.  It was included in the theme: Statistical Aspects of Data Mining and Knowledge Discovery in Databases.  The title and abstract are as follows:

Clustering Methods for Large Data Problems

David Wishart

Abstract: The paper reviews the use of clustering methods in large-scale data mining and knowledge discovery applications, and compares cluster analysis to decision trees as an efficient means of organizing databases.  Current applications include customer segmentation on a corporate database of over 2m accounts, and genotype species identification from DNA arrays of over 30,000 gene chip characters.  ClustanGraphics3 was demonstrated with examples of thousands of cases or variables.

Some brief details of the general approach are at Clustering versus Decision Trees.  A 4-page abstract can be downloaded from isi_99 [100k zip].

The abstract of this paper was published in the ISI Bulletin for 1999.

Classifying Single Malt Whiskies Using Cluster Analysis

David Wishart

Abstract: Tasting notes in 10 recently published books on malt whisky and distillers' notes were coded and analysed for 84 single malt whiskies. Nearly 500 aromatic and taste descriptors were compiled and grouped into a standard flavour profile of 12 categories: Body, Sweetness, Smoky, Medicinal, Tobacco, Honey, Spicy, Winey, Nutty, Malty, Fruity and Floral.  A system of consensus coding was devised for the panel of 10 authors, and the 84 malts were clustered into 10 groups according to their flavour profiles. The paper discusses possibilities for further refinement of the classification and applications in the areas of product design, brand management and marketing.
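To make the idea of a 12-category flavour profile concrete, the sketch below represents a malt as a score on each category and compares two profiles by squared Euclidean distance.  The scoring scale, the helper names and the two example malts are all hypothetical; they are not taken from the paper or from any real whisky.

```python
from typing import Dict

CATEGORIES = ("Body", "Sweetness", "Smoky", "Medicinal", "Tobacco", "Honey",
              "Spicy", "Winey", "Nutty", "Malty", "Fruity", "Floral")


def make_profile(**scores: int) -> Dict[str, int]:
    """Build a full 12-category profile; categories not given score 0."""
    unknown = set(scores) - set(CATEGORIES)
    if unknown:
        raise ValueError(f"unknown categories: {sorted(unknown)}")
    return {category: scores.get(category, 0) for category in CATEGORIES}


def profile_distance(a: Dict[str, int], b: Dict[str, int]) -> int:
    """Squared Euclidean distance between two flavour profiles."""
    return sum((a[c] - b[c]) ** 2 for c in CATEGORIES)


# Two invented profiles, for illustration only.
malt_a = make_profile(Body=2, Smoky=3, Medicinal=3)
malt_b = make_profile(Body=2, Sweetness=3, Honey=2, Fruity=2)
print(profile_distance(malt_a, malt_b))
```

A matrix of such distances between all pairs of malts is the kind of input a hierarchical clustering method works from.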

Malt Whisky Tasting:  The meeting concludes with a tasting of selected single malt whiskies representative of the clusters discussed in the presentation. These have been generously contributed by the distillers and producers of Aberlour, Ardmore, Auchentoshan, Aultmore, Balvenie DoubleWood, Balvenie Founders Reserve, Ben Nevis, Benriach, Benromach Glenlivet, Bowmore, Bunnahabhain, Bushmills, Cutty Sark, Dalmore, Deanston, Edradour, Glendronach, Glenfarclas, Glenfiddich Solera Reserve, Glenfiddich Special Reserve, Glen Garioch, Glengoyne, Glen Moray, Glenrothes, Glenturret, Gordon & MacPhail, Highland Park, Isle of Jura, Laphroaig, Ledaig, Longmorn, Macallan, Old Pulteney, Royal Brackla, Scapa, Strathisla, Tobermory, Tomatin and Tomintoul.

Venues: This talk has been given at PADD '98 (Data Mining), Biometrics and British Classification Society, Scotch Malt Whisky Society, Royal Statistical Society, Royal Society of Arts, British Computer Society, Classification Society of North America, International Federation of Classification Societies, Spirit of Speyside Whisky Festival, WhiskyShip Switzerland, and at seminars organised by the Universities of St. Andrews, Napier and Edinburgh.  Details here.