Cluster Keys was introduced into ClustanGraphics. This monothetic or polythetic divisive clustering tool finds the best split on any variable at each stage of a hierarchical divisive clustering process. It works with continuous, ordinal and binary data, but not yet with nominal variables.
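The core of a monothetic divisive step can be illustrated with a short sketch: try every threshold on every variable, and keep the split that minimizes the within-group sum of squares. This is a generic illustration under an assumed splitting criterion, not Clustan's proprietary algorithm; the function name `best_monothetic_split` is hypothetical.

```python
import numpy as np

def best_monothetic_split(X):
    """Exhaustively search for the single-variable threshold split that
    minimizes total within-group sum of squares (illustrative only)."""
    best = (np.inf, None, None)            # (wss, variable index, threshold)
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j])[:-1]:  # candidate thresholds
            left, right = X[X[:, j] <= t], X[X[:, j] > t]
            wss = (((left - left.mean(axis=0)) ** 2).sum()
                   + ((right - right.mean(axis=0)) ** 2).sum())
            if wss < best[0]:
                best = (wss, j, t)
    return best

# Two obvious groups, separated on variable 0
X = np.array([[0.0, 1.0], [0.2, 0.9], [5.0, 1.1], [5.2, 1.0]])
wss, var, thr = best_monothetic_split(X)
```

Applied recursively to each resulting group, this yields a divisive hierarchy; a polythetic variant would score splits on all variables jointly rather than one at a time.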
The ability to save a cluster model by sectioning any tree obtained by hierarchical cluster analysis was added to the Tree menu. This enables diagnostic statistics to be obtained in a new format, which is especially useful when handling mixed data types.
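Sectioning a tree at a chosen number of clusters and then profiling the resulting partition can be sketched with SciPy (used here purely for illustration; this is not Clustan's code):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Six cases forming three tight pairs
X = np.array([[0.0, 0.0], [0.0, 0.1],
              [5.0, 5.0], [5.0, 5.1],
              [10.0, 0.0], [10.0, 0.1]])

# Hierarchical cluster analysis, then section the tree at 3 clusters
Z = linkage(X, method='ward')
labels = fcluster(Z, t=3, criterion='maxclust')

# Diagnostics for the saved partition: per-cluster mean profiles
profiles = {k: X[labels == k].mean(axis=0) for k in sorted(set(labels))}
```

The saved membership vector and per-cluster profiles are exactly the kind of diagnostic output a sectioned tree makes available.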
A new statistics table was introduced for k-means analysis when run with mixed data types. This can also be used to analyze the cluster diagnostics for any saved cluster model.
Support for mixed data types has been extended to Classify Cases. Four types of variables can be defined: binary, nominal, ordinal and continuous.
A coefficient scale and coefficient title have been added to the trees plotted for all hierarchical clustering methods and the k-means tree. The values displayed on the coefficient scale, the number of increments and the format can all be controlled in the tree settings. Cluster labels can be displayed on scatterplots of variables or MDS dimensions.
An article was written for the forthcoming Encyclopedia of Statistics in Behavioral Science, Wiley (in press) on the subject of the best number of clusters. In the preparation of this article an improved t-test was introduced into Best Cut, taking account of the exact t-statistic given the number of degrees of freedom. This has the effect of modifying the best number of clusters selected with smaller data sets, where degrees of freedom affect the t-statistic.
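The effect of using the exact t-statistic is easy to demonstrate: for small samples the Student-t tail for the available degrees of freedom is much heavier than a normal approximation, so fewer fusion-coefficient jumps reach significance. The sketch below is a hypothetical illustration of this kind of test, not the actual Best Cut formula:

```python
import numpy as np
from scipy import stats

def significant_jumps(coefs, alpha=0.05):
    """Flag fusion coefficients that jump significantly above the mean of
    the preceding ones, using the exact Student-t tail for the available
    degrees of freedom (hypothetical illustration, not Best Cut itself)."""
    flags = []
    for i in range(2, len(coefs)):
        prev = np.asarray(coefs[:i])
        se = prev.std(ddof=1) / np.sqrt(len(prev))
        t = (coefs[i] - prev.mean()) / se
        df = len(prev) - 1
        # exact t tail probability given df, not a normal approximation
        flags.append(bool(stats.t.sf(t, df) < alpha))
    return flags

# Only the final jump is significant once the exact df are accounted for
flags = significant_jumps([1.0, 1.1, 1.2, 1.22, 10.0])
```

With a normal approximation the early jumps (t around 3) would look significant; with only 1 or 2 degrees of freedom they do not, which is exactly why the exact test changes the selected number of clusters on small data sets.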
A k-means tree was added to FocalPoint.
PCA (Principal Components Analysis) was added.
The Auto Script feature was extended to include editing data types and k-means analysis.
ClustanMDS implements multidimensional scaling on any proximity matrix. It finds new continuous variables corresponding to the proximity matrix, allowing the relationships between cases and clusters to be displayed on a cluster scatterplot, to aid graphical visualization of cluster analysis results.
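Classical (Torgerson) scaling is one standard way to derive coordinates from a proximity matrix; the source does not state which method ClustanMDS uses, so treat this as a generic sketch of the idea:

```python
import numpy as np

def classical_mds(D, dims=2):
    """Classical (Torgerson) MDS: find coordinates whose Euclidean
    distances reproduce the proximity matrix D as closely as possible."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n    # centering matrix
    B = -0.5 * J @ (D ** 2) @ J            # double-centred Gram matrix
    vals, vecs = np.linalg.eigh(B)
    order = np.argsort(vals)[::-1][:dims]  # keep the largest eigenvalues
    return vecs[:, order] * np.sqrt(np.maximum(vals[order], 0.0))

# Distances among three collinear points at positions 0, 3 and 7
D = np.array([[0.0, 3.0, 7.0],
              [3.0, 0.0, 4.0],
              [7.0, 4.0, 0.0]])
Y = classical_mds(D, dims=1)
```

The recovered columns of `Y` are exactly the "new continuous variables" that can then be plotted on a cluster scatterplot.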
Auto Script has been extended to allow for calculating proximities, clustering proximities, tree validation and cluster profiles. It is therefore now possible to run Auto Script in the background, using bootstrap validation to find the best partition in a hierarchical cluster analysis, and then save the cluster membership and cluster profiles tables.
File/New Data was modified for reading very large data sets. Read buffer sizes were increased, reducing the time required to read large files of several MBytes. ClustanGraphics can now read files that cannot be created in other programs, such as Microsoft Excel, because of size constraints.
k-Means Cluster Model Tree provides an easy method of summarizing a k-means cluster solution by a tree. It can show how the k clusters resulting from a k-means analysis combine hierarchically down to 1 cluster, and also how the members of each of the k clusters combine to form those clusters.
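The upper part of such a summary tree can be sketched by clustering the k centroids hierarchically (a generic illustration using SciPy, not Clustan's algorithm):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.cluster.vq import kmeans2

rng = np.random.default_rng(0)
# Three well-separated blobs of 20 cases each
X = np.vstack([rng.normal(loc=c, scale=0.1, size=(20, 2))
               for c in [(0, 0), (5, 0), (10, 10)]])

# Stage 1: k-means partition into k = 3 clusters
centroids, labels = kmeans2(X, k=3, minit='++', seed=0)

# Stage 2: summarise the k clusters by a tree over their centroids,
# showing how they combine hierarchically down to 1 cluster
Z = linkage(centroids, method='ward')
```

The lower part of the tree, showing how members combine within each cluster, would be obtained by clustering each cluster's members separately and grafting those subtrees beneath the centroid tree.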
Validate Tree compares a hierarchical agglomerative cluster analysis with a series of random trials on the given data or proximities, to test whether the resulting tree is significantly different from random and hence identify the best number of clusters.
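A simple randomisation version of this idea compares the observed tree's top fusion coefficient with those of trees built on column-permuted copies of the data. The sketch below is an assumed illustration of the principle, not Clustan's actual test:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage

def validate_tree(X, trials=20, seed=0):
    """Compare the observed tree's top fusion coefficient against trees
    built on column-permuted data (an assumed randomisation sketch)."""
    rng = np.random.default_rng(seed)
    observed = linkage(X, method='ward')[-1, 2]   # final fusion coefficient
    random_tops = []
    for _ in range(trials):
        # permuting each column destroys the joint cluster structure
        Xr = np.column_stack([rng.permutation(col) for col in X.T])
        random_tops.append(linkage(Xr, method='ward')[-1, 2])
    # proportion of random trials at least as extreme as the observed tree
    p = float(np.mean([v >= observed for v in random_tops]))
    return observed, p

# Two well-separated blobs: the observed tree should beat the random trials
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.1, size=(15, 2)),
               rng.normal(10, 0.1, size=(15, 2))])
obs, p = validate_tree(X)
```

Repeating the comparison at every level of the tree, rather than only at the top fusion, is what allows the best number of clusters to be identified.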
Clustan Wizard generates a standard hierarchical agglomerative cluster analysis from an Excel spreadsheet in one very simple dialogue.
Copy Sections allows a large image to be split into sections for inclusion in a report over several pages. You have complete control over the number of sections wide or tall, or the dimensions of each section. There is a bleed so that boundary details are not severed at the section edges.
Transpose Data reads and transposes the data matrix, so the rows become variables and the columns become cases. This is useful in genomics, where DNA arrays are typically very long and are usually presented row-wise by DNA chips.
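The transpose operation itself is straightforward; with pandas (shown only for illustration, with made-up sample and gene names) a chip-style table with genes as rows becomes a case-by-variable table:

```python
import pandas as pd

# A chip-style table: rows are genes (variables), columns are samples (cases)
chip = pd.DataFrame({'sample1': [2.1, 0.3, 1.7],
                     'sample2': [1.9, 0.4, 1.6]},
                    index=['geneA', 'geneB', 'geneC'])

# Transpose so rows become cases and columns become variables
cases = chip.T
```

After transposition each sample is a case that can be clustered on its expression profile.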
ClustanGraphics5 offers the following new features compared to previous versions:
Direct data clustering can now handle missing values and differential case or variable weights. Our fast, proprietary algorithm has been re-tuned and now runs even faster, with improved precision.
Mixed data types can be specified and used in computing proximities, nearest neighbour analysis, hierarchical cluster analysis or k-means. Four types of variables can be defined: binary, nominal, ordinal and continuous.
We have vastly improved our k-means analysis, so that it can run on a million cases or more; is guaranteed to converge; handles mixed data types; detects outliers; and handles missing values. Read our critical appraisal, and download a white paper.
Variables can be masked from clustering but remain available in cluster profiling, for background cluster interpretation and description.
Nearest neighbour analysis is available for all types of proximity matrices.
Contiguity constraints can be specified for hierarchical and k-means clustering. These can be used in geo-demographic analysis to form cluster regions that are contiguous, or to partition time series into contiguous time periods.
Gower's General Similarity Coefficient and several General Distance Coefficients are available for use with mixed data types. Binary similarity coefficients are provided for data of binary or dichotomous (present/absent) form.
Proximities can be read in four different formats: lower triangular, upper triangular, square and list, with or without diagonal elements present.
Clustering variables is possible using a correlation matrix, covariance matrix, or other external method of comparing variables.
Reading Excel spreadsheets has been improved, with faster reading and more options.
View Cluster Model has been extended for mixed data types.
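Gower's coefficient averages per-variable similarity scores: exact matching for binary and nominal variables, and range-scaled absolute difference for ordinal and continuous ones, skipping variables with missing values. The sketch below follows the published coefficient in its simplest symmetric form; it is not Clustan's implementation, and the function name is hypothetical:

```python
def gower_similarity(a, b, types, ranges):
    """Gower's general similarity for one pair of cases with mixed data.
    types: 'b' binary, 'n' nominal, 'c' continuous/ordinal.
    ranges: per-variable range, used for continuous/ordinal variables.
    Missing values (None) simply drop the variable from the average.
    (Simple symmetric treatment of binary variables, for illustration.)"""
    scores, weight = [], 0
    for x, y, t, r in zip(a, b, types, ranges):
        if x is None or y is None:          # missing value: skip variable
            continue
        if t in ('b', 'n'):                 # binary / nominal: match scores 1
            scores.append(1.0 if x == y else 0.0)
        else:                               # continuous / ordinal
            scores.append(1.0 - abs(x - y) / r)
        weight += 1
    return sum(scores) / weight

# Binary match (1), nominal mismatch (0), continuous 1 - 4/8 = 0.5
sim = gower_similarity([1, 'red', 2.0], [1, 'blue', 6.0], 'bnc', [1, 1, 8.0])
```

Because missing values only reduce the divisor, the coefficient stays defined for any pair of cases that share at least one observed variable.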
Missing values are allowed when reading or computing proximities, in clustering proximities, in outlier analysis, and in the associated graphics.
FocalPoint Clustering is our new two-stage k-means procedure, which finds and saves the top solutions and allows outlier and intermediate deletion.
Navigate Tree is a new graphical representation of a hierarchical cluster analysis, displaying variable means and t-tests.
Data can be read from Excel spreadsheets and binary data files.
Hierarchical cluster analysis can be presented in any tree order: just point-and-click.
k-Means Analysis can start from a cluster model, tree partition, seed points or random starts, with options for truncation, cluster membership, exemplars and statistics.
Cluster Proximities has been extended for large datasets, e.g. 10,000 cases.