UNISTAT - the ultimate Excel statistics add-in

8.1.1. Hierarchical Cluster Analyses

First, select the data columns to be analysed by clicking on [Variable] in the Variable Selection Dialogue. If the data is not a proximity matrix (i.e. it is not square and symmetric), another dialogue will appear allowing you to choose from six distance measures. This dialogue is not shown when you input a proximity matrix.
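The square-and-symmetric rule can be sketched as a small check. This is a hypothetical Python helper, `is_proximity_matrix`, written for illustration only; it is not part of UNISTAT, and the rounding tolerance `tol` is an assumption.

```python
def is_proximity_matrix(rows, tol=1e-12):
    """Return True if `rows` (a list of lists) is square and symmetric.

    Hypothetical helper; `tol` allows for rounding in stored values.
    """
    n = len(rows)
    # Square: every row must have as many entries as there are rows
    if any(len(row) != n for row in rows):
        return False
    # Symmetric: entry (i, j) must equal entry (j, i)
    return all(abs(rows[i][j] - rows[j][i]) <= tol
               for i in range(n) for j in range(i + 1, n))
```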

8.1.1.1. Distance Measures

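In the formulas below, x and y are two cases (rows) of the data, and $x_k$, $y_k$ are their values on the kth of the m selected variables.

Euclid:

      $d_{xy} = \sqrt{\sum_{k=1}^{m} (x_k - y_k)^2}$

Squared Euclid:

      $d_{xy} = \sum_{k=1}^{m} (x_k - y_k)^2$

Cosine:

      $\cos\theta_{xy} = \dfrac{\sum_{k=1}^{m} x_k y_k}{\sqrt{\sum_{k=1}^{m} x_k^2 \sum_{k=1}^{m} y_k^2}}$

Chebychev:

      $d_{xy} = \max_{k} |x_k - y_k|$

Block:

      $d_{xy} = \sum_{k=1}^{m} |x_k - y_k|$

Power:

      $d_{xy} = \left( \sum_{k=1}^{m} |x_k - y_k|^p \right)^{1/r}$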

where the power terms p and r are supplied by the user.
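The six measures can be computed pairwise between cases as in the following minimal Python sketch. It assumes each case is a sequence of m numbers with no missing values; the function names are illustrative, not UNISTAT's. `cosine` returns the cosine itself (a similarity); how UNISTAT turns it into a distance is not stated here.

```python
import math

# Each case is a sequence of m numeric values (one per selected variable).

def euclid(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def squared_euclid(x, y):
    return sum((a - b) ** 2 for a, b in zip(x, y))

def cosine(x, y):
    # Cosine of the angle between the two case vectors (a similarity)
    num = sum(a * b for a, b in zip(x, y))
    return num / math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))

def chebychev(x, y):
    return max(abs(a - b) for a, b in zip(x, y))

def block(x, y):
    return sum(abs(a - b) for a, b in zip(x, y))

def power(x, y, p, r):
    # p and r are the user-supplied power terms
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / r)

def distance_matrix(cases, measure):
    """Full n x n matrix of measure(case_i, case_j)."""
    return [[measure(u, v) for v in cases] for u in cases]

x, y = (1, 2, 3), (4, 6, 3)
print(euclid(x, y), block(x, y), chebychev(x, y))  # 5.0 7 4
```

With p = r = 2, `power` gives the Euclidean distance, and with p = r = 1 it gives the block distance.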

8.1.1.2. Distance Matrix

After the distance matrix is computed, a dialogue containing six hierarchical clustering methods and a Distance matrix option will appear.

It is possible to select one of the methods and proceed immediately with the analysis, or select the last menu item to view or save the generated distance matrix. The Distance matrix option will not be available when you input a proximity matrix for analysis.

8.1.1.3. Hierarchical Methods


All hierarchical methods apply the same algorithm; however, they differ in the way they compute the distance between two clusters.

First, the n(n - 1)/2 elements of the proximity matrix are sorted in ascending order, and the nearest two points are formed into the first cluster. At the ith step, the remaining points and the existing clusters are considered, and the closest pair, whether two points, a cluster and a point, or two clusters, is formed into a new cluster. This process is repeated until the number of clusters is reduced to the number specified by the user.
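The merge loop can be sketched in Python. This is an illustration under stated assumptions, not UNISTAT's code: instead of keeping a sorted list, it rescans for the smallest current dissimilarity at every step, and it takes a simple update rule as the parameter `update` (`min` for single linkage, `max` for complete linkage), so the averaging methods are not covered. The merged cluster keeps the smaller label, as in the History table of the example in this section.

```python
from itertools import combinations

def agglomerate(dist, k, update=min):
    """Merge clusters until k remain and return the merge history.

    dist:   square, symmetric proximity matrix (list of lists)
    k:      number of clusters at which to stop
    update: new d_tr = update(d_tr, d_pr) when cluster p merges into t
    Returns (left label, right label, distance) rows, labels numbered from 1.
    """
    n = len(dist)
    # d[(i, j)], i < j: current dissimilarity between clusters i and j
    d = {(i, j): dist[i][j] for i, j in combinations(range(n), 2)}
    active = set(range(n))
    history = []
    while len(active) > k:
        # Closest pair among all active clusters (points or clusters alike)
        t, p = min(combinations(sorted(active), 2), key=lambda pair: d[pair])
        history.append((t + 1, p + 1, d[(t, p)]))
        active.remove(p)  # the merged cluster keeps the smaller label t
        for r in active - {t}:
            tr, pr = (min(t, r), max(t, r)), (min(p, r), max(p, r))
            d[tr] = update(d[tr], d[pr])
    return history

xs = [0, 1, 5, 6, 20]  # five hypothetical cases with one variable
dist = [[abs(a - b) for b in xs] for a in xs]
print(agglomerate(dist, 1))       # [(1, 2, 1), (3, 4, 1), (1, 3, 4), (1, 5, 14)]
print(agglomerate(dist, 1, max))  # [(1, 2, 1), (3, 4, 1), (1, 3, 6), (1, 5, 20)]
```

Rescanning costs O(n^2) per step, which is fine for illustration but slow for large data sets.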

One of the following six hierarchical clustering methods can be selected. In the formulas, $d_{ij}$ is the dissimilarity between clusters i and j; $n_i$ and $S_i$ are initialised as $n_i = 1$ (a vector of ones) and $S_i = 0$ (a vector of zeros) for $i = 1, \ldots, n$; and the indices t and r denote a newly formed cluster and any other cluster, respectively.

Average Between Groups:

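      Compute an unweighted average distance between pairs belonging to two clusters. When cluster p is merged into cluster t, update:

      $d_{tr} \leftarrow d_{tr} + d_{pr}$

      $n_t \leftarrow n_t + n_p$

and select the minimum of:

      $d_{ij} / (n_i n_j)$

Since every $n_i$ starts at 1, $d_{ij}$ accumulates the sum of the distances between all pairs of cases in clusters i and j, so this quantity is their average distance.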
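The sum-based update can be checked numerically: after merging p into t, the stored sum divided by $n_t n_r$ equals the plain average over all pairs of cases taken one from each cluster. The sketch below uses one-variable cases and absolute differences as the distance purely for illustration; the data and helper names are hypothetical.

```python
from itertools import product

def pair_sum(a, b):
    """Sum of distances |x - y| over all pairs with x in a and y in b."""
    return sum(abs(x - y) for x, y in product(a, b))

def merge_between(d_tr, d_pr, n_t, n_p):
    """Average Between Groups update when cluster p is merged into t."""
    return d_tr + d_pr, n_t + n_p

def between_criterion(d_ij, n_i, n_j):
    """Value compared across cluster pairs; the smallest pair is merged next."""
    return d_ij / (n_i * n_j)

# Clusters of one-variable cases (hypothetical data)
t, p, r = [0, 2], [4], [10, 13]
d_tr, n_t = merge_between(pair_sum(t, r), pair_sum(p, r), len(t), len(p))
print(between_criterion(d_tr, n_t, len(r)))  # 9.5
```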

Average Within Groups:

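      When cluster p is merged into cluster t, update:

      $S_t \leftarrow S_t + S_p + d_{tp}$

      $d_{tr} \leftarrow d_{tr} + d_{pr}$

      $n_t \leftarrow n_t + n_p$

and select the minimum of:

      $\dfrac{S_i + S_j + d_{ij}}{(n_i + n_j)(n_i + n_j - 1)/2}$

that is, the average distance between all pairs of cases in the cluster that merging i and j would produce.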
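A small numeric check of the within-groups criterion: $S_i$ holds the sum of distances inside cluster i, so the criterion is the mean over every pair of cases in the merged cluster. As before, this is a hypothetical sketch with one-variable cases and absolute differences as the distance.

```python
from itertools import combinations, product

def pair_sum(a, b):
    """Sum of |x - y| over pairs with x in a and y in b."""
    return sum(abs(x - y) for x, y in product(a, b))

def within_sum(a):
    """Sum of |x - y| over all pairs of cases inside one cluster."""
    return sum(abs(x - y) for x, y in combinations(a, 2))

def merge_within(s_t, s_p, d_tp, d_tr, d_pr, n_t, n_p):
    """Average Within Groups update when cluster p is merged into t.
    Returns the new S_t, d_tr and n_t."""
    return s_t + s_p + d_tp, d_tr + d_pr, n_t + n_p

def within_criterion(s_i, s_j, d_ij, n_i, n_j):
    m = n_i + n_j
    return (s_i + s_j + d_ij) / (m * (m - 1) / 2)

i, j, r = [0, 2], [4, 7], [10]  # hypothetical clusters
crit = within_criterion(within_sum(i), within_sum(j), pair_sum(i, j), len(i), len(j))
s_t, d_tr, n_t = merge_within(within_sum(i), within_sum(j), pair_sum(i, j),
                              pair_sum(i, r), pair_sum(j, r), len(i), len(j))
print(crit, s_t, d_tr, n_t)
```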

Single Linkage:

      Select the smallest distance between pairs of elements, one from each cluster. When cluster p is merged into cluster t, update:

      $d_{tr} \leftarrow \min(d_{tr}, d_{pr})$

Complete Linkage:

      Select the largest distance between pairs of elements, one from each cluster. When cluster p is merged into cluster t, update:

      $d_{tr} \leftarrow \max(d_{tr}, d_{pr})$

Centroid:

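      A cluster’s location is represented by the centroid of all points within the cluster. When cluster p is merged into cluster t, update:

      $d_{tr} \leftarrow \dfrac{n_t}{n_t + n_p} d_{tr} + \dfrac{n_p}{n_t + n_p} d_{pr} - \dfrac{n_t n_p}{(n_t + n_p)^2} d_{tp}$

where $n_t$ and $n_p$ are the cluster sizes before the merge.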

This method should be used only with squared Euclid distance.

Median:

      Compute the weighted average distance between pairs belonging to two clusters. When cluster p is merged into cluster t, update:

      $d_{tr} \leftarrow \tfrac{1}{2} d_{tr} + \tfrac{1}{2} d_{pr} - \tfrac{1}{4} d_{tp}$

This method should be used only with squared Euclid distance.
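One way to see why the Centroid and Median methods call for squared Euclid distances: when d holds squared Euclidean distances between cluster locations, the centroid update reproduces exactly the squared distance from cluster r to the centroid of the merged cluster, and the median update the squared distance to the midpoint of the two merged locations. With other measures the updated values generally lose this geometric meaning. The Python check below uses hypothetical two-variable data and illustrative helper names.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def centroid(points):
    n = len(points)
    return tuple(sum(coord) / n for coord in zip(*points))

def centroid_update(d_tr, d_pr, d_tp, n_t, n_p):
    s = n_t + n_p
    return (n_t / s) * d_tr + (n_p / s) * d_pr - (n_t * n_p / s ** 2) * d_tp

def median_update(d_tr, d_pr, d_tp):
    return 0.5 * d_tr + 0.5 * d_pr - 0.25 * d_tp

# Hypothetical clusters T and P (to be merged) and another cluster R
T, P, R = [(0, 0), (2, 0)], [(0, 3)], [(5, 5)]
cT, cP, cR = centroid(T), centroid(P), centroid(R)
d_tr, d_pr, d_tp = sq_dist(cT, cR), sq_dist(cP, cR), sq_dist(cT, cP)

# Centroid: update vs. distance to the true centroid of T and P combined
new_centroid = centroid_update(d_tr, d_pr, d_tp, len(T), len(P))
direct_centroid = sq_dist(centroid(T + P), cR)

# Median: update vs. distance to the midpoint of the two locations
midpoint = tuple((a + b) / 2 for a, b in zip(cT, cP))
new_median = median_update(d_tr, d_pr, d_tp)
direct_median = sq_dist(midpoint, cR)
print(new_centroid, direct_centroid, new_median, direct_median)
```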

8.1.1.4. Hierarchical Cluster Output Options


The following graphical and tabulated output options are provided.

History: This table shows the two clusters combined at each step and the distance between them. The cluster numbers displayed in the table are in fact Row Labels; if your data has no Row Labels, these two columns will be blank.

      In the History table, the newly formed cluster is given the label of the cluster in the left-hand column. The numbers of the clusters combined at each step and their distance can be saved to the Data Processor.

Character Dendrogram: A dendrogram displays a visual summary of the clustering process, providing an understanding of the groups and proximities inherent in the data. The order in which clusters are combined does not necessarily coincide with the order in which they are drawn on a dendrogram. The dendrogram procedure first rearranges the History table to produce an uncluttered tree diagram. The same tree structure can also be output in the form of a graph.

      The advantage of this form of output is its ability to display all Row Labels without any cluttering. However, due to its low resolution on the (horizontal) distance axis, clusters that are very close to each other may not be distinguishable.
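One simple way to derive a crossing-free leaf order from a History table is to replay the merges, appending the right-hand cluster's leaves after the left-hand cluster's. This is a sketch, not necessarily UNISTAT's rearrangement procedure, and `leaf_order` is a hypothetical name. Applied to the History table of the example in this section, it reproduces the row order of that example's Character Dendrogram.

```python
def leaf_order(history):
    """Left-to-right leaf order for a dendrogram built from a History table.

    history: (left, right, distance) rows in merge order; the merged
    cluster keeps the left label. Assumes the history merges all cases.
    """
    leaves = {}
    for left, right, _distance in history:
        # A label not seen before is a single case (a leaf)
        left_leaves = leaves.pop(left, [left])
        right_leaves = leaves.pop(right, [right])
        leaves[left] = left_leaves + right_leaves
    if len(leaves) != 1:
        raise ValueError("history does not merge all cases into one cluster")
    return next(iter(leaves.values()))

# History table from the example (Euclid, Average Between Groups)
EXAMPLE_HISTORY = [
    (1, 8, 4.6915), (2, 9, 9.4345), (1, 5, 9.5967), (1, 6, 10.8672),
    (3, 4, 12.5714), (1, 2, 16.1606), (3, 7, 19.2953), (1, 3, 26.9553),
]
print(leaf_order(EXAMPLE_HISTORY))  # [1, 8, 5, 6, 2, 9, 3, 4, 7]
```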

Cluster Table: This is similar to the Cluster Graph option; however, the results are displayed in the form of a table. You can enter the number of clusters to be displayed, between 1 and the number of rows. The number of cases in each cluster and their percentages can be saved to the Data Processor.

Cluster Membership: A table containing all cases displays which case belongs to which cluster. As in the Cluster Table option, the number of clusters to be formed can be selected. The cluster membership column can be saved to the Data Processor.
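Cluster membership for k clusters can be obtained by replaying the first n - k merges of the History table. In the minimal Python sketch below, the function names are hypothetical, and numbering clusters in the order of their first case is an assumption that happens to reproduce the Cluster Membership and Cluster Table of the example in this section.

```python
from collections import Counter

def cluster_membership(history, n, k):
    """Assign cases 1..n to k clusters by replaying the first n - k merges."""
    owner = {case: case for case in range(1, n + 1)}  # case -> cluster label
    for left, right, _distance in history[: n - k]:
        for case, label in owner.items():
            if label == right:
                owner[case] = left  # the merged cluster keeps the left label
    # Number clusters 1..k in order of their first case (assumption)
    numbers = {}
    for case in range(1, n + 1):
        numbers.setdefault(owner[case], len(numbers) + 1)
    return [numbers[owner[case]] for case in range(1, n + 1)]

def cluster_table(membership):
    """(cluster, cases, percentage) rows, percentages rounded to 0.1."""
    counts = Counter(membership)
    total = len(membership)
    return [(c, counts[c], round(100 * counts[c] / total, 1)) for c in sorted(counts)]

# History table from the example (Euclid, Average Between Groups)
EXAMPLE_HISTORY = [
    (1, 8, 4.6915), (2, 9, 9.4345), (1, 5, 9.5967), (1, 6, 10.8672),
    (3, 4, 12.5714), (1, 2, 16.1606), (3, 7, 19.2953), (1, 3, 26.9553),
]
membership = cluster_membership(EXAMPLE_HISTORY, 9, 3)
print(membership)                 # [1, 1, 2, 2, 1, 1, 3, 1, 1]
print(cluster_table(membership))  # [(1, 6, 66.7), (2, 2, 22.2), (3, 1, 11.1)]
```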

Hi-res Dendrogram: The high-resolution dendrogram is convenient when the number of rows in the data set does not exceed 100. The vertical axis represents the distance and the horizontal axis represents the clusters combined.

      The Edit option for the Hi-res Dendrogram procedure enables you to change the colour and thickness of the lines, as well as the positions of the stems. The vertical lines representing newly formed clusters may start from the midpoint (the default), or from the right or left corner of the line connecting the two old clusters.


      The Row Labels will be displayed as X-axis labels. If these are too long, you can stagger them up and down, or rotate the text by 90° or 270°.

Cluster Graph: Two- and three-dimensional scatter diagrams can be displayed, showing which cluster each data point belongs to. A dialogue provides controls for the appearance of the graph and the number of clusters to be displayed. Different clusters are represented by different letters in different colours. You can select the font and size of the letters from the Edit → XY Points menu.


8.1.1.5. Hierarchical Cluster Example

Open MULTIVAR, select Statistics 2 → Cluster Analysis → Hierarchical Cluster Analyses and select Perf, Info, Verbexp and Age (C1 to C4) as [Variable]s. Select Euclid as the distance measure and Average Between Groups as the method. Set the number of clusters to 3 and select all the output options to obtain the following results:

Hierarchical Cluster Analysis

Variables Selected: Perf, Info, Verbexp, Age

Measure: Euclid, Method: Average Between Groups

History

Step  Combined1  Combined2  Distance
1     1          8          4.6915
2     2          9          9.4345
3     1          5          9.5967
4     1          6          10.8672
5     3          4          12.5714
6     1          2          16.1606
7     3          7          19.2953
8     1          3          26.9553

 

Character Dendrogram

             1+----------+
             8+----------+-----------+
             5+----------------------+--+
             6+-------------------------+-----------+
             2+---------------------+               |
             9+---------------------+---------------+-------------------------+
             3+-----------------------------+                                 |
             4+-----------------------------+---------------+                 |
             7+---------------------------------------------+-----------------+

 

Cluster Table

Cluster  Cases  Percentage
1        6      66.7%
2        2      22.2%
3        1      11.1%

 

Cluster Membership

Observation  Cluster
1            1
2            1
3            2
4            2
5            1
6            1
7            3
8            1
9            1

 
