Function File: y = linkage (d)
Function File: y = linkage (d, method)
Function File: y = linkage (x, method, metric)
Function File: y = linkage (x, method, arglist)

Produce a hierarchical clustering dendrogram.

d is the dissimilarity matrix for n observations, formatted as an (n-1)*n/2 x 1 vector as produced by pdist. Alternatively, x contains data formatted for input to pdist, metric is a metric accepted by pdist, and arglist is a cell array of arguments that are passed to pdist.
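The four calling forms can be sketched as follows. This is a minimal illustration, not an excerpt from the package: it assumes the statistics package is installed, the data in x are made up, and the "cityblock" and "minkowski" names are pdist metrics chosen here only as examples.

```octave
pkg load statistics

# Four observations (rows) in two dimensions; the values are illustrative only.
x = [0 0; 0 1; 5 5; 5 6];
n = rows (x);

# Form 1: pass a dissimilarity vector produced by pdist.
d = pdist (x);                  # Euclidean by default, (n-1)*n/2 elements
y1 = linkage (d);               # default method "single"

# Form 2: pass the dissimilarities and choose the method explicitly.
y2 = linkage (d, "average");

# Form 3: pass the raw data and a metric name; pdist is applied to x.
y3 = linkage (x, "average", "cityblock");

# Form 4: pass the raw data and a cell array of pdist arguments.
y4 = linkage (x, "average", {"minkowski", 3});
```

Forms 3 and 4 are convenient when the dissimilarity vector is not needed for anything else; computing d once and reusing it avoids repeated calls to pdist when several methods are compared.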

linkage starts by putting each observation into a singleton cluster and numbering those from 1 to n. Then it merges two clusters, chosen according to method, to create a new cluster numbered n+1, and so on until all observations are grouped into a single cluster numbered 2n-1. Row k of the (n-1)x3 output matrix relates to cluster n+k: the first two columns are the numbers of the two component clusters and column 3 contains their distance.
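The numbering scheme is easier to follow on a small case. The sketch below assumes the statistics package is installed and uses four made-up observations on a line. With single linkage one would expect observations 1 and 2 to merge first (distance 1) into cluster 5, then observations 3 and 4 (distance 2) into cluster 6, and finally clusters 5 and 6 (distance 4) into cluster 7.

```octave
pkg load statistics

# Four observations on a line, chosen so the merge order is easy to follow.
x = [0; 1; 5; 7];
n = rows (x);

y = linkage (pdist (x), "single");

# Print one line per merge: new cluster number, its two components, distance.
for k = 1:rows (y)
  printf ("cluster %d = clusters %d and %d, distance %g\n", ...
          n + k, y(k,1), y(k,2), y(k,3));
endfor
```

The loop relies only on the row layout described above: row k describes the cluster numbered n+k.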

method defines how the distance between two clusters is computed and how distances are recomputed when two clusters are merged:

"single" (default)

The distance between two clusters is the minimum distance between two elements, one from each cluster. Produces a cluster tree known as the minimum spanning tree.

"complete"

The greatest distance between two elements, one from each cluster.

"average"

Unweighted pair group method with averaging (UPGMA). The mean distance between all pairs of elements, one from each cluster.

"weighted"

Weighted pair group method with averaging (WPGMA). When two clusters A and B are joined together, the new distance to a cluster C is the mean of the distances A-C and B-C.

"centroid"

Unweighted pair-group method using centroids (UPGMC). Assumes Euclidean metric. The distance between cluster centroids, each centroid being the center of mass of a cluster.

"median"

Weighted pair-group method using centroids (WPGMC). Assumes Euclidean metric. The distance between cluster centroids. When two clusters are joined together, the new centroid is the midpoint between the joined centroids.

"ward"

Ward’s sum of squared deviations about the group mean (ESS). Also known as minimum variance or inner squared distance. Assumes Euclidean metric. The distance measures how much the moment of inertia of the merged cluster exceeds the sum of those of the individual clusters.

Reference: Ward, J. H. Hierarchical Grouping to Optimize an Objective Function. J. Am. Statist. Assoc. 1963, 58, 236-244. http://iv.slis.indiana.edu/sw/data/ward.pdf
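To see how the choice of method changes the result, the methods listed above can be run on the same dissimilarity vector. The following is a rough sketch that assumes the statistics package is installed; the data and the variable name method_names are illustrative. For these four points the last merge joins {1,2} with {3,4}, so "single" should report the smallest cross-cluster distance (4), "complete" the greatest (7), and "average" their mean over all four pairs (5.5).

```octave
pkg load statistics

x = [0; 1; 5; 7];
d = pdist (x);     # Euclidean, as "centroid", "median" and "ward" assume

method_names = {"single", "complete", "average", "weighted", ...
                "centroid", "median", "ward"};

for i = 1:numel (method_names)
  y = linkage (d, method_names{i});
  # Column 3 of the last row is the distance at which the final merge happens.
  printf ("%-8s final merge distance: %g\n", method_names{i}, y(end,3));
endfor
```

The values printed for "centroid", "median" and "ward" depend on how the implementation scales those distances, so they are best read from the output rather than assumed.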

See also: pdist, squareform.

Package: statistics