pdist [statistics]
— Function File: y = pdist (x)
— Function File: y = pdist (x, distfun)
— Function File: y = pdist (x, distfun, distfunarg, ...)

Return the distance between every pair of rows in x.

x is the (n x m) matrix whose rows are compared. If no distfun is given, the 'euclidean' distance is assumed. distfun may be any of the pre-defined names listed below, or a function handle to a user-defined function that takes two arguments, distfun (u, V), where u is the row (1 x m) whose distance is taken relative to V (a p x m matrix).

The output vector, y, has (n - 1) * (n / 2) elements, one per pair of rows, with the distances in the order [(1, 2); (1, 3); ...; (2, 3); ...; (n-1, n)].
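The pair ordering and output length described above can be illustrated with a short loop. This is a minimal sketch using made-up data and plain Octave (it does not call pdist itself). The variable names are illustrative only.

```octave
% Enumerate row pairs in the order (1,2), (1,3), ..., (n-1,n)
% and compute the Euclidean distance for each pair.
x = [0 0; 3 4; 6 8];              % made-up data: n = 3 rows, m = 2 columns
n = rows (x);
y = zeros (1, (n - 1) * (n / 2)); % one entry per pair of rows
pairs = zeros (numel (y), 2);     % records which rows each entry refers to
k = 1;
for ii = 1:n-1
  for jj = ii+1:n
    pairs(k,:) = [ii, jj];
    y(k) = norm (x(ii,:) - x(jj,:));  % Euclidean distance between rows ii and jj
    k++;
  end
end
disp (pairs);   % rows of index pairs: 1 2; 1 3; 2 3
disp (y);       % distances in the same order: 5 10 5
```

For n = 3 the vector has 3 entries. For n = 4 it would have 6, ordered (1,2), (1,3), (1,4), (2,3), (2,4), (3,4).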

Any additional arguments after distfun are passed on to it as distfun (u, V, distfunarg1, distfunarg2, ...).
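As a sketch of the calling form described above, here is a hypothetical weighted Euclidean distance that accepts one extra argument. The handle name wdist and the weight vector w are illustrative, not part of the package. The sketch also assumes the function should return one distance per row of V, which this text does not state explicitly.

```octave
% Hypothetical user-defined distance in the form distfun (u, V, distfunarg1).
%   u : 1 x m row vector
%   V : p x m matrix
%   w : 1 x m weights (illustrative extra argument)
% Returns a p x 1 column with one distance per row of V (assumed output shape).
wdist = @(u, V, w) sqrt (sum (((V - u) .^ 2) .* w, 2));

% Call the handle directly to check the contract on made-up data.
u = [0 0];
V = [3 4; 6 8];
w = [1 1];              % unit weights reduce this to the plain Euclidean distance
d = wdist (u, V, w);    % d = [5; 10]
```

Following the convention above, such a handle would be passed as pdist (x, wdist, w), with w forwarded as the third argument of each call.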

Pre-defined distance functions are:

"euclidean"
Euclidean distance (default)
"seuclidean"
Standardized Euclidean distance. Each coordinate in the sum of squares is inverse weighted by the sample variance of that coordinate.
"mahalanobis"
Mahalanobis distance
"cityblock"
City Block metric (also known as Manhattan distance)
"minkowski"
Minkowski metric (with a default parameter 2)
"cosine"
One minus the cosine of the included angle between points (treated as vectors)
"correlation"
One minus the sample correlation between points (treated as sequences of values).
"spearman"
One minus the sample Spearman's rank correlation between observations, treated as sequences of values
"hamming"
Hamming distance, the percentage of coordinates that differ
"jaccard"
One minus the Jaccard coefficient, the percentage of nonzero coordinates that differ
"chebychev"
Chebychev distance (maximum coordinate difference)
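A brief usage sketch with a few of the pre-defined names follows. It assumes the statistics package is installed and can be loaded with pkg load. The data are made up for illustration.

```octave
pkg load statistics               % assumes the statistics package is installed

x = [0 0; 3 4; 6 8];              % made-up data: 3 observations, 2 coordinates

y_euc  = pdist (x);               % no distfun given, so 'euclidean' is used
y_city = pdist (x, "cityblock");  % sum of absolute coordinate differences
y_cheb = pdist (x, "chebychev");  % maximum coordinate difference

D = squareform (y_euc);           % expand the pair vector into an n x n matrix
```

For these three rows the pairs are ordered (1,2), (1,3), (2,3), so each result has three entries.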
See also: cluster, squareform