-- Function File: y = pdist (x, distfun)
-- Function File: y = pdist (x, distfun, distfunarg, ...)
Return the distance between any two rows in x.
x is the (n x m) matrix whose rows are compared. If no distfun is given, the 'euclidean' distance is assumed. distfun may be one of the pre-defined names listed below or a function handle to a user-defined function that takes two arguments, distfun (u, V), where u is the row (1 x m) whose distance is being taken relative to the rows of V (a p x m matrix).
The output vector, y, has (n - 1) * (n / 2) elements, one per pair of distinct rows, with the distances in the order [(1, 2); (1, 3); ...; (2, 3); ...; (n-1, n)].
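The pair ordering and the length of y can be illustrated with a short plain-Python sketch. This is not the Octave implementation: the helper name pairwise_euclidean is hypothetical, and row indices are written 1-based only to match the text above.

```python
from itertools import combinations
from math import sqrt

def pairwise_euclidean(x):
    """Return (pairs, y) for the rows of x (hypothetical helper, not pdist itself).

    pairs lists 1-based row index pairs in the order
    (1, 2), (1, 3), ..., (2, 3), ..., (n-1, n); y holds the matching
    Euclidean distances.
    """
    n = len(x)
    pairs = list(combinations(range(1, n + 1), 2))
    y = [sqrt(sum((a - b) ** 2 for a, b in zip(x[i - 1], x[j - 1])))
         for i, j in pairs]
    return pairs, y

x = [[0.0, 0.0], [3.0, 4.0], [6.0, 8.0], [0.0, 1.0]]  # n = 4 rows, m = 2 columns
pairs, y = pairwise_euclidean(x)
n = len(x)

print(pairs)                        # the six row pairs, in output order
print(len(y) == (n - 1) * n // 2)   # length check: one entry per pair of rows
```

With n = 4 rows the result has 6 entries, and the first three are the distances from row 1 to rows 2, 3 and 4.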
Any additional arguments after the distfun are passed as distfun (u, V, distfunarg1, distfunarg2 ...).
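The calling convention for a user-defined distfun can be sketched in plain Python as follows. The names pdist_sketch and weighted_cityblock are hypothetical, and the sketch assumes that V holds the rows that follow u in x; the text above does not say which rows are passed as V, so treat that as an assumption.

```python
def weighted_cityblock(u, V, w):
    """Hypothetical distfun: weighted L1 distance from row u to every row of V.

    u is one row (length m), V is a list of p rows, and w is an extra
    argument (one weight per coordinate) forwarded by the caller.
    """
    return [sum(wk * abs(uk - vk) for wk, uk, vk in zip(w, u, v)) for v in V]

def pdist_sketch(x, distfun, *distfunargs):
    """Call distfun (u, V, distfunarg1, ...) once per row u of x.

    Assumption: V is the block of rows after u, which yields the order
    (1, 2), (1, 3), ..., (n-1, n).
    """
    y = []
    for i in range(len(x) - 1):
        y.extend(distfun(x[i], x[i + 1:], *distfunargs))
    return y

x = [[0, 0], [1, 2], [4, 6]]
y = pdist_sketch(x, weighted_cityblock, [1.0, 0.5])
print(y)
```

Everything after distfun in the call is forwarded unchanged, so the weight vector reaches weighted_cityblock as its third argument.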
Pre-defined distance functions are:
- ‘"euclidean"’
- Euclidean distance (default)
- ‘"seuclidean"’
- Standardized Euclidean distance. Each coordinate in the sum of squares is inverse weighted by the sample variance of that coordinate.
- ‘"mahalanobis"’
- Mahalanobis distance
- ‘"cityblock"’
- City Block metric (also known as Manhattan distance)
- ‘"minkowski"’
- Minkowski metric (the exponent defaults to 2)
- ‘"cosine"’
- One minus the cosine of the included angle between points (treated as vectors)
- ‘"correlation"’
- One minus the sample correlation between points (treated as sequences of values).
- ‘"spearman"’
- One minus the sample Spearman's rank correlation between observations, treated as sequences of values
- ‘"hamming"’
- Hamming distance, the fraction of coordinates that differ
- ‘"jaccard"’
- One minus the Jaccard coefficient, the fraction of nonzero coordinates that differ
- ‘"chebychev"’
- Chebychev distance (maximum coordinate difference)
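Several of the metrics above can be written out for a single pair of rows. The following plain-Python sketch uses the common textbook definitions; it is an assumption that these match pdist in every detail (for example, how all-zero rows are handled), and the function names simply mirror the metric names.

```python
from math import sqrt

def cityblock(u, v):
    # Sum of absolute coordinate differences.
    return sum(abs(a - b) for a, b in zip(u, v))

def minkowski(u, v, p=2):
    # p-th root of the sum of p-th powers; p = 2 gives the Euclidean distance.
    return sum(abs(a - b) ** p for a, b in zip(u, v)) ** (1.0 / p)

def chebychev(u, v):
    # Largest absolute coordinate difference.
    return max(abs(a - b) for a, b in zip(u, v))

def cosine(u, v):
    # One minus the cosine of the angle between u and v (undefined for zero vectors).
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

def hamming(u, v):
    # Fraction of coordinates that differ, a value in [0, 1].
    return sum(a != b for a, b in zip(u, v)) / len(u)

def jaccard(u, v):
    # Fraction of differing coordinates among those where at least one
    # value is nonzero (undefined here if both rows are all zeros).
    kept = [(a, b) for a, b in zip(u, v) if a != 0 or b != 0]
    return sum(a != b for a, b in kept) / len(kept)

u = [1, 0, 2, 0]
v = [1, 3, 0, 0]
for f in (cityblock, minkowski, chebychev, cosine, hamming, jaccard):
    print(f.__name__, f(u, v))
```

For these two rows, hamming counts 2 differing coordinates out of 4, while jaccard ignores the last coordinate (zero in both rows) and counts 2 out of 3.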
See also: cluster, squareform