# Hierarchical Clustering¶

Hierarchical clustering algorithms build a dendrogram of nested clusters by repeatedly merging or splitting clusters.

Functions

`hclust`(D; linkage=:single)

Perform hierarchical clustering on distance matrix D with specified cluster linkage function.

Parameters: D – The pairwise distance matrix. `D[i,j]` is the distance between points `i` and `j`. linkage – A Symbol specifying how the distance between clusters (aka cluster linkage) is measured. It determines what clusters are merged on each iteration. Valid choices are:
• `:single`: use the minimum distance between any of the members
• `:average`: use the mean distance between any of the cluster’s members
• `:complete`: use the maximum distance between any of the members.
• `:ward`: the distance is the increase of the average squared distance of a point to its cluster centroid after merging the two clusters.
• `:ward_presquared`: same as `:ward`, but assumes that the distances in `D` are already squared.
The function returns an object of type Hclust with the fields
• `merges` the sequence of subtree merges. Leafs are indicated by negative numbers, the ids of non-trivial subtrees refer to the rows in the `merges` matrix and the elements of the `heights` vector.
• `heights` subtrees heights, i.e. the distances between left and right top branches of each subtree.
• `order` indices of points ordered such that there are no intersecting branches on the dendrogram plot. This ordering brings points of the same cluster close together.
• `linkage` the cluster linkage used.

Example:

```D = rand(1000, 1000)
D += D'  # symmetric distance matrix (optional)
`cutree`(result; [k=nothing], [h=nothing])
Parameters: result – Object of type `Hclust` holding results of a call to `hclust()`. k – Integer specifying the number of desired clusters. h – Real specifying the height at which to cut the tree.
If both k and h are specified, it’s guaranteed that the number of clusters is `≥ k` and their height `≤ h`.