K-medoids¶
K-medoids is a clustering algorithm that seeks a subset of points out of a given set such that the total costs or distances between each point to the closest point in the chosen subset is minimal. This chosen subset of points are called medoids.
This package implements a K-means style algorithm instead of PAM, which is considered to be much more efficient and reliable. Particularly, the algorithm is implemented by the kmedoids
function.
-
kmedoids
(C, k; ...)¶ Performs K-medoids clustering based on a given cost matrix.
Parameters: - C – The cost matrix, where
C[i,j]
is the cost of assigning samplej
to the medoidi
. - k – The number of clusters.
This function returns an instance of
KmedoidsResult
, which is defined as follows:type KmedoidsResult{T} <: ClusteringResult medoids::Vector{Int} # indices of medoids (k) assignments::Vector{Int} # assignments (n) acosts::Vector{T} # costs of the resultant assignments (n) counts::Vector{Int} # number of samples assigned to each cluster (k) totalcost::Float64 # total assignment cost (i.e. objective) (k) iterations::Int # number of elapsed iterations converged::Bool # whether the procedure converged end
One may optionally specify some of the options through keyword arguments to control the algorithm:
name description default init
Initialization algorithm or initial medoids, which can be either of the following:
- a symbol indicating the name of seeding algorithm,
:rand
,:kmpp
, or:kmcen
(see Clustering Initialization) - an integer vector of length
k
that provides the indexes of initial seeds.
:kmpp
maxiter
Maximum number of iterations. 100
tol
Tolerable change of objective at convergence. 1.0e-6
display
The level of information to be displayed. (see Common Options) :none
- C – The cost matrix, where
-
kmedoids!(C, medoids, ...)
Performs K-medoids clustering based on a given cost matrix.
This function operates on an given set of medoids and updates it inplace.
Parameters: - C – The cost matrix, where
C[i,j]
is the cost of assigning samplej
to the medoidi
. - medoids – The vector of medoid indexes. The contents of
medoids
serve as the initial guess and will be overrided by the results.
This function returns an instance of
KmedoidsResult
.One may optionally specify some of the options through keyword arguments to control the algorithm:
name description default maxiter
Maximum number of iterations. 100
tol
Tolerable change of objective at convergence. 1.0e-6
display
The level of information to be displayed. (see Common Options) :none
- C – The cost matrix, where