Variation of Information
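Variation of information (also known as shared information distance) is a measure of the distance between two clusterings of the same data set. It is defined in terms of mutual information but, unlike mutual information, it is a true metric: it is non-negative, equals zero only when the two clusterings are identical up to relabeling, is symmetric, and satisfies the triangle inequality.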
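For reference, the definition from Meila (2003) can be written as follows. The symbols here are notation chosen for this note, not names used by the package: C and C' are the two clusterings of n points, n_i and n'_j are their cluster sizes, n_{ij} is the number of points shared by cluster i of C and cluster j of C', H is entropy, and I is mutual information.

```latex
\begin{aligned}
P(i) &= \frac{n_i}{n}, \qquad P'(j) = \frac{n'_j}{n}, \qquad P(i,j) = \frac{n_{ij}}{n}, \\
H(C) &= -\sum_{i} P(i)\,\log P(i), \qquad
I(C, C') = \sum_{i,j} P(i,j)\,\log\frac{P(i,j)}{P(i)\,P'(j)}, \\
\mathrm{VI}(C, C') &= H(C) + H(C') - 2\,I(C, C') = H(C \mid C') + H(C' \mid C).
\end{aligned}
```

The last form shows why the value is zero exactly when each clustering determines the other, that is, when they differ only by a renaming of the cluster labels.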
References:
Meila, Marina (2003). Comparing Clusterings by the Variation of Information. Learning Theory and Kernel Machines: 173–187.
This package provides the varinfo function that implements this metric:
- varinfo(k1, a1, k2, a2): Compute the variation of information between two assignments.
Parameters:
- k1 – The number of clusters in the first clustering.
- a1 – The assignment vector for the first clustering.
- k2 – The number of clusters in the second clustering.
- a2 – The assignment vector for the second clustering.
Returns: the value of variation of information.
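To make the quantity concrete, here is a minimal from-scratch sketch of the same computation. It is not the package's implementation: the name vi_sketch is hypothetical, labels are assumed to be integers in 1:k1 and 1:k2, and the natural logarithm is an assumption about the unit.

```julia
# Hypothetical helper, NOT the package's varinfo: computes the variation of
# information directly from two assignment vectors.
# Assumptions: labels lie in 1:k1 and 1:k2, and the natural logarithm is used.
function vi_sketch(k1::Integer, a1::AbstractVector{<:Integer},
                   k2::Integer, a2::AbstractVector{<:Integer})
    n = length(a1)
    length(a2) == n ||
        throw(DimensionMismatch("assignment vectors must have equal length"))

    # Contingency table: joint[i, j] counts the points placed in cluster i
    # by the first clustering and in cluster j by the second.
    joint = zeros(Int, k1, k2)
    for t in 1:n
        joint[a1[t], a2[t]] += 1
    end

    rowsum = vec(sum(joint, dims=2))  # cluster sizes of the first clustering
    colsum = vec(sum(joint, dims=1))  # cluster sizes of the second clustering

    # VI = H(C | C') + H(C' | C), accumulated over the non-empty cells.
    vi = 0.0
    for j in 1:k2, i in 1:k1
        nij = joint[i, j]
        if nij > 0
            vi -= (nij / n) * (log(nij / rowsum[i]) + log(nij / colsum[j]))
        end
    end
    return vi
end
```

With this sketch, vi_sketch(2, [1, 1, 2, 2], 2, [2, 2, 1, 1]) gives 0, since the two labelings differ only by renaming, while vi_sketch(2, [1, 1, 2, 2], 2, [1, 2, 1, 2]) gives 2 log 2 (about 1.386), since the two balanced clusterings are independent of each other.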
- varinfo(R, k0, a0): Compute the variation of information between the clustering represented by R, an instance of ClusteringResult, and the clustering given by (k0, a0), where k0 is the number of clusters in the other clustering and a0 is the corresponding assignment vector.
- varinfo(R1, R2): Compute the variation of information between the clusterings represented by R1 and R2, both instances of ClusteringResult.