Measures of accuracy for classification


I just discovered a not-so-recent article providing an overview of accuracy measures for predictive classification tasks. Specifically, it focuses on the pros and cons of different measures of classification accuracy, with particular emphasis on percentage-, distance-, correlation-, and information-based indices for binary outcomes.

Here is the abstract:

Baldi, P., Brunak, S., Chauvin, Y., Andersen, C.A.F., and Nielsen, H., Assessing the accuracy of prediction algorithms for classification: an overview, Bioinformatics (2000) 16(5): 412-424.
We provide a unified overview of methods that currently are widely used to assess the accuracy of prediction algorithms, from raw percentages, quadratic error measures and other distances, and correlation coefficients, and to information theoretic measures such as relative entropy and mutual information. We briefly discuss the advantages and disadvantages of each approach. For classification tasks, we derive new learning algorithms for the design of prediction systems by directly optimising the correlation coefficient. We observe and prove several results relating sensitivity and specificity of optimal systems. While the principles are general, we illustrate the applicability on specific problems such as protein secondary structure and signal peptide prediction.

Let’s start by considering a two-way table like the one below:

              D = +    D = −
    M = +      TP       FP
    M = −      FN       TN

where M and D are predicted and true class membership (+/− means positive/negative instances in both cases). The corresponding cells sum to 1 and read as

  • TP (resp. TN) for true positive (resp. negative), the number of times a positive (resp. negative) instance is correctly classified;
  • FP (resp. FN) for false positive (resp. negative), the number of times a negative (resp. positive) instance is incorrectly classified.

The task is to derive a suitable way to summarize the accuracy of a given classification algorithm based on these four numbers, which amounts to defining a certain distance between M and D.
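The four counts above are easy to tally from a pair of label vectors; here is a minimal sketch (function and variable names are illustrative, labels coded 0/1 with 1 = positive):

```python
# Tally the four cells of the two-way table from predicted labels M
# and true labels D, both coded in {0, 1} with 1 = positive.
def confusion_counts(predicted, truth):
    """Return (TP, TN, FP, FN)."""
    tp = sum(1 for m, d in zip(predicted, truth) if m == 1 and d == 1)
    tn = sum(1 for m, d in zip(predicted, truth) if m == 0 and d == 0)
    fp = sum(1 for m, d in zip(predicted, truth) if m == 1 and d == 0)
    fn = sum(1 for m, d in zip(predicted, truth) if m == 0 and d == 1)
    return tp, tn, fp, fn

M = [1, 1, 0, 0, 1, 0]  # predicted
D = [1, 0, 0, 0, 1, 1]  # true
print(confusion_counts(M, D))  # (2, 2, 1, 1)
```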

Percentage-based measures

The percentages of correct positive (PCP) and negative (PCN) predictions are known as the sensitivity and specificity; they are given by

\text{PCP}=\frac{\text{TP}}{\text{TP}+\text{FN}},\quad \text{PCN}=\frac{\text{TN}}{\text{TN}+\text{FP}}.

Of note, we have PCP = Pr(instance classified as positive | instance is truly positive), whereas in a statistical decision framework, the Type I error risk is α = Pr(instance classified as positive | instance is truly negative). The following table summarizes the correspondence between decision theory and classification results:

                          H0 true (D = −)          H0 false (D = +)
    Reject H0 (M = +)     Type I error (α) → FP    Power (1 − β) → TP
    Retain H0 (M = −)     Correct (1 − α) → TN     Type II error (β) → FN

In particular, a correct retention (non-rejection) of H0 occurs with probability 1 − α and may be considered as test specificity.
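Sensitivity and specificity follow directly from the four cell counts; a minimal sketch under the definitions above (using the illustrative counts TP = 2, TN = 2, FP = 1, FN = 1):

```python
# PCP (sensitivity) and PCN (specificity) from the four cell counts.
def sensitivity(tp, fn):
    # PCP = TP / (TP + FN): fraction of truly positive instances recovered
    return tp / (tp + fn)

def specificity(tn, fp):
    # PCN = TN / (TN + FP): fraction of truly negative instances recovered
    return tn / (tn + fp)

print(sensitivity(tp=2, fn=1))  # 0.666...
print(specificity(tn=2, fp=1))  # 0.666...
```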

Distance-based measures

Any Lp distance between M and D, defined as

L^p=\left[\sum_i |d_i-m_i|^p \right]^{1/p},

can be used, but the usual choices are L1 (Hamming), L2 (Euclidean), or L∞ (sup). In the binary case, the Lp distance reduces to (FN+FP)^{1/p}. The Hamming distance is worth noting: it is simply FP+FN, so it is equivalent to a percentage-based measure. However, as it does not account for possible imbalance between positive and negative instances, it becomes of limited use when this ratio moves away from 50:50. Also, in the binary case, the squared Euclidean distance equals the Hamming distance, since |d_i − m_i|² = |d_i − m_i| for 0/1 values.
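The reduction to FP+FN is easy to check numerically; a small sketch (illustrative data, same 0/1 coding as before):

```python
# The Hamming (L1) distance between 0/1 prediction and truth vectors is
# just the number of disagreements, i.e. FP + FN.
def hamming(predicted, truth):
    return sum(m != d for m, d in zip(predicted, truth))

M = [1, 1, 0, 0, 1, 0]  # predicted
D = [1, 0, 0, 0, 1, 1]  # true
print(hamming(M, D))  # 2, i.e. FP + FN = 1 + 1
```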

Correlation-based measures

Let’s recall the classical Pearson’s correlation coefficient:

C=\frac{1}{N}\sum_i\frac{(d_i-\bar d)(m_i-\bar m)}{\sigma_D\sigma_M}.

I learned that, for binary variables, it is known as the Matthews correlation coefficient (MCC); the same quantity goes by the name Phi coefficient in the psychometric literature. Using vector notation, we have

C=\frac{(\mathbf D-\bar d\mathbf 1)\cdot(\mathbf M-\bar m\mathbf 1)}{\|\mathbf D-\bar d\mathbf 1\|\,\|\mathbf M-\bar m\mathbf 1\|}
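The MCC also has a well-known closed form in terms of the four cell counts, MCC = (TP·TN − FP·FN) / √((TP+FP)(TP+FN)(TN+FP)(TN+FN)), and one can check numerically that it agrees with Pearson’s correlation computed directly on the 0/1 vectors; a small sketch with the same illustrative data as above:

```python
import math

def mcc(tp, tn, fp, fn):
    # Closed form of the Matthews correlation coefficient
    den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / den if den else 0.0

def pearson(x, y):
    # Plain Pearson correlation on two numeric vectors
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mx) ** 2 for a in x)
                    * sum((b - my) ** 2 for b in y))
    return num / den

M = [1, 1, 0, 0, 1, 0]  # predicted
D = [1, 0, 0, 0, 1, 1]  # true
# These vectors give TP = 2, TN = 2, FP = 1, FN = 1
print(mcc(2, 2, 1, 1), pearson(M, D))  # both equal 1/3
```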

As for other interesting references, I’ll suggest the following articles:

1. Ambroise, C and McLachlan, GJ. Selection bias in gene extraction on the basis of microarray gene-expression data, PNAS (2002) 99(10): 6562-6566.
2. Saeys, Y, Inza, I, and Larrañaga, P. A review of feature selection techniques in bioinformatics, Bioinformatics (2007) 23(19): 2507-2517.
3. Dougherty, ER, Hua J, and Sima, C. Performance of Feature Selection Methods, Current Genomics (2009) 10(6): 365–374.

