A physically meaningful equation involving $n$ dimensioned variables can be rewritten in terms of $p=n-k$ dimensionless parameters $\pi_1, \pi_2, \ldots, \pi_p$ where $k$ is the number of physical dimensions involved.
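The textbook illustration is the simple pendulum: with period $T$, length $L$ and gravitational acceleration $g$ we have $n=3$ variables involving $k=2$ dimensions (length and time), hence $p=1$ dimensionless parameter, e.g.
$$\pi_1 = T\sqrt{g/L},$$
so $T \propto \sqrt{L/g}$ up to a dimensionless constant, without solving any equations of motion.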
Construct a square whose perimeter equals its area.
Most people would solve $4x = x^2 \rightarrow x=4$.
However, $x$ is a length, so $x=4$ equates a dimensioned quantity to a pure number -- four of what?
A dimensionally correct analysis would introduce a unit of length $u$:
$$4(x/u)=(x/u)^2 \rightarrow x=4u$$
Suppose $p$, $q$ and $r$ are probabilities. What can you say about the following expression?
$$p+qr$$
What about this one?
$$-\log q - \log r$$
We have strong intuitions about probabilities, bits, etc. However, dimensional analysis can't help here!
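A quick check with illustrative numbers shows why: both expressions are dimensionless, yet $p+qr$ need not even be a probability, e.g. $p=0.9$, $q=r=0.6$ gives
$$p + qr = 0.9 + 0.36 = 1.26 > 1,$$
while $-\log q - \log r = -\log(qr)$ is naturally read as an information content, in bits or nats depending on the base of the logarithm.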
The standard treatment is that dimensions can cancel.
Transcendental functions ($\exp$, $\log$, $\sin$, etc.) require dimensionless and unitless arguments.
$\log(10\,\mathrm{kg}) = \log(10) + \log(\mathrm{kg})$ doesn't make sense: $\log(\mathrm{kg})$ is undefined.
Rather than having dimensions and units cancel, why not carry them around as function or type signatures?
```haskell
-- Pseudocode: the dimensions appear in the type signature.
tan :: xlength -> ylength -> real
tan xl yl = yl / xl
```
ML quantities can have more than one signature, as we will see later.
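A minimal runnable sketch of this idea in Haskell; the newtype names `XLength` and `YLength` are illustrative rather than from any library, and the function is renamed `tanRatio` to avoid clashing with Prelude's `tan`:

```haskell
-- Sketch: carry dimensions in the types via newtype wrappers.
-- XLength and YLength are hypothetical illustrative types.
newtype XLength = XLength Double
newtype YLength = YLength Double

-- The ratio of two lengths is dimensionless, so the result is a plain Double.
tanRatio :: XLength -> YLength -> Double
tanRatio (XLength xl) (YLength yl) = yl / xl

main :: IO ()
main = print (tanRatio (XLength 3) (YLength 4))  -- prints 1.3333...
```

The type checker now rejects, say, passing a `YLength` where an `XLength` is expected, which is exactly the bookkeeping that dimensional analysis does by hand.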
An early proposal by the psychologist S. S. Stevens (1946); still influential, although somewhat rigid and limited.
| Scale type | Description | Admissible transformations |
| --- | --- | --- |
| Nominal | no order, no unit | permutation |
| Ordinal | order, no unit | monotone |
| Interval | can choose unit and zero | affine |
| Ratio | fixed zero, can choose unit | linear |
The appropriate scale type is the one furthest down the list that is still "meaningful" for the data in question.
| Scale type | Statistics |
| --- | --- |
| Nominal | mode |
| Ordinal | median, quantile, range |
| Interval | arithmetic mean, variance |
| Ratio | geometric mean, coefficient of variation |
Each scale type inherits statistics from levels above.
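A small runnable Haskell sketch of this inheritance, with the data type and function names chosen purely for illustration:

```haskell
-- Stevens's scale types, ordered from weakest to strongest.
data ScaleType = Nominal | Ordinal | Interval | Ratio
  deriving (Show, Eq, Ord, Enum, Bounded)

-- Statistics introduced at each level (from the table above).
ownStats :: ScaleType -> [String]
ownStats Nominal  = ["mode"]
ownStats Ordinal  = ["median", "quantile", "range"]
ownStats Interval = ["arithmetic mean", "variance"]
ownStats Ratio    = ["geometric mean", "coefficient of variation"]

-- A scale type supports its own statistics plus everything
-- inherited from the levels above it in the table.
meaningfulStats :: ScaleType -> [String]
meaningfulStats s = concatMap ownStats [Nominal .. s]

main :: IO ()
main = mapM_ putStrLn (meaningfulStats Interval)
```

For example, `meaningfulStats Interval` yields mode, median, quantile, range, arithmetic mean and variance, but not the ratio-scale statistics.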
Such scales abound in machine learning!
Mosteller and Tukey (1977) distinguish:
- Names
- Grades (e.g., beginner, intermediate, advanced)
- Ranks (1, 2, ...)
- Counted fractions (e.g., percentages)
- Counts (non-negative integers)
- Amounts (non-negative real numbers)
- Balances (unbounded, positive or negative values)
Chrisman (1998) distinguishes:
- Nominal
- Graded membership (e.g., fuzzy sets)
- Ordinal
- Interval
- Log-interval
- Extensive ratio
- Cyclical ratio (e.g., angles or time of day)
- Derived ratio
- Counts
- Absolute (e.g., probabilities)
Measurements are relevant in machine learning and AI for at least two reasons, as the following examples illustrate.
If I split a data set in two or more parts, is a classifier's accuracy on the entire data set equal to the average* of the accuracies on the separate parts?
Yes -- provided the parts are of equal size (e.g., cross-validation).
What about per-class recall ( = true positive rate)?
Yes -- provided the parts have the same class distribution (e.g., stratified CV).
*To be precise: the arithmetic mean.
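To see why equal sizes matter, write $n_i$ for the size of part $i$ and $c_i$ for the number of correct predictions on it; then
$$\mathrm{acc} = \frac{c_1 + c_2}{n_1 + n_2} = \frac{n_1\,\mathrm{acc}_1 + n_2\,\mathrm{acc}_2}{n_1 + n_2},$$
a weighted average that reduces to the arithmetic mean $(\mathrm{acc}_1 + \mathrm{acc}_2)/2$ exactly when $n_1 = n_2$.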
Is a classifier's precision on the entire data set equal to the average of the precisions on the parts?
IT IS NOT!
Unless the classifier's predictions are equally distributed over the classes on each part, which is neither likely nor under the experimenter's control.
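A small numeric check (hypothetical counts) makes this concrete. Take two parts of 100 instances each: part 1 has $TP_1 = 8$, $FP_1 = 2$, so $prec_1 = 0.8$; part 2 has $TP_2 = 1$, $FP_2 = 4$, so $prec_2 = 0.2$. The arithmetic mean is $0.5$, but pooling gives
$$prec = \frac{TP_1 + TP_2}{(TP_1 + FP_1) + (TP_2 + FP_2)} = \frac{9}{15} = 0.6,$$
because pooling weights each part by its number of positive predictions $TP_i + FP_i$ -- exactly the quantity the experimenter does not control.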
The same applies a fortiori to F-score, which aggregates recall and precision.
Common definition: $F = \frac{2\,prec \cdot rec}{prec + rec}$.
As a harmonic mean: $\frac{1}{F} = \frac{1}{2}\left(\frac{1}{rec} + \frac{1}{prec}\right)$.
Peter's preferred definition: $$F = \frac{TP}{TP + {\color{red}{\frac{FN+FP}{2}}}} = \frac{2TP}{2TP + FN+FP}$$
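Substituting $prec = \frac{TP}{TP+FP}$ and $rec = \frac{TP}{TP+FN}$ into the harmonic mean shows the two forms agree:
$$\frac{1}{F} = \frac{1}{2}\left(\frac{TP+FN}{TP} + \frac{TP+FP}{TP}\right) = \frac{2TP + FN + FP}{2TP}, \quad\text{i.e.}\quad F = \frac{2TP}{2TP + FN + FP}.$$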
Flach, P. and Kull, M. (2015). Precision-Recall-Gain curves: PR analysis done right. NIPS 2015.
Use a trained IRT (Item Response Theory) model to evaluate a new classifier on a small number of datasets.
Ultimately, empirical ML needs to make causal statements:
"Algorithm A outperformed algorithm B because the classes were highly imbalanced."
Proper treatment of performance evaluation in machine learning and AI requires a sophisticated measurement framework, with components such as those discussed above: appropriate scale types, aggregation-aware metrics, latent-trait (IRT) models and causal reasoning.
Part of this work was funded through a project with the Alan Turing Institute; papers, code and videos can be accessed here.
Many thanks to Hao Song, the Research Associate on the project; and collaborators Yu Chen, Tom Diethe, Jose Hernandez-Orallo, Conor Houghton, Meelis Kull, Paul-Gauthier Noe, Miquel Perello-Nieto, Ricardo Prudencio, Raul Santos-Rodriguez, Telmo Silva Filho, Kacper Sokol, and many others.