📑 Learning Objectives

Have an overview of the different disciplines of Uncertainty Quantification.
Understand the foundational difference between Frequentist and Bayesian statistics.
Understand the difference between objective and subjective uncertainty.
Know Kolmogorov's Axioms of probability.

A Brief Overview of UQ Methods

Uncertainty Quantification (UQ) can be many things, from simply doing model sensitivity analysis to solving statistical inverse problems. Here is a (by no means exhaustive) overview of some of the different methods used to perform UQ:

Forward Uncertainty Quantification - propagating uncertainty through some operation.
- (Quasi-) Monte Carlo Methods - using samples to propagate the uncertainty.
- Spectral methods such as Polynomial Chaos Expansion - using clever polynomials to propagate the uncertainty.
Inverse Uncertainty Quantification or Parameter Estimation - estimating the posterior distribution of parameters given some data.
- Markov Chain Monte Carlo - using clever algorithms to sample from the posterior distribution.
- Variational Inference such as ADVI - using a variational distribution to approximate the posterior distribution.
Predictive Uncertainty Quantification - Quantifying the uncertainty of predictions.
- Gaussian Process Regression - using a distribution over functions to estimate the predictive uncertainty.
- Engineered methods - e.g. NOMU neural networks.
- Ad-Hoc methods - e.g. Monte Carlo Dropout.
- Approximate methods - e.g. Laplace Approximation.
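
To make the first branch concrete, here is a minimal sketch of forward UQ with plain Monte Carlo sampling. The model $f(x) = x^2$, the input distribution $x \sim N(1, 0.1^2)$ and the helper name `propagate` are all assumptions chosen purely for illustration; they are not part of any particular library.

```python
import random
import statistics


def propagate(model, sampler, n_samples=100_000, seed=0):
    """Draw random inputs with `sampler` and push each one through `model`."""
    rng = random.Random(seed)
    return [model(sampler(rng)) for _ in range(n_samples)]


# Hypothetical example: uncertain input x ~ N(1.0, 0.1^2), model f(x) = x^2.
outputs = propagate(
    model=lambda x: x ** 2,
    sampler=lambda rng: rng.gauss(1.0, 0.1),
)

# Summarise the uncertainty of the output by its sample mean and spread.
output_mean = statistics.fmean(outputs)
output_std = statistics.stdev(outputs)
print(f"mean = {output_mean:.3f}, std = {output_std:.3f}")
```

For this toy model the exact output mean is $\mu^2 + \sigma^2 = 1.01$, so the sample mean should land close to that value. The point is that the output is itself described by a distribution rather than by a single number.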

In this course, we are mainly going to focus on the second branch, namely inverse UQ, and in particular on direct methods for estimating the posterior distribution. We will additionally cover a bit of Markov Chain Monte Carlo (MCMC), since the implementation of the simplest algorithm, the Metropolis-Hastings algorithm, is straightforward.
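
To give a flavour of how little code such a sampler needs, here is a rough sketch of the random-walk Metropolis algorithm, the special case of Metropolis-Hastings with a symmetric Gaussian proposal. The target density (an unnormalised $N(2, 0.5^2)$), the step size, the burn-in length and the function names are assumptions made for illustration only.

```python
import math
import random
import statistics


def metropolis(log_target, x0, n_steps=50_000, step_size=1.0, seed=0):
    """Random-walk Metropolis sampler for a one-dimensional target."""
    rng = random.Random(seed)
    x, log_p = x0, log_target(x0)
    chain = []
    for _ in range(n_steps):
        # Symmetric Gaussian proposal centred on the current state.
        proposal = x + rng.gauss(0.0, step_size)
        log_p_new = log_target(proposal)
        # Accept with probability min(1, p_new / p_old).
        if rng.random() < math.exp(min(0.0, log_p_new - log_p)):
            x, log_p = proposal, log_p_new
        # A rejected proposal repeats the current state in the chain.
        chain.append(x)
    return chain


def log_target(x):
    """Hypothetical unnormalised log-density: a Gaussian with mean 2, std 0.5."""
    return -0.5 * ((x - 2.0) / 0.5) ** 2


chain = metropolis(log_target, x0=0.0)
samples = chain[1_000:]  # discard an (assumed) burn-in period
chain_mean = statistics.fmean(samples)
chain_std = statistics.stdev(samples)
print(f"mean = {chain_mean:.2f}, std = {chain_std:.2f}")
```

Note that the sampler only ever evaluates the target up to a constant factor, which is exactly what makes it useful for posterior distributions whose normalising constant is unknown.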

However, what all of the above approaches have in common is that they operate with probability distributions. Our first port of call is therefore a slightly deeper dive into the meaning and interpretations of probability and uncertainty.

Bayesian vs Frequentist

Before we can get into the nuts and bolts of Uncertainty Quantification for Machine Learning, we have to take a few steps back. In fact, we have to go back to the very beginning of things - the foundations of probability theory and statistics. Here, we find two fundamentally different perspectives and schools of thought, namely the Frequentists and the Bayesians. Now, I'm not personally a partisan. I believe in using the right tool for the job. However, the differences between Frequentist and Bayesian approaches run deeper than simply using different formulae, and each school of thought comes with its own set of axioms and assumptions. To truly appreciate this perceived schism in statistical sciences, we have to briefly visit the underlying epistemological differences.

Objective vs Subjective Uncertainty

The main difference boils down to the different views of Frequentists and Bayesians on the nature of uncertainty. Broadly, probability and uncertainty can be viewed as either objective or subjective:

Objective Uncertainty: Quantifies the uncertainty due to randomness of events (aleatoric uncertainty). More specifically, probability is interpreted as the limit of the relative frequency over many trials. From this perspective, we assume a perfect model, but we do not know the next outcome. An example of objective uncertainty would be the outcome of a perfect dice. A perfect dice has no imperfections, yet due to the inherent randomness of rolling it, we are uncertain about its next outcome. This is the Frequentist perspective.
Subjective Uncertainty: Quantifies a degree of belief. At its most extreme, this type of uncertainty is not tied to any physical process at all, and we are quantifying our subjective belief. An example would be using probability to describe the guilt of a defendant in a trial. This is the Bayesian perspective.
Everything in between. Here, we have some imperfect knowledge of the system whose uncertainty we are trying to quantify. An example would be an imperfect dice. We are not certain that all the sides have the same probability, and the goal is to quantify that uncertainty and make predictions based on our observations. In this case, both Frequentist and Bayesian methods are broadly used, but each method comes with its own set of assumptions and its unique ways of processing information.
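
The imperfect dice makes a handy toy problem for comparing the two viewpoints. The sketch below uses made-up rolls and, for the Bayesian side, assumes a uniform Dirichlet prior over the six face probabilities. Both the data and the prior are illustrative assumptions, and other choices are equally defensible.

```python
from collections import Counter

# Hypothetical observations: 20 rolls of a possibly imperfect dice.
rolls = [6, 3, 6, 1, 6, 2, 6, 5, 4, 6, 6, 3, 6, 1, 2, 6, 5, 6, 4, 6]
counts = Counter(rolls)
n_rolls = len(rolls)
faces = range(1, 7)

# Frequentist point estimate: the relative frequency of each face.
freq_estimate = {k: counts[k] / n_rolls for k in faces}

# Bayesian estimate: a Dirichlet(1, ..., 1) prior (assumed) updated with the
# observed counts gives a Dirichlet posterior with the mean computed below.
prior_alpha = 1.0
posterior_mean = {
    k: (counts[k] + prior_alpha) / (n_rolls + 6 * prior_alpha) for k in faces
}

for k in faces:
    print(f"face {k}: frequency = {freq_estimate[k]:.3f}, "
          f"posterior mean = {posterior_mean[k]:.3f}")
```

With so few rolls, the prior visibly pulls the Bayesian estimate for the six towards 1/6, while the relative frequency takes the data at face value. As more rolls are observed, the two estimates converge.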

For a more in-depth review of the differences between Frequentist and Bayesian statistics, I highly recommend Kristin Lennox's talk All About That Bayes. Also check out this brilliant XKCD comic.

Kolmogorov Axioms

The reason I bring up the Kolmogorov axioms is not to bore you with a load of (very fundamental) mathematics. It's because I want to make the point that while providing rigour to the foundations of probability, they also provide freedom to your probabilistic models. That is to say, as long as the probability functions we use conform to these simple rules, we can use the entire machinery of probability theory to perform inference.

Consider a probability space $(\Omega, F, P)$, where $\Omega$ is the sample space, that is the space of all possible outcomes, $F$ is the event space, that is the space of all relevant sets of possible outcomes, and $P$ is the probability function, which measures the probability of an event $E \in F$. Then Kolmogorov's axioms are:

The probability of an event is a non-negative real number: $P(E) \in \mathbb{R}, P(E) \geq 0 \quad \forall E \in F$.
The probability that at least one outcome in the sample space will occur is 1: $P(\Omega) = 1$.
The probability of the union of a countable sequence of pairwise disjoint (non-overlapping) events $E_1, E_2, \ldots$ is equal to the sum of their individual probabilities: $P(\bigcup_{i=1}^{\infty}E_i) = \sum_{i=1}^{\infty}P(E_i)$.
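
The three axioms can be checked by brute force on a small, finite probability space. The sketch below assumes a fair dice, takes the event space $F$ to be the power set of $\Omega$, and defines $P(E) = |E| / |\Omega|$. On a finite space the third axiom reduces to additivity over pairs of disjoint events, which is what is tested here. The helper names are hypothetical.

```python
from fractions import Fraction
from itertools import chain, combinations

# Sample space of a (assumed fair) six-sided dice.
omega = frozenset(range(1, 7))


def power_set(s):
    """All subsets of `s`, used here as the event space F."""
    items = list(s)
    subsets = chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1)
    )
    return [frozenset(subset) for subset in subsets]


def prob(event):
    """Uniform probability function: P(E) = |E| / |Omega|."""
    return Fraction(len(event), len(omega))


events = power_set(omega)

# Axiom 1: probabilities are non-negative.
axiom_1 = all(prob(e) >= 0 for e in events)
# Axiom 2: the whole sample space has probability 1.
axiom_2 = prob(omega) == 1
# Axiom 3 (finite case): probabilities of disjoint events add up.
axiom_3 = all(
    prob(a | b) == prob(a) + prob(b)
    for a in events
    for b in events
    if a.isdisjoint(b)
)
print(axiom_1, axiom_2, axiom_3)
```

Using `Fraction` keeps the arithmetic exact, so the equality checks are not affected by floating-point rounding. Any other assignment of non-negative weights that sum to one would pass the same checks, which is the freedom the axioms leave to the modeller.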

Moreover, these axioms have various consequences that we will not go into in detail in this lesson, but you can look them up in any textbook on probability.
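
As a pointer for that textbook lookup, some of the most commonly used consequences are listed here, with one-line justifications only. For events $E, A, B \in F$, with $E^c = \Omega \setminus E$ denoting the complement of $E$:

$$
\begin{aligned}
P(\emptyset) &= 0, \\
P(E^c) &= 1 - P(E), \\
A \subseteq B &\implies P(A) \leq P(B), \\
0 \leq P(E) &\leq 1, \\
P(A \cup B) &= P(A) + P(B) - P(A \cap B).
\end{aligned}
$$

The first follows from applying the third axiom to a sequence of empty sets, the second from splitting $\Omega$ into the disjoint events $E$ and $E^c$, and the third from writing $B$ as the disjoint union of $A$ and $B \setminus A$. The bound $P(E) \leq 1$ is monotonicity applied to $E \subseteq \Omega$, and the last identity follows from splitting $A \cup B$ into the disjoint pieces $A$ and $B \setminus A$.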