Home Course Concepts About

Uncertainty propagation

This notebook is an element of the free risk-engineering.org courseware. It can be distributed under the terms of the Creative Commons Attribution-ShareAlike licence.

Author: Eric Marsden eric.marsden@risk-engineering.org


This notebook contains an introduction to use of Python and NumPy for uncertainty propagation. It compares the use of the Python uncertainties library with a stochastic simulation. See the associated course materials for background information and to download this content as a Jupyter/Python notebook.

The body mass index (BMI) is the ratio $\frac{\text{body mass } (kg)}{\text{body height} (m)^2}$. It is often used as an (imperfect) indicator or obesity or malnutrition. Task: calculate your BMI and the associated uncertainty interval, assuming that:

  • your weight scale tells you that you weigh 84 kg (precision shown to the nearest kilogram)

  • a tape measure says you are between 181 and 182 cm tall (most likely value is 181.5 cm)

Solution using the uncertainties library

The uncertainties library implements linear error propagation theory in Python. The uncertainties library can be installed (if you’re using a Linux machine) using a command such as

sudo pip3 install uncertainties

If you’re using Google CoLaboratory, execute a code cell that contains

!pip install uncertainties

On a Microsoft Windows installation with Anaconda, open the "Anaconda Powershell Prompt" from the start menu, and execute

pip install uncertainties

In [14]:
import uncertainties

x = uncertainties.ufloat(10, 0.5)
x
Out[14]:
10.0+/-0.5

This defines an uncertain number (a random variable) with a central value of 10 and a standard deviation of 0.5. The standard deviation represents the uncertainty (which in this context you can think of as the error), in our number. The uncertainties package is able to do various types of arithmetic and other mathematical operations on these uncertain numbers, and propagates the uncertainty to the result. For example, we can multiply this uncertain number by two:

In [15]:
2 * x
Out[15]:
20.0+/-1.0

Note that the uncertainty range concerning our value has been multiplied by two, along with the value itself.

We can add the uncertain number to itself:

In [16]:
x + x + x
Out[16]:
30.0+/-1.5

We can also add a very uncertain number to x:

In [17]:
y = uncertainties.ufloat(1, 10)
x + y
Out[17]:
11.0+/-10.012492197250394

or we can add a certain number (a number with zero uncertainty) to x:

In [18]:
x + 5
Out[18]:
15.0+/-0.5

Note that the uncertainties package is able to determine that the error range in the expression below is zero (this is because the subtracted quantities are correlated to the first value).

In [19]:
2 * x - x - x
Out[19]:
0.0+/-0

The uncertainties package is also able to propagate uncertainty through more complicated mathematical expressions:

In [20]:
import uncertainties.umath

x**2 + 45*uncertainties.umath.sin(x + y)
Out[20]:
55.000440705218345+/-10.294066614145805

You can use it as a convenient calculator for mathematical expressions including uncertainty. The library is also useful to add uncertainty propagation to existing Python code, which you can often reuse without any modifications.

Coming back to our initial objective, recall that we want to calculate the body mass index (BMI) of a person, using the formula given above, from uncertain measurements of the person’s mass and height. The description of measurements given in the introduction indicates that:

  • weight is a uniform probability distribution between 83.5 and 84.5 kg

  • height is a triangular probability distribution between 1.81 and 1.82 m with a central value of 1.815.

To create appropriate random variables using the uncertainties package, we need to know their central value and standard deviation. We can calculate the standard deviations using scipy.stats.

In [21]:
import scipy.stats
# this is for the weight random variable (a uniform distribution)
weight_stdev = scipy.stats.uniform(83.5, 1).std()
weight_stdev
Out[21]:
0.28867513459481287
In [22]:
# this is for the height random variable (a triangular distribution)
height_stdev = scipy.stats.triang(loc=1.81, scale=0.01, c=0.5).std()
height_stdev
Out[22]:
0.0020412414523193153
In [23]:
weight = uncertainties.ufloat(84, weight_stdev)
height = uncertainties.ufloat(1.815, height_stdev)
BMI = weight/(height**2)
print("{:P}".format(BMI))
25.50±0.10

Solution using a Monte Carlo simulation

We can also estimate the body mass index using a Monte Carlo (stochastic simulation) method. Our slides on Monte Carlo methods for risk analysis present some background material on this method.

In [24]:
import numpy
from numpy.random import *
import matplotlib.pyplot as plt
plt.style.use("bmh")
%config InlineBackend.figure_formats=["svg"]

When undertaking statistical analyses, it can be important for your results to be reproducible, meaning that you obtain the same results each time you run the analysis. The pseudorandom number generator used by NumPy generates long sequences of outputs that are fully determined by the seed of the generator (if you use the same seed, you will obtain the same results each time you run the notebook). In default setting, the seed is taken from some unpredictable feature of the computer environment (such as the /dev/random device on a Linux/Unix computer), so sucessive runs will generate different results. If you want a reproducible notebook, you can set the random seed explicitly.

In [25]:
numpy.random.seed(42)
In [26]:
# N is the number of runs in our stochastic simulation
N = 10_000

def BMI() -> float:
    return uniform(83.5, 84.5) / triangular(1.81, 1.815, 1.82)**2

sim = numpy.zeros(N)
for i in range(N):
    sim[i] = BMI()
print("{} ± {}".format(sim.mean(), sim.std()))
plt.hist(sim, alpha=0.5);
25.49874886353346 ± 0.10490839707332848
No description has been provided for this image

Indeed, the result from the stochastic simulation is very close to that obtained using the uncertainties package.

Limitations

Note: the uncertainties package uses linear error propagation, and is only appropriate when the level of uncertainty is small (it ignores any non-linear terms in the derivatives of each operation in your calculation, which are used to propagate the uncertainties analytically). It also assumes that the uncertainty (or error) in your data follows a normal distribution, which is a good assumption when dealing with measurement errors for example, but is not appropriate in all cases. In particular, it won’t be suitable if your uncertainty is asymmetric, or if you need to truncate the uncertainty in some way (for example if you know that your measurements are definitely positive). The documentation provides more information on the underlying assumptions.

Two related but more complex libraries are soerp which implements second-order error propagation and mcerp which implements Monte Carlo error propagation.

In [ ]: