Home Course Concepts About

Probability distributions

This notebook is an element of the risk-engineering.org courseware. It can be distributed under the terms of the Creative Commons Attribution-ShareAlike licence.

Author: Eric Marsden eric.marsden@risk-engineering.org


This notebook contains an introduction to the use of SciPy library's support for various probability distributions. The library documentation is available online. We also show how the SymPy library for symbolic mathematics can be used to calculate various statistical properties analytically. Visit the associated course materials for background material and to download this content as a Python notebook.

In [1]:
import numpy
import scipy.stats
import matplotlib.pyplot as plt
import warnings
plt.style.use("bmh")
%config InlineBackend.figure_formats=["svg"]
warnings.filterwarnings("ignore", category=DeprecationWarning)

Uniform distribution

Let’s generate 5 random variates from a continuous uniform distribution between 90 and 100 (the first argument to the scipy.stats.uniform function is the lower bound, and the second argument is the width). The object u will contain the “frozen distribution”.

In [2]:
u = scipy.stats.uniform(90, 10)
u.rvs(5)
Out[2]:
array([97.17090436, 95.67121398, 96.96545119, 96.85219269, 92.65271103])

Let’s check that the expected value of the distribution is around 95.

In [3]:
u.rvs(1000).mean()
Out[3]:
94.88274983113557

Let’s check that around 20% of the variates are less than 92.

In [4]:
(u.rvs(1000) < 92).sum() / 1000.0
Out[4]:
0.196

We can also use the stats module of the SymPy library to obtain the same information using an analytical (rather than stochastic) method.

In [5]:
import sympy.stats

u = sympy.Symbol("u")
u = sympy.stats.Uniform(u, 90, 100)
# generate one random variate. The sample() function returns a Python iterator.
next(sympy.stats.sample(u))
/usr/lib/python3/dist-packages/sympy/stats/rv.py:1104: UserWarning: 
The return type of sample has been changed to return an iterator
object since version 1.7. For more information see
https://github.com/sympy/sympy/issues/19061
  warnings.warn(filldedent(message))
Out[5]:
90.84842481979973

Check that the expected value (the mean of the distribution) is 95.

In [6]:
sympy.stats.E(u)
Out[6]:
$\displaystyle 95$

The probability of a random variate being less than 92:

In [7]:
sympy.stats.P(u < 92)
Out[7]:
$\displaystyle \frac{1}{5}$

Gaussian distribution

Consider a Gaussian (normal) distribution centered in 5, with a standard deviation of 1.

In [8]:
norm = scipy.stats.norm(5, 1)
x = numpy.linspace(1, 9, 100)
plt.plot(x, norm.pdf(x));
2021-10-25T11:20:17.469583 image/svg+xml Matplotlib v3.3.4, https://matplotlib.org/
In [9]:
norm.mean()
Out[9]:
5.0

Check that half the distribution is located to the left of 5.

In [10]:
norm.cdf(5)
Out[10]:
0.5
In [11]:
norm.ppf(0.5)
Out[11]:
5.0

Find the first percentile of the distribution (the value of $x$ which has 1% of realizations to the left). Check that it is also equal to the 99% survival quantile.

In [12]:
norm.ppf(0.01)
Out[12]:
2.6736521259591592
In [13]:
norm.isf(0.99)
Out[13]:
2.6736521259591592
In [14]:
norm.cdf(norm.isf(0.99))
Out[14]:
0.01

We can also execute some calculations on the normal distribution analytically, using the SymPy library. This provides computer algebra capabilities similar to (though currently less powerful than) the commercial tools Mathematica and Maple.

In [15]:
z = sympy.Symbol("z")
Norm = sympy.stats.Normal(z, 5, 1)
sympy.stats.density(Norm)(z)
Out[15]:
$\displaystyle \frac{\sqrt{2} e^{- \frac{\left(z - 5\right)^{2}}{2}}}{2 \sqrt{\pi}}$
In [16]:
# we ask for the evaluation to 40 significant figures
sympy.stats.density(Norm)(4).evalf(40)
Out[16]:
$\displaystyle 0.2419707245191433497978301929355606548287$
In [17]:
norm.pdf(4)
Out[17]:
0.24197072451914337
In [18]:
# check that half of our distribution is greater than 5 (this is equivalent to the CDF(5))
sympy.stats.P(Norm > 5)
Out[18]:
$\displaystyle \frac{1}{2}$
In [19]:
# check that the expected value (the mean) is 5
sympy.stats.E(Norm)
Out[19]:
$\displaystyle 5$
In [20]:
# check that the standard deviation is 1
sympy.stats.std(Norm)
Out[20]:
$\displaystyle 1$
In [21]:
# generate one random variate from the distribution
# The function sample() returns a Python iterator
next(sympy.stats.sample(Norm))
/usr/lib/python3/dist-packages/sympy/stats/rv.py:1104: UserWarning: 
The return type of sample has been changed to return an iterator
object since version 1.7. For more information see
https://github.com/sympy/sympy/issues/19061
  warnings.warn(filldedent(message))
Out[21]:
6.712175135404825
In [22]:
# calculate the first percentile (the 0.01 quantile) -- this requires Sympy version 1.5
sympy.stats.quantile(Norm)(0.01).evalf(30)
Out[22]:
$\displaystyle 2.67365212595915925989391330587$
In [23]:
# compare with the value calculated by numpy
norm.ppf(0.01)
Out[23]:
2.6736521259591592

Central limit theorem

The central limit theorem states that the mean of a set of random measurements will tend to a normal distribution, no matter the shape of the original measurement distribution. The property is also true of the sum of a set of random measurements. Let's test that in Python, simulating measurements from a uniform distribution between 30 and 40.

Procedure: take 100 measurements from the $U(30, 40)$ distribution, and calculate their mean. Repeat this 10000 times and plot a histogram of the means, which should be normally distributed.

In [24]:
N = 10_000
sim = numpy.zeros(N)
for i in range(N):
    sim[i] = numpy.random.uniform(30, 40, 100).mean()
plt.hist(sim, bins=20, alpha=0.5, density=True);
2021-10-25T11:20:18.017083 image/svg+xml Matplotlib v3.3.4, https://matplotlib.org/

Exercise: try this with other probability distributions.

Exponential distribution

The exponential distribution is often used in reliability engineering to represent failure of equipment which is not exposed to wear. The hazard function, or failure rate, of the exponential distribution is constant, equal to $\lambda$. Let's check the property that the expected value (or mean) of an exponential random variable is $\frac{1}{\lambda}$.

In [25]:
lbda = 25
obs = scipy.stats.expon(scale=1/float(lbda)).rvs(size=1000)
# the mean of 1000 random variates from the exponential distribution 
obs.mean()
Out[25]:
0.03870032281361351
In [26]:
1/float(lbda)
Out[26]:
0.04

Indeed, those are quite close! Let’s check another property of the exponential distribution: that the variance is equal to $\lambda^{-2}$.

In [27]:
obs.var()
Out[27]:
0.0016837489379937845
In [28]:
1/float(lbda)**2
Out[28]:
0.0016

And of course since the standard deviation is the square root of the variance, it should be equal to the expected value.

In [29]:
obs.std()
Out[29]:
0.04103350993997204

As previously, we can also check these properties analytically, using the SymPy symbolic mathematics library, starting with the expected value:

In [30]:
exp = sympy.Symbol("exp")
exp = sympy.stats.Exponential(exp, lbda)
sympy.stats.E(exp)
Out[30]:
$\displaystyle \frac{1}{25}$
In [31]:
# now check that the variance is lambda ^ -2
sympy.stats.variance(exp) - lbda**-2
Out[31]:
$\displaystyle 0$
In [32]:
# check that the standard deviation is equal to the expectation
sympy.stats.std(exp) - sympy.stats.E(exp)
Out[32]:
$\displaystyle 0$
In [ ]: