{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Probability distributions"
]
},
{
"cell_type": "markdown",
"metadata": {
"ein.tags": "worksheet-0",
"slideshow": {
"slide_type": "-"
}
},
"source": [
"\n",
"\n",
"This notebook is an element of the [risk-engineering.org courseware](https://risk-engineering.org/). It can be distributed under the terms of the [Creative Commons Attribution-ShareAlike licence](https://creativecommons.org/licenses/by-sa/4.0/). \n",
" \n",
"Author: Eric Marsden \n",
"\n",
"---\n",
"\n",
"This notebook contains an introduction to the use of [SciPy](https://www.scipy.org/) library's support for various probability distributions. The [library documentation](https://docs.scipy.org/doc/scipy-0.14.0/reference/stats.html) is available online. We also show how the [SymPy library](https://sympy.org/) for symbolic mathematics can be used to calculate various statistical properties analytically. Visit the [associated course materials](https://risk-engineering.org/statistical-modelling/) for background material and to download this content as a Python notebook."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"autoscroll": false,
"ein.hycell": false,
"ein.tags": "worksheet-0",
"slideshow": {
"slide_type": "-"
}
},
"outputs": [],
"source": [
"import numpy\n",
"import scipy.stats\n",
"import matplotlib.pyplot as plt\n",
"plt.style.use(\"bmh\")\n",
"%config InlineBackend.figure_formats=[\"svg\"]"
]
},
{
"cell_type": "markdown",
"metadata": {
"ein.tags": "worksheet-0",
"slideshow": {
"slide_type": "-"
}
},
"source": [
"## Uniform distribution"
]
},
{
"cell_type": "markdown",
"metadata": {
"ein.tags": "worksheet-0",
"slideshow": {
"slide_type": "-"
}
},
"source": [
"Let’s generate 5 random variates from a continuous uniform distribution between 90 and 100 (the first argument to the `scipy.stats.uniform` function is the lower bound, and the second argument is the width). The object `u` will contain the “frozen distribution”."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"autoscroll": false,
"ein.hycell": false,
"ein.tags": "worksheet-0",
"slideshow": {
"slide_type": "-"
}
},
"outputs": [
{
"data": {
"text/plain": [
"array([91.85688457, 90.87813946, 95.49755923, 94.21517598, 95.77316387])"
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"u = scipy.stats.uniform(90, 10)\n",
"u.rvs(5)"
]
},
{
"cell_type": "markdown",
"metadata": {
"ein.tags": "worksheet-0",
"slideshow": {
"slide_type": "-"
}
},
"source": [
"Let’s check that the expected value of the distribution is around 95."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"autoscroll": false,
"ein.hycell": false,
"ein.tags": "worksheet-0",
"slideshow": {
"slide_type": "-"
}
},
"outputs": [
{
"data": {
"text/plain": [
"94.96971154289815"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"u.rvs(1000).mean()"
]
},
{
"cell_type": "markdown",
"metadata": {
"ein.tags": "worksheet-0",
"slideshow": {
"slide_type": "-"
}
},
"source": [
"Let’s check that around 20% of the variates are less than 92."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"autoscroll": false,
"ein.hycell": false,
"ein.tags": "worksheet-0",
"slideshow": {
"slide_type": "-"
}
},
"outputs": [
{
"data": {
"text/plain": [
"0.227"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"(u.rvs(1000) < 92).sum() / 1000.0"
]
},
{
"cell_type": "markdown",
"metadata": {
"ein.tags": "worksheet-0",
"slideshow": {
"slide_type": "-"
}
},
"source": [
"We can also use the `stats` module of the [SymPy](https://sympy.org/) library to obtain the same information using an analytical (rather than stochastic) method."
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"autoscroll": false,
"ein.hycell": false,
"ein.tags": "worksheet-0",
"slideshow": {
"slide_type": "-"
}
},
"outputs": [
{
"data": {
"text/latex": [
"$\\displaystyle 90.3783207568647$"
],
"text/plain": [
"90.3783207568647"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import sympy.stats\n",
"\n",
"u = sympy.Symbol(\"u\")\n",
"u = sympy.stats.Uniform(u, 90, 100)\n",
"# generate one random variate\n",
"sympy.stats.sample(u)"
]
},
{
"cell_type": "markdown",
"metadata": {
"ein.tags": "worksheet-0",
"slideshow": {
"slide_type": "-"
}
},
"source": [
"Check that the expected value (the mean of the distribution) is 95."
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"autoscroll": false,
"ein.hycell": false,
"ein.tags": "worksheet-0",
"slideshow": {
"slide_type": "-"
}
},
"outputs": [
{
"data": {
"text/latex": [
"$\\displaystyle 95$"
],
"text/plain": [
"95"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"sympy.stats.E(u)"
]
},
{
"cell_type": "markdown",
"metadata": {
"ein.tags": "worksheet-0",
"slideshow": {
"slide_type": "-"
}
},
"source": [
"The probability of a random variate being less than 92:"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"autoscroll": false,
"ein.hycell": false,
"ein.tags": "worksheet-0",
"slideshow": {
"slide_type": "-"
}
},
"outputs": [
{
"data": {
"text/latex": [
"$\\displaystyle \\frac{1}{5}$"
],
"text/plain": [
"1/5"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"sympy.stats.P(u < 92)"
]
},
{
"cell_type": "markdown",
"metadata": {
"ein.tags": "worksheet-0",
"slideshow": {
"slide_type": "-"
}
},
"source": [
"## Gaussian distribution"
]
},
{
"cell_type": "markdown",
"metadata": {
"ein.tags": "worksheet-0",
"slideshow": {
"slide_type": "-"
}
},
"source": [
"Consider a Gaussian (normal) distribution centered in 5, with a standard deviation of 1."
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"autoscroll": false,
"ein.hycell": false,
"ein.tags": "worksheet-0",
"slideshow": {
"slide_type": "-"
}
},
"outputs": [
{
"data": {
"image/svg+xml": [
"\n",
"\n",
"\n",
"\n"
],
"text/plain": [
"