{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"ein.tags": "worksheet-0",
"slideshow": {
"slide_type": "-"
}
},
"source": [
"# Basic statistics"
]
},
{
"cell_type": "markdown",
"metadata": {
"ein.tags": "worksheet-0",
"slideshow": {
"slide_type": "-"
}
},
"source": [
"\n",
"\n",
"This notebook is an element of the free [risk-engineering.org courseware](https://risk-engineering.org/). It can be distributed under the terms of the [Creative Commons Attribution-ShareAlike licence](https://creativecommons.org/licenses/by-sa/4.0/). \n",
"\n",
"Author: Eric Marsden\n",
"\n",
"---\n",
"\n",
"This notebook contains an introduction to the use of Python and the NumPy library for basic statistical calculations.\n",
"See the [associated course materials](https://risk-engineering.org/statistical-modelling/) for background information and to download this content as a Jupyter notebook.\n",
"\n",
"We start by importing the numpy library. After the import, functions and variables from the library are available under the prefix `numpy.` (for example, `numpy.sqrt`)."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"autoscroll": false,
"ein.hycell": false,
"ein.tags": "worksheet-0",
"slideshow": {
"slide_type": "-"
}
},
"outputs": [],
"source": [
"import numpy"
]
},
{
"cell_type": "markdown",
"metadata": {
"ein.tags": "worksheet-0",
"slideshow": {
"slide_type": "-"
}
},
"source": [
"We can use Python as a simple interactive calculator:"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"autoscroll": false,
"ein.hycell": false,
"ein.tags": "worksheet-0",
"slideshow": {
"slide_type": "-"
}
},
"outputs": [
{
"data": {
"text/plain": [
"9"
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"2 + 3 + 4"
]
},
{
"cell_type": "markdown",
"metadata": {
"ein.tags": "worksheet-0",
"slideshow": {
"slide_type": "-"
}
},
"source": [
"Here we call the `sqrt` function from the numpy library."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"autoscroll": false,
"ein.hycell": false,
"ein.tags": "worksheet-0",
"slideshow": {
"slide_type": "-"
}
},
"outputs": [
{
"data": {
"text/plain": [
"2.0"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"numpy.sqrt(2 + 2)"
]
},
{
"cell_type": "markdown",
"metadata": {
"ein.tags": "worksheet-0",
"slideshow": {
"slide_type": "-"
}
},
"source": [
"Some useful constants are predefined."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"autoscroll": false,
"ein.hycell": false,
"ein.tags": "worksheet-0",
"slideshow": {
"slide_type": "-"
}
},
"outputs": [
{
"data": {
"text/plain": [
"3.141592653589793"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"numpy.pi"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"autoscroll": false,
"ein.hycell": false,
"ein.tags": "worksheet-0",
"slideshow": {
"slide_type": "-"
}
},
"outputs": [
{
"data": {
"text/plain": [
"1.2246467991473532e-16"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"numpy.sin(numpy.pi)"
]
},
{
"cell_type": "markdown",
"metadata": {
"ein.tags": "worksheet-0",
"slideshow": {
"slide_type": "-"
}
},
"source": [
"The notation `e-16` above means $\\times 10^{-16}$; the number above is extremely small (it’s a floating-point approximation to the exact mathematical answer of zero)."
]
},
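{
"cell_type": "markdown",
"metadata": {},
"source": [
"Because of this rounding error, testing a floating-point result for exact equality with zero is unreliable. As a minimal sketch (not part of the original course material), the `numpy.isclose` function compares two numbers within a small tolerance. The tolerance `atol=1e-12` and the variable name `value` are arbitrary choices made for illustration."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy\n",
"\n",
"value = numpy.sin(numpy.pi)\n",
"# exact comparison with zero, which is affected by floating-point rounding\n",
"print(value == 0)\n",
"# comparison within an absolute tolerance (1e-12 is an illustrative choice)\n",
"print(numpy.isclose(value, 0, atol=1e-12))"
]
},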
{
"cell_type": "markdown",
"metadata": {
"ein.tags": "worksheet-0",
"slideshow": {
"slide_type": "-"
}
},
"source": [
"We can generate a random number from a uniform distribution between 20 and 30. If you evaluate this cell several times (in most Jupyter interfaces, press `Shift-Enter` or click the `Run` button in the toolbar above), it will generate a different random number each time."
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"autoscroll": false,
"ein.hycell": false,
"ein.tags": "worksheet-0",
"slideshow": {
"slide_type": "-"
}
},
"outputs": [
{
"data": {
"text/plain": [
"25.1437515786662"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"numpy.random.uniform(20, 30)"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"autoscroll": false,
"ein.hycell": false,
"ein.tags": "worksheet-0",
"slideshow": {
"slide_type": "-"
}
},
"outputs": [
{
"data": {
"text/plain": [
"25.04101521213037"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"numpy.random.uniform(20, 30)"
]
},
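{
"cell_type": "markdown",
"metadata": {},
"source": [
"If you need the same random numbers on every run (for example, to make an analysis reproducible), one option is to seed a random number generator. The sketch below is an illustration rather than part of the original course material: it assumes a NumPy version that provides `numpy.random.default_rng` (1.17 or later), and the seed value 42 and the names `rng`, `first` and `second` are arbitrary."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy\n",
"\n",
"rng = numpy.random.default_rng(42)  # seeded generator; 42 is an arbitrary seed\n",
"first = rng.uniform(20, 30)\n",
"rng = numpy.random.default_rng(42)  # re-create the generator with the same seed\n",
"second = rng.uniform(20, 30)\n",
"# the two draws are identical because both generators started from the same seed\n",
"first == second"
]
},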
{
"cell_type": "markdown",
"metadata": {
"ein.tags": "worksheet-0",
"slideshow": {
"slide_type": "-"
}
},
"source": [
"We can generate an **array** of random numbers by passing a third argument to the `numpy.random.uniform` function, saying how many random numbers we want. We store the array in a *variable* named `obs`."
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"autoscroll": false,
"ein.hycell": false,
"ein.tags": "worksheet-0",
"slideshow": {
"slide_type": "-"
}
},
"outputs": [
{
"data": {
"text/plain": [
"array([25.371427 , 25.20324491, 23.10450399, 22.72975024, 26.2111469 ,\n",
" 26.67308058, 28.1365947 , 29.42437473, 23.9345215 , 28.74785977])"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"obs = numpy.random.uniform(20, 30, 10)\n",
"obs"
]
},
{
"cell_type": "markdown",
"metadata": {
"ein.tags": "worksheet-0",
"slideshow": {
"slide_type": "-"
}
},
"source": [
"The built-in Python function `len` tells us the length of an array or a list."
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {
"autoscroll": false,
"ein.hycell": false,
"ein.tags": "worksheet-0",
"slideshow": {
"slide_type": "-"
}
},
"outputs": [
{
"data": {
"text/plain": [
"10"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"len(obs)"
]
},
{
"cell_type": "markdown",
"metadata": {
"ein.tags": "worksheet-0",
"slideshow": {
"slide_type": "-"
}
},
"source": [
"We can do arithmetic on arrays. Operations are applied element by element, so we can add two arrays together or subtract a constant from each element."
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {
"autoscroll": false,
"ein.hycell": false,
"ein.tags": "worksheet-0",
"slideshow": {
"slide_type": "-"
}
},
"outputs": [
{
"data": {
"text/plain": [
"array([50.74285399, 50.40648983, 46.20900799, 45.45950048, 52.4222938 ,\n",
" 53.34616115, 56.27318939, 58.84874946, 47.86904299, 57.49571954])"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"obs + obs"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {
"autoscroll": false,
"ein.hycell": false,
"ein.tags": "worksheet-0",
"slideshow": {
"slide_type": "-"
}
},
"outputs": [
{
"data": {
"text/plain": [
"array([ 0.371427 , 0.20324491, -1.89549601, -2.27024976, 1.2111469 ,\n",
" 1.67308058, 3.1365947 , 4.42437473, -1.0654785 , 3.74785977])"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"obs - 25"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can apply a numpy function to all the elements of an array. "
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([5.03700576, 5.02028335, 4.80671447, 4.76757278, 5.1196823 ,\n",
" 5.16459878, 5.3043939 , 5.42442391, 4.89229205, 5.36170307])"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"numpy.sqrt(obs)"
]
},
{
"cell_type": "markdown",
"metadata": {
"ein.tags": "worksheet-0",
"slideshow": {
"slide_type": "-"
}
},
"source": [
"Arrays have *methods*, which are functions attached to the array object that operate on its contents."
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {
"autoscroll": false,
"ein.hycell": false,
"ein.tags": "worksheet-0",
"slideshow": {
"slide_type": "-"
}
},
"outputs": [
{
"data": {
"text/plain": [
"25.953650431503206"
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"obs.mean()"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {
"autoscroll": false,
"ein.hycell": false,
"ein.tags": "worksheet-0",
"slideshow": {
"slide_type": "-"
}
},
"outputs": [
{
"data": {
"text/plain": [
"259.53650431503206"
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"obs.sum()"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {
"autoscroll": false,
"ein.hycell": false,
"ein.tags": "worksheet-0",
"slideshow": {
"slide_type": "-"
}
},
"outputs": [
{
"data": {
"text/plain": [
"22.72975024207144"
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"obs.min()"
]
},
{
"cell_type": "markdown",
"metadata": {
"ein.tags": "worksheet-0",
"slideshow": {
"slide_type": "-"
}
},
"source": [
"There are similar functions in the `numpy` library that take an array as an argument:"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {
"autoscroll": false,
"ein.hycell": false,
"ein.tags": "worksheet-0",
"slideshow": {
"slide_type": "-"
}
},
"outputs": [
{
"data": {
"text/plain": [
"25.953650431503206"
]
},
"execution_count": 16,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"numpy.mean(obs)"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {
"autoscroll": false,
"ein.hycell": false,
"ein.tags": "worksheet-0",
"slideshow": {
"slide_type": "-"
}
},
"outputs": [
{
"data": {
"text/plain": [
"259.53650431503206"
]
},
"execution_count": 17,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"numpy.sum(obs)"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {
"autoscroll": false,
"ein.hycell": false,
"ein.tags": "worksheet-0",
"slideshow": {
"slide_type": "-"
}
},
"outputs": [
{
"data": {
"text/plain": [
"22.72975024207144"
]
},
"execution_count": 18,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"numpy.min(obs)"
]
},
{
"cell_type": "markdown",
"metadata": {
"ein.tags": "worksheet-0",
"slideshow": {
"slide_type": "-"
}
},
"source": [
"## Simple plotting"
]
},
{
"cell_type": "markdown",
"metadata": {
"ein.tags": "worksheet-0",
"slideshow": {
"slide_type": "-"
}
},
"source": [
"The matplotlib library allows you to generate many types of plots and statistical graphs in a convenient way. The [online gallery](https://matplotlib.org/gallery.html) shows the variety of plots available, and the [documentation](https://matplotlib.org/contents.html) is also available online. We import the `pyplot` component of matplotlib and give it an alias `plt`. "
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {
"autoscroll": false,
"ein.hycell": false,
"ein.tags": "worksheet-0",
"slideshow": {
"slide_type": "-"
}
},
"outputs": [],
"source": [
"import matplotlib.pyplot as plt\n",
"plt.style.use(\"bmh\") # this affects the style (colors etc.) of plots\n",
"%config InlineBackend.figure_formats=[\"svg\"]"
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {
"autoscroll": false,
"ein.hycell": false,
"ein.tags": "worksheet-0",
"slideshow": {
"slide_type": "-"
}
},
"outputs": [
{
"data": {
"image/svg+xml": [
"\n",
"\n",
"\n"
],
"text/plain": [
"