Statistical modelling
Introduction to probabilistic and statistical modelling of risk
Overview
Risk analysis is sometimes based on data concerning a hazardous event, such as the occurrence of an earthquake or the exceedance of a threshold. Analyzing these data relies on statistical modelling, most often carried out with computer tools. When the risk analyst puts on her data scientist hat, she collects data (measurements, observations) from various sources and loads them into the computer, obtains a general overview of the data and their distribution, and builds a statistical model that attempts to reproduce the properties of the underlying phenomenon. After checking that the statistical model fits the observations well, she can generate various risk metrics and quantify the level of uncertainty in the predictions.
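The workflow just described (summarise the data, fit a model, check the fit, compute a risk metric) can be sketched in a few lines of Python with NumPy and SciPy. This is a minimal sketch under stated assumptions, not the method used in the course notebooks: the data are synthetic (50 values drawn from a Gumbel distribution with arbitrary parameters, standing in for real observations such as annual maximum river levels), and the choice of a Gumbel model, the Kolmogorov-Smirnov check and the "1-in-100" level are illustrative choices.

```python
import numpy as np
from scipy import stats

# Synthetic stand-in for observations (assumption: 50 annual maxima drawn
# from a Gumbel distribution with arbitrary parameters loc=10, scale=2).
rng = np.random.default_rng(42)
data = stats.gumbel_r.rvs(loc=10.0, scale=2.0, size=50, random_state=rng)

# 1. General overview of the data: descriptive statistics
summary = stats.describe(data)
print(f"n={summary.nobs}, mean={summary.mean:.2f}, variance={summary.variance:.2f}")

# 2. Fit a candidate model (maximum likelihood estimates of loc and scale)
loc, scale = stats.gumbel_r.fit(data)

# 3. Check the fit with a Kolmogorov-Smirnov test
#    (the p-value is optimistic because the parameters were estimated
#    from the same data that are being tested)
ks = stats.kstest(data, "gumbel_r", args=(loc, scale))
print(f"KS statistic={ks.statistic:.3f}, p-value={ks.pvalue:.3f}")

# 4. Risk metric: level exceeded with probability 1% in a given year,
#    i.e. the 99% quantile of the fitted distribution
level_100 = stats.gumbel_r.ppf(0.99, loc=loc, scale=scale)
print(f"estimated 1-in-100 level: {level_100:.2f}")
```

In a real analysis the synthetic `data` array would be replaced by measured values, and several candidate distributions would usually be compared before settling on one.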
Statistical modelling (or “data science”, to use a related and more trendy term) is an important part of risk analysis and safety in various engineering areas (mechanical engineering, nuclear engineering), in the management of natural hazards, in quality control, and in finance.
This submodule is part of the risk analysis module.
Learning objectives
Upon completion of this submodule, you should be able to:

Analyze data using descriptive statistics and graphical tools

Fit a probability distribution to data (estimate distribution parameters)

Express various risk measures as statistical tests

Determine quantile measures of various risk metrics

Build flexible models to allow estimation of quantities of interest and associated uncertainty measures

Select appropriate distributions of random variables/vectors for random phenomena
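Two of these objectives, expressing a risk measure as a statistical test and determining a quantile together with a measure of its uncertainty, can be sketched briefly in Python. The sketch is illustrative only: the failure counts and the 1% target are hypothetical, the losses are synthetic (lognormal with arbitrary parameters), and the one-sided binomial test and percentile bootstrap are common choices rather than the specific methods taught in the notebooks. `scipy.stats.binomtest` requires SciPy 1.7 or later.

```python
import numpy as np
from scipy import stats

# --- Risk measure expressed as a statistical test --------------------------
# Hypothetical data: 3 failures observed in 100 demands on a safety system.
# Question: is there evidence that the failure probability exceeds 1%?
# H0: p <= 0.01 against H1: p > 0.01 (one-sided binomial test)
failures, demands, target = 3, 100, 0.01
test = stats.binomtest(failures, demands, p=target, alternative="greater")
print(f"p-value = {test.pvalue:.3f}")  # small p-value -> evidence that p > target

# --- Quantile of a risk metric, with bootstrap uncertainty -----------------
# Synthetic losses (assumption: lognormal with arbitrary parameters)
rng = np.random.default_rng(1)
losses = rng.lognormal(mean=0.0, sigma=1.0, size=200)
q95 = np.quantile(losses, 0.95)  # empirical 95th percentile

# Percentile bootstrap: recompute the quantile on datasets resampled
# with replacement, then take the spread of those estimates
n_boot = 2000
boot = np.array([
    np.quantile(rng.choice(losses, size=losses.size, replace=True), 0.95)
    for _ in range(n_boot)
])
ci_low, ci_high = np.quantile(boot, [0.025, 0.975])
print(f"95th percentile = {q95:.2f}, 95% CI = [{ci_low:.2f}, {ci_high:.2f}]")
```

With 3 failures in 100 demands the p-value is about 0.08, so at the conventional 5% level these data alone would not establish that the failure probability exceeds 1%. The percentile bootstrap is simple and distribution-free, but it can be unreliable for extreme quantiles estimated from small samples.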
Course material
Statistics and risk modelling with Python 

Python notebook on basic statistics


Python notebook on coins and dice


Python notebook on probability distributions


Brief reminder on statistics 

Interactive examples of probability distributions 
Analyzing data with Python 

Python notebook on simple descriptive statistics


Python notebook on analysis and curve fitting for weather data


Python notebook on analysis of speed of light measurements


Python notebook on analysis of earthquake data


Python notebook on Semmelweis’s work on risk reduction in hospitals

In these course materials, applications are presented using the NumPy and SciPy libraries for the Python programming language.
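As a minimal illustration of how the two libraries divide the work (an example of our own, not taken from the notebooks), NumPy's random number generator can simulate rolls of a fair die, while `scipy.stats` gives the exact probability for comparison. The question chosen here, the probability of at least one six in four rolls, is an assumption made for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2024)
n_trials = 100_000

# Simulate n_trials experiments, each consisting of four rolls of a fair die
rolls = rng.integers(1, 7, size=(n_trials, 4))  # upper bound 7 is exclusive
at_least_one_six = (rolls == 6).any(axis=1)
simulated = at_least_one_six.mean()

# Exact answer: the number of sixes in 4 rolls follows Binomial(n=4, p=1/6),
# so P(at least one six) = P(X > 0) = 1 - (5/6)**4
exact = stats.binom.sf(0, 4, 1 / 6)

print(f"simulated: {simulated:.4f}, exact: {exact:.4f}")
```

The Monte Carlo estimate converges to the exact value as `n_trials` grows; the same simulate-then-compare pattern is useful for checking analytical risk calculations.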
Other resources
We recommend the following sources of further information on this topic:

CMU Open Learning Initiative course Probability and Statistics, a free and open course (course materials can be followed at any time)

Course materials for the MIT Introduction to Probability and Statistics course, which is free and open (course materials can be followed at any time)

edX course Introduction to probability (MITx) – note that the course can only be taken at specific periods during the year

Udacity course Introduction to statistics (no prerequisites, but note that the course can only be taken at specific periods during the year)

Harvard Extension School online lecture on Sets, Counting and Probability

Textbook Introduction to Probability by C. Grinstead and J. L. Snell, freely available under GNU Free Documentation Licence

Textbook Statistical inference for everyone, freely available under a Creative Commons licence

Computational statistics in Python, an online textbook with many examples

Python for econometrics (University of Cambridge)

Exploratory computing with Python, a set of Python notebooks on analyzing data using NumPy/SciPy

Textbook Statistical modeling: a fresh approach, by Daniel Kaplan

The Probability and statistics cookbook, by Matthias Vallentin

The NIST Engineering Statistics Handbook, an online compendium of information on statistics useful for engineering analysis

Seeing theory, a visual introduction to basic concepts in probability and statistics