Monte Carlo simulation for project risk assessment¶

This notebook is an element of the free risk-engineering.org courseware. It can be distributed under the terms of the Creative Commons Attribution-ShareAlike licence.

Author: Eric Marsden eric.marsden@risk-engineering.org

This notebook contains an introduction to use of Python and the NumPy library for Monte Carlo simulation applied to a simple project risk problem. See the associated course materials for an introduction to the use of stochastic simulation methods and to download this content as a Jupyter/Python notebook. Please note that this content is pedagogical in nature rather than intended for practical use, but some commercial project management tools implement a similar technique.

Project management is notorious for providing incorrect (mostly overly confident) predictions on project costs and durations. Indeed, things rarely go according to plan in a complex project, and deviations from original estimates cause projects not to meet their delivery dates or the budgeted cost.

Project risk assessment consists of providing answers to questions such as the following:

what is the probability of completing the project within the planned budget and delivering before the due date?
What are realistic budgets and delivery dates?
What are the key risks that threaten the project’s objectives?

These questions are very important financially and logistically for the companies doing the work, as well as for their clients. To answer these questions, we collect information on estimated completion times and budgets for the individual tasks comprising the project, and their interdependencies. It is said that experienced project managers double the estimated task completion times provided to them then add two months, and still end up underestimating completion time. Experience shows that people provide better estimates if they are asked to provide a range of values for completion time and budget (a best case, worst case and most likely outcome).

Problem statement¶

A construction project involves three tasks:

Task 1 is likely to take three days (70% probability), but it might also be completed in two days (with 10% probability) or four days (20% probability)
Task 2 has a 60% probability of taking six days to finish, a 20% probability each of being completed in five days or eight days
Task 3 has an 80% probability of being completed in four days, 5% probability of being completed in three days and a 15% probability of being completed in five days.

Each task is dependent on the task before it, meaning that the three tasks must be executed in sequence. Your task is to provide information to the project manager concerning the expected completion time of the project and possible delays.

Basic approach using best and worst case¶

A basic approach to project risk would calculate best case and worst case project completion time. Our example is very simple so it’s easy to make this estimate by hand. We illustrate how to use interval arithmetic to resolve this kind of problem in more complex scenarios. We use the mpmath Python library for arbitrary precision floating point arithmetic, which provides support for interval arithmetic. You may need to install this library, for instance with

pip install mpmath

or if you’re using a Google CoLaboratory notebook, execute a code cell containing

!pip install mpmath

Because the tasks must be completed in sequence, the total duration of the project is simply the sum of the individual task durations.

In [1]:

from mpmath import iv

task1 = iv.mpf([2, 4])
task2 = iv.mpf([5, 8])
task3 = iv.mpf([3, 5])
task1 + task2 + task3

Out[1]:

mpi('10.0', '17.0')

So the best case completion time is 10 days, and the worst case is 17 days. That’s a very big range of uncertainty in our project risk assessment!

Monte Carlo stochastic simulation¶

Using a Monte Carlo stochastic simulation method, we will estimate the probability distribution of completion time, providing much more information for decision-making on project risk than only best and worst cases.

We start by defining a function that simulates the completion time of task 1. The functions use pseudorandom numbers generated from a uniform distribution.

In [2]:

import numpy
import matplotlib.pyplot as plt
plt.style.use("bmh")
%config InlineBackend.figure_formats=["svg"]

def task1_days() -> int:
    u = numpy.random.uniform()
    if u < 0.7: return 3
    if u < 0.8: return 2
    return 4

Let’s check that the function returns a plausible value if called once:

In [3]:

task1_days()

Out[3]:

Then check that the distribution of a large number of calls looks plausible:

In [4]:

N = 1000
sim = numpy.zeros(N, dtype=int)
for i in range(N):
    sim[i] = task1_days()
plt.stem(numpy.bincount(sim))
plt.title("Expected duration of task 1")
plt.xlabel("Days")
plt.margins(0.1)

We check that this result is plausible: the most likely duration is three days, and a duration of four days roughly two times more likely than a duration of 2 days. OK, this fits our task description.

We then define similar functions for the duration of tasks 2 and 3.

In [5]:

def task2_days() -> int:
    u = numpy.random.uniform()
    if u < 0.6: return 6
    if u < 0.8: return 5
    return 8

def task3_days() -> int:
    u = numpy.random.uniform()
    if u < 0.8: return 4
    if u < 0.85: return 3
    return 5

We assume that each task is dependent on the task before it, meaning that the three tasks must be executed in sequence. The total duration of the project is simply the sum of the individual task durations.

Note that in real applications, some tasks will most likely be able to be executed in parallel. Furthermore, the number of tasks will be much greater, and can reach several hundred. The method shown here will still work, though a graphical user interface to describe tasks will be very helpful to users.

In [6]:

def project_duration() -> int:
    return task1_days() + task2_days() + task3_days()

We can now run a large number of simulations of project execution, and from these estimate the worst case, best case and median durations. We can also provide the expected probability distribution of the duration.

In [7]:

N = 10_000
sim = numpy.zeros(N, dtype=int)
for i in range(N):
    sim[i] = project_duration()
plt.stem(numpy.bincount(sim))
plt.title("Expected project duration")
plt.xlabel("Days");
print("Worst case: {} days".format(sim.max()))
print("Best case: {} days".format(sim.min()))
print("Median: {} days".format(numpy.median(sim)))

Worst case: 17 days
Best case: 10 days
Median: 13.0 days

In [ ]:

We can also provide other quantile measures of estimated project duration. For example, to determine the duration that we are 95% confident will not be exceeded, we calculate the 95th percentile.

In [8]:

numpy.percentile(sim, 95)

Out[8]:

16.0

Note that the function numpy.percentile calculates an empirical quantile, and not an analytical quantile (based on the mathematical equation that defines the underlying probability distribution) as we have done in other exercises.

Parallel tasks¶

Suppose that tasks 2 and 3 run in parallel, instead of sequentially (one after the other). Their contribution to total project duration is the maximum of the two durations, rather than the sum as previously. We could adjust the code that calculates total project duration as follows:

In [9]:

def project_duration() -> int:
    return task1_days() + max(task2_days(), task3_days())

Correlated risks¶

In this simple example, we have assumed that the risks affecting the duration of each task are independent (the random numbers drawn in the functions for task1_days, task2_days and task3_days were drawn independently). In real projects, some of the risks affecting each task will be correlated with other tasks, for example due to adverse weather that affects each task. A simple way of modelling this form of correlated risks is to modify the project_duration function to add an additional possibility for delay due to weather, as shown below.

In [10]:

def project_duration() -> int:
    weather_delay = 0
    if numpy.random.uniform() < 0.1:
        weather_delay = 2
    return task1_days() + task2_days() + task3_days() + weather_delay

N = 10_000
sim = numpy.zeros(N, dtype=int)
for i in range(N):
    sim[i] = project_duration()
plt.stem(numpy.bincount(sim))
plt.title("Expected project duration")
plt.xlabel("Days");

Historical note¶

The first major use of stochastic simulation techniques for project risk management was the PERT method, developed by the US Navy Special Projects Office in the 1950s to help manage the development of Polaris nuclear submarines. The method incorporated uncertainty in project schedule estimates based on inputs provided by subproject leaders. Rather than ask engineers about the variance of their duration estimates (not a very natural feature of human estimates…), the PERT specialists would ask engineers to provide optimistic, most likely and pessimistic estimates of duration, and would fit a beta probability distribution to these parameters.

In [ ]:

By Eric Marsden