Simple exercises in reliability engineering

This notebook is an element of the free risk-engineering.org courseware. It can be distributed under the terms of the Creative Commons Attribution-ShareAlike licence.

Author: Eric Marsden eric.marsden@risk-engineering.org.


This notebook contains a number of simple exercises in reliability engineering. For background, see the associated lecture notes at risk-engineering.org.

In [1]:
import numpy
import scipy.stats

Lifetime of light bulbs

The lifetime of a modern low-wattage electronic light bulb is known to be exponentially distributed with a mean of 8000 hours.

  • Q1. Find the proportion of bulbs that may be expected to fail before 7000 hours of use.

  • Q2. What is the lifetime that we can be 95% confident will be exceeded?

Answer. The time to failure of our light bulbs can be modelled by the distribution

In [2]:
dist = scipy.stats.expon(scale=8000)

Q1: The CDF gives us the probability that the lifetime is ≤ $t$.

In [3]:
dist.cdf(7000)
Out[3]:
0.5831379803214916

So about 58% of light bulbs will fail before they reach 7000 hours of operation.

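Q2: We are looking for the time $t$ such that 95% of bulbs last longer than $t$, or equivalently such that only 5% of bulbs fail before $t$. This is the 0.05 quantile of the lifetime distribution, which is given by the ppf (percent point function) method in scipy.stats.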

In [4]:
# the result is expressed in hours
dist.ppf(0.05)
Out[4]:
410.3463551004043
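
As a cross-check, the exponential distribution has simple closed forms: the CDF is $F(t) = 1 - e^{-t/\theta}$ for mean $\theta$, and the $p$ quantile is $-\theta \ln(1 - p)$. The sketch below compares these formulas with the scipy.stats results; the variable names (`mean_life`, `bulb`, `p_fail`, `t_exceeded`) are illustrative and not part of the original notebook.

```python
import math
import scipy.stats

mean_life = 8000  # mean lifetime in hours, from the exercise statement
bulb = scipy.stats.expon(scale=mean_life)

# Q1: closed-form CDF, F(t) = 1 - exp(-t / mean)
p_fail = 1 - math.exp(-7000 / mean_life)

# Q2: solve F(t) = 0.05 for t, giving t = -mean * ln(1 - 0.05)
t_exceeded = -mean_life * math.log(1 - 0.05)

print(p_fail, bulb.cdf(7000))
print(t_exceeded, bulb.ppf(0.05))
```

Both pairs of values should agree to within floating-point rounding.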

Failure of electronic components

A particular electronic device will only function correctly if two essential components both function correctly. The lifetime of the first component is known to be exponentially distributed with a mean of 5000 hours and the lifetime of the second component (whose failures can be assumed to be independent of those of the first component) is known to be exponentially distributed with a mean of 7000 hours.

Q. Find the proportion of devices that may be expected to fail before 6000 hours of use.

A. The device will only be working after 6000 hours if both components are operating. The probability of the first component still working after 6000 hours is

In [5]:
pa = 1 - scipy.stats.expon(scale=5000).cdf(6000)
pa
Out[5]:
0.3011942119122022

and likewise for the second component

In [6]:
pb = 1 - scipy.stats.expon(scale=7000).cdf(6000)
pb
Out[6]:
0.42437284567695

The probability of both working is

In [7]:
pa * pb
Out[7]:
0.12781864481060756

so the proportion of devices that can be expected to fail before 6000 hours' use is around 87%.
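
The same result can be obtained in another way: for independent exponential lifetimes, the failure rates of components in series add, so the device survives to time $t$ with probability $e^{-(\lambda_a + \lambda_b) t}$. The sketch below checks this against the product of the two survival probabilities, using the `sf` (survival function) method of scipy.stats as a shorthand for `1 - cdf`. The names `comp_a`, `comp_b` and `series_rate` are illustrative.

```python
import math
import scipy.stats

comp_a = scipy.stats.expon(scale=5000)  # first component, mean 5000 hours
comp_b = scipy.stats.expon(scale=7000)  # second component, mean 7000 hours

# Survival function sf(t) = 1 - cdf(t); independence lets us multiply
p_survive = comp_a.sf(6000) * comp_b.sf(6000)

# Closed form: failure rates of independent series components add
series_rate = 1 / 5000 + 1 / 7000
p_survive_closed = math.exp(-series_rate * 6000)

p_device_fails = 1 - p_survive
print(p_survive, p_survive_closed, p_device_fails)
```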

Maintenance of a large computing facility

For a large computer installation, the maintenance logbook shows that over a period of a month there were 15 unscheduled maintenance actions or downtimes, and a total of 1200 minutes in emergency maintenance status. Based upon prior data on this equipment, the reliability engineer expects repair times to be exponentially distributed. A warranty contract between the computer company and the customer calls for a penalty payment for any downtime exceeding 100 minutes. Find the following:

  • The MTTR and repair rate

  • The probability that the warranty requirement is being met

  • The median time to repair

  • The time within which 95% of the maintenance actions can be completed

A. The MTTR (mean time to repair) is simply the mean of the observed repair times. We saw 15 repairs for a total of 1200 minutes of repair time, meaning that MTTR = 1200/15 = 80 minutes. The repair rate μ is the inverse of the MTTR, 1/80 = 0.0125 repairs per minute. Our probability distribution for repair times is

In [8]:
dist = scipy.stats.expon(scale=80)

The warranty requirement is met when a repair takes no more than 100 minutes. The probability of this is given by the cumulative distribution function of the repair time:

In [9]:
dist.cdf(100)
Out[9]:
0.7134952031398099

The median time to repair is the 0.5 quantile

In [10]:
# this is in minutes
dist.ppf(0.5)
Out[10]:
55.451774444795625

The time within which 95% of the maintenance actions can be completed is

In [11]:
# this is in minutes
dist.ppf(0.95)
Out[11]:
239.6585818843192
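
The four answers can also be reproduced from the logbook figures using the closed forms of the exponential distribution (median $= \text{MTTR} \cdot \ln 2$, and $p$ quantile $= -\text{MTTR} \cdot \ln(1 - p)$). This is a minimal sketch; the variable names are illustrative and not part of the original notebook.

```python
import math
import scipy.stats

total_downtime = 1200  # minutes of emergency maintenance in the month
n_repairs = 15         # number of unscheduled maintenance actions

mttr = total_downtime / n_repairs  # mean time to repair, in minutes
repair_rate = 1 / mttr             # repairs per minute
repair = scipy.stats.expon(scale=mttr)

# Probability that a repair takes no more than 100 minutes
p_within_warranty = 1 - math.exp(-repair_rate * 100)
# Median and 95th percentile of the repair time, in minutes
median_ttr = mttr * math.log(2)
t95 = -mttr * math.log(1 - 0.95)

print(mttr, repair_rate)
print(p_within_warranty, repair.cdf(100))
print(median_ttr, repair.ppf(0.5))
print(t95, repair.ppf(0.95))
```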

Failure of pumps in an oil field

From field data in an oil field, the time to failure of a pump, $X$, is known to be normally distributed. The mean and standard deviation of the time to failure are estimated from historical data as 3200 and 600 hours, respectively.

  • What is the probability that a pump will fail after it has worked for 2000 hours?

  • If two pumps work in parallel, what is the probability that the system will fail after it has worked for 2000 hours?

We want to assess $\Pr(X > 2000)$, the probability that the pump is still working at 2000 hours and so fails later. This is $1 - \Pr(X ≤ 2000)$, or

In [12]:
1 - scipy.stats.norm(3200, 600).cdf(2000)
Out[12]:
0.9772498680518208

The probability of the system working for at least 2000 hours is 1 minus the probability of both pumps failing before 2000 hours, which is 1 - (1 - 0.977)² = 1 - 0.023², or about 0.9995.

Let’s call $Y$ the random variable representing time to failure of the redundant pump system, and $X_1$ and $X_2$ the time to failure of pumps 1 and 2 respectively. We want to determine $\Pr(Y > 2000)$, which is $1 - \Pr(Y ≤ 2000)$. This is $1 - \Pr(X_1 ≤ 2000 ∧ X_2 ≤ 2000)$ (given the parallel configuration of the pumps, the system fails when both of the pumps fail).

Given that pump failures are independent, that’s $1 - \Pr(X_1 ≤ 2000) × \Pr(X_2 ≤ 2000)$, which we compute as

In [13]:
1 - scipy.stats.norm(3200, 600).cdf(2000)**2
Out[13]:
0.9994824314963404
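
A Monte Carlo simulation offers an independent check of this result. In a parallel configuration the system lifetime is the longer of the two pump lifetimes, $Y = \max(X_1, X_2)$. The sketch below draws lifetimes for many simulated pump pairs and estimates $\Pr(Y > 2000)$. The sample size and random seed are arbitrary choices, and the variable names are illustrative. A normal distribution can in principle produce negative lifetimes, but with these parameters that probability is negligible.

```python
import numpy
import scipy.stats

rng = numpy.random.default_rng(42)  # fixed seed for reproducibility
n_samples = 1_000_000

# One row per simulated system, one column per pump (lifetimes in hours)
lifetimes = rng.normal(loc=3200, scale=600, size=(n_samples, 2))

# The parallel system fails only when the longer-lived pump fails
system_lifetime = lifetimes.max(axis=1)
p_estimate = (system_lifetime > 2000).mean()

# Analytical value for comparison
p_exact = 1 - scipy.stats.norm(3200, 600).cdf(2000) ** 2
print(p_estimate, p_exact)
```

The estimate should be close to the analytical value, with a small difference due to sampling noise.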