# Analyzing earthquake data¶

This notebook is an element of the risk-engineering.org courseware. It can be distributed under the terms of the Creative Commons Attribution-ShareAlike licence.

Author: Eric Marsden eric.marsden@risk-engineering.org.

This notebook contains an introduction to the use of Python and SciPy to analyze data concerning earthquakes.

```
import numpy
import scipy.stats
import pandas
import seaborn as sns
import matplotlib.pyplot as plt
```

The USGS provides data on earthquakes that occurred throughout the world at https://earthquake.usgs.gov/earthquakes/search/. For convenience, we have extracted data for 2017 earthquakes of magnitude larger than 5 in CSV format.

```
data = pandas.read_csv("https://risk-engineering.org/static/data/earthquakes-2017.csv")
data.head()
```

```
len(data)
```

Let's clean up the data a little.

```
data.time = pandas.to_datetime(data.time)
data.sort_values('time', inplace=True)
```

Plot the data to see whether we can identify any visible patterns.

```
fig = plt.figure()
fig.set_size_inches(20, 4)
plt.plot(data.time, data.mag, '.');
plt.ylabel("Magnitude");
```

There's no visible trend in this time series.

Let's examine the distribution of earthquake magnitudes (recall that we only have data for earthquakes of magnitude larger than 5).

```
data.mag.hist(density=True, alpha=0.5, bins=30)
plt.xlabel("Earthquake magnitude")
plt.title("Histogram of earthquake magnitudes");
```

Earthquake arrivals follow a Poisson process. This means that interarrival times follow an exponential distribution.

```
duration = data.time.max() - data.time.min()
density = len(data) / float(duration.days)
density # events per day
```

We can attempt to fit an exponential distribution to the interarrival times.

```
# calculate the time delta between successive rows and convert into days
interarrival = data.time.diff().dropna().apply(lambda x: x / numpy.timedelta64(1, 'D'))
support = numpy.linspace(interarrival.min(), interarrival.max(), 100)
interarrival.hist(density=True, alpha=0.5, bins=30)
plt.plot(support, scipy.stats.expon(scale=1/density).pdf(support), lw=2)
plt.title("Duration between 5+ earthquakes", weight='bold')
plt.xlabel("Days");
```

It looks like the inter-event times do indeed fit an exponential distribution well. To check, generate a quantile-quantile plot against the exponential distribution.

```
shape, loc = scipy.stats.expon.fit(interarrival)
scipy.stats.probplot(interarrival, dist='expon', sparams=(shape, loc), plot=plt.figure().add_subplot(111))
plt.title("Exponential QQ-plot of interarrival time");
```

A correlogram or autocorrelation plot tests whether elements of a time series are positively correlated, negatively correlated, or independent of each other. This is important to detect trends or cycles in time series data.

```
plt.acorr(interarrival, maxlags=200)
plt.title("Autocorrelation of earthquake timeseries data");
```

There is no visible trend in this time series data (note that we are only examining large earthquakes; the same plot concerning all earthquakes would likely show correlations at small numbers of days, because large earthquakes tend to trigger small aftershocks in the following days, or sometimes large quakes are preceded by small earthquakes).

```
```