Analyzing earthquake data¶
This notebook is an element of the risk-engineering.org courseware. It can be distributed under the terms of the Creative Commons Attribution-ShareAlike licence.
Author: Eric Marsden eric.marsden@risk-engineering.org.
This notebook contains an introduction to the use of Python and SciPy to analyze data concerning earthquakes. See the associated course materials for background information and to download this content as a Jupyter notebook.
import numpy
import scipy.stats
import pandas
import matplotlib.pyplot as plt
plt.style.use("bmh")
The USGS provides data on earthquakes that occurred throughout the world at https://earthquake.usgs.gov/earthquakes/search/. For convenience, we have extracted data for 2017 earthquakes of magnitude larger than 5 in CSV format.
data = pandas.read_csv("https://risk-engineering.org/static/data/earthquakes-2017.csv")
data.head()
len(data)
Let's clean up the data a little.
data.time = pandas.to_datetime(data.time)
data.sort_values("time", inplace=True)
Plot the data to see whether we can identify any visible patterns.
fig = plt.figure()
fig.set_size_inches(20, 4)
plt.plot(data.time, data.mag, ".");
plt.ylabel("Magnitude");
There's no visible trend in this time series.
Let's examine the distribution of earthquake magnitudes (recall that we only have data for earthquakes of magnitude larger than 5).
data.mag.hist(density=True, alpha=0.5, bins=30)
plt.xlabel("Earthquake magnitude")
plt.title("Histogram of earthquake magnitudes");
Earthquake arrivals follow a Poisson process. This means that interarrival times follow an exponential distribution.
duration = data.time.max() - data.time.min()
density = len(data) / float(duration.days)
density # events per day
We can attempt to fit an exponential distribution to the interarrival times.
# calculate the time delta between successive rows and convert into days
interarrival = data.time.diff().dropna().apply(lambda x: x / numpy.timedelta64(1, "D"))
support = numpy.linspace(interarrival.min(), interarrival.max(), 100)
interarrival.hist(density=True, alpha=0.5, bins=30)
plt.plot(support, scipy.stats.expon(scale=1/density).pdf(support), lw=2)
plt.title("Duration between 5+ earthquakes")
plt.xlabel("Days");
It looks like the inter-event times do indeed fit an exponential distribution well. To check, generate a probability plot against the exponential distribution. This shows quantiles from our sample against quantiles from the theoretical distribution. If the points are close to the diagonal red line, the sample closely follows the distribution.
shape, loc = scipy.stats.expon.fit(interarrival)
scipy.stats.probplot(interarrival,
dist="expon", sparams=(shape, loc),
plot=plt.figure().add_subplot(111))
plt.title("Exponential probability plot of interarrival time");
A correlogram or autocorrelation plot tests whether elements of a time series are positively correlated, negatively correlated, or independent of each other. This is important to detect trends or cycles in time series data.
plt.acorr(interarrival, maxlags=200)
plt.title("Autocorrelation of earthquake timeseries data");
There is no visible trend in this time series data (note that we are only examining large earthquakes; the same plot concerning all earthquakes would likely show correlations at small numbers of days, because large earthquakes tend to trigger small aftershocks in the following days, or sometimes large quakes are preceded by small earthquakes).