Earthquakes

STA 141B Exploratory Data Analysis Project

Project Earthquake

Group Members: Karthika Pai, Kathryn Chiang, An Qi Ma, Natalie Marcom

For our final STA 141B exploratory data science project, we have decided to focus on earthquakes. Although each of us works with substantial datasets and analyzes them in different ways, most of our data comes from the USGS Earthquake Database. Using the skills we have learned in this class, most notably reading CSV files, working with libraries such as Basemap, Pandas, NumPy and Matplotlib, and basic statistics, we hope to answer several questions about earthquakes. Each section is preceded by the question it tries to answer, in bold.

Each group member was in charge of one section:

  1. Part 1 - Karthika Pai
  2. Part 2 - Kathryn Chiang
  3. Part 3 - An Qi Ma
  4. Part 4 - Natalie Marcom
import warnings
warnings.filterwarnings('ignore')

#import statements
import pandas as pd
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
matplotlib.style.use('ggplot')

import os

from mpl_toolkits.basemap import Basemap
%matplotlib inline

Let's look at our dataset!

The dataset is a CSV file downloaded from the USGS Earthquake Database. It contains significant earthquakes that occurred throughout the world from 1965 to 2016. The USGS computes a significance score for each event from the following three components:

  1. mag_significance = magnitude * 100 * (magnitude / 6.5);
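  2. pager_significance = (red) ? 2000 : (orange) ? 1000 : (yellow) ? 500 : 0; (PAGER is the USGS alert level, green, yellow, orange or red, for estimated fatalities and economic losses)
  3. dyfi_significance = min(num_responses, 1000) * max_cdi / 10; ("Did You Feel It?", also known as DYFI, collects reports from people about whether and how strongly they felt the earthquake; num_responses is the number of reports and max_cdi is the highest reported intensity. More responses and stronger reported shaking, which usually come with larger earthquakes, give a higher DYFI significance.)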

significance = max(mag_significance, pager_significance) + dyfi_significance

Any event with a significance > 600 is considered a significant event and appears on the list.
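To make the scoring rule concrete, here is a minimal Python sketch of it. The function names (`pager_significance`, `significance`, `is_significant`) and the choice to pass the PAGER alert as a lowercase color string are our own assumptions for illustration; this is not a USGS API, and it only encodes the formulas listed above.

```python
def pager_significance(alert):
    """Map a PAGER alert color to its significance points (rule 2 above)."""
    return {"red": 2000, "orange": 1000, "yellow": 500}.get(alert, 0)


def significance(magnitude, alert=None, num_responses=0, max_cdi=0.0):
    """Combine the three components into the overall significance score."""
    mag_sig = magnitude * 100 * (magnitude / 6.5)            # rule 1
    dyfi_sig = min(num_responses, 1000) * max_cdi / 10       # rule 3
    return max(mag_sig, pager_significance(alert)) + dyfi_sig


def is_significant(magnitude, alert=None, num_responses=0, max_cdi=0.0):
    """True if the event's score exceeds the 600-point threshold."""
    return significance(magnitude, alert, num_responses, max_cdi) > 600


print(significance(6.5))                       # magnitude term only
print(significance(6.5, "orange", 500, 5.0))   # PAGER term wins, plus DYFI points
```

With no PAGER alert and no DYFI responses, a magnitude 6.5 event scores exactly 650 and counts as significant, while a magnitude 6.0 event scores about 554 and would need PAGER or DYFI points to pass 600.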

path = os.path.join(".", "world_eq.csv")
eq = pd.read_csv(path)
eq.head()  # 23412 rows in total
total = len(eq)
eq.dtypes
Date                           object
Time                           object
Latitude                      float64
Longitude                     float64
Type                           object
Depth                         float64
Depth Error                   float64
Depth Seismic Stations        float64
Magnitude                     float64
Magnitude Type                 object
Magnitude Error               float64
Magnitude Seismic Stations    float64
Azimuthal Gap                 float64
Horizontal Distance           float64
Horizontal Error              float64
Root Mean Square              float64
ID                             object
Source                         object
Location Source                object
Magnitude Source               object
Status                         object
dtype: object

This is certainly a large dataset! The file has records of 23,412 earthquakes (one per row), all with a magnitude of at least 5.5 (precise statistics are discussed later). It also has 21 features, such as latitude and longitude, magnitude, depth, and other fields such as the azimuthal gap and the USGS-specific earthquake ID.

We don't need all of these features for our analysis, so to keep things simple let's keep only the date, the time, the latitude and longitude, the magnitude, and the depth of the earthquake source.

# .copy() makes `simple` an independent table, so adding columns to it later is safe
simple = eq[["Date", "Time", "Latitude", "Longitude", "Magnitude", "Depth"]].copy()
simple.head()
Date Time Latitude Longitude Magnitude Depth
0 01/02/1965 13:44:18 19.246 145.616 6.0 131.6
1 01/04/1965 11:29:49 1.863 127.352 5.8 80.0
2 01/05/1965 18:05:58 -20.579 -173.972 6.2 20.0
3 01/08/1965 18:49:43 -59.076 -23.557 5.8 15.0
4 01/09/1965 13:32:50 11.938 126.427 5.8 15.0

What is a rough geographical distribution of our earthquake list? Are some areas more "cluttered" or concentrated than others?

m = Basemap(projection="mill")
# Project longitudes/latitudes (degrees) into map coordinates
x, y = m(simple["Longitude"].values, simple["Latitude"].values)
fig = plt.figure(figsize=(20,20))
plt.title("Significant Earthquakes from 1965 - 2016")
m.scatter(x,y, s = 10, c = "maroon")
m.drawcoastlines()
m.drawmapboundary()
m.drawcountries()
m.fillcontinents(color='lightsteelblue',lake_color='skyblue')

plt.show()

[Figure: Significant Earthquakes from 1965 - 2016]

Earthquakes appear to be distributed along naturally occurring fault lines, especially the boundaries between the Earth's tectonic plates. Let's make the dots larger to see which regions have the highest concentration.

fig = plt.figure(figsize=(20,20))
plt.title("Significant Earthquakes from 1965 - 2016")
m.scatter(x,y, s = 100, c = "maroon")
m.drawcoastlines()
m.drawmapboundary()
m.drawcountries()
m.fillcontinents(color='lightsteelblue',lake_color='skyblue')

plt.show()

[Figure: Significant Earthquakes from 1965 - 2016, larger markers]

It seems that the majority of earthquakes are concentrated in the Indonesian, western Pacific and Japanese regions. Why is this so? Before we look at how magnitude relates to the geographical distribution of significant earthquakes, let's try to answer this question. According to National Geographic, which cites the U.S. Geological Survey (USGS), the Pacific Ring of Fire, technically called the Circum-Pacific belt, is the world's greatest earthquake belt because of its series of fault lines stretching 25,000 miles (40,000 kilometers) from Chile in the Western Hemisphere through Japan and Southeast Asia. The magazine states that:

  1. Roughly 90 percent of all the world's earthquakes, and 80 percent of the world's largest earthquakes, strike along the Ring of Fire.
  2. About 17 percent of the world's largest earthquakes and 5-6 percent of all quakes occur along the Alpide belt.

Are these statistics true? Let's find out!

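I defined the Ring of Fire region by drawing a rectangle that circumscribes the Ring of Fire on the USGS interactive map. That rectangle spans latitudes from -45.783 to 59.389 degrees and longitudes from -229.219 to -65.391 degrees; since -229.219 degrees is the same meridian as about 130.8 degrees East, the box runs eastward from about 130.8° E across the 180th meridian to about 65.4° W. The code below approximates it with a simpler mask: it keeps latitudes between -61.270 and 56.632 degrees and drops every longitude between -70 and 120 degrees, i.e. it keeps longitudes east of 120° E or west of 70° W.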
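Because the traced rectangle crosses the 180th meridian, a plain `lon_min < lon < lon_max` test does not work on longitudes stored in the usual -180 to 180 range. The sketch below shows one way to handle the wrap-around with NumPy; the helper names `normalize_lon` and `in_wrapped_box` are hypothetical, and it uses the traced bounds from the paragraph above rather than the rounded ones in the code cell.

```python
import numpy as np


def normalize_lon(lon):
    """Wrap longitudes (degrees) into the range [-180, 180)."""
    return (np.asarray(lon, dtype=float) + 180.0) % 360.0 - 180.0


def in_wrapped_box(lat, lon, lat_min, lat_max, lon_west, lon_east):
    """Boolean mask: inside the latitude limits and inside the longitude span
    running eastward from lon_west to lon_east (which may cross 180 degrees)."""
    lat = np.asarray(lat, dtype=float)
    lon = normalize_lon(lon)
    west, east = normalize_lon(lon_west), normalize_lon(lon_east)
    in_lat = (lat > lat_min) & (lat < lat_max)
    if west <= east:
        # Ordinary box that does not cross the 180th meridian
        in_lon = (lon >= west) & (lon <= east)
    else:
        # Box wraps around: keep longitudes east of `west` OR west of `east`
        in_lon = (lon >= west) | (lon <= east)
    return in_lat & in_lon


# -229.219 degrees is the same meridian as about 130.781 degrees East
print(round(float(normalize_lon(-229.219)), 3))
# Tokyo, Santiago (Chile), and a point in the mid-Atlantic
print(in_wrapped_box([35.7, -33.4, 0.0], [139.7, -70.7, -30.0],
                     -45.783, 59.389, -229.219, -65.391))
```

On the dataset it could be applied as `simple[in_wrapped_box(simple["Latitude"], simple["Longitude"], -45.783, 59.389, -229.219, -65.391)]`; we have not run this against the data, so its row count may differ from the one found with the simpler mask.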

# Latitude limits, and the longitude band (degrees) to EXCLUDE, for the Ring of Fire mask
rof_lat = [-61.270, 56.632]
rof_long = [-70, 120]
ringoffire = simple[((simple.Latitude < rof_lat[1]) & 
                    (simple.Latitude > rof_lat[0]) & 
                     ~((simple.Longitude < rof_long[1]) & 
                       (simple.Longitude > rof_long[0])))]
# Project the selected longitudes/latitudes into map coordinates
x, y = m(ringoffire["Longitude"].values, ringoffire["Latitude"].values)
fig2 = plt.figure(figsize=(20,20))
plt.title("Earthquakes in the Ring of Fire Area")
m.scatter(x,y, s = 15, c = "maroon")
m.drawcoastlines()
m.drawmapboundary()
m.drawcountries()
m.fillcontinents(color='lightsteelblue',lake_color='skyblue')

plt.show()

[Figure: Earthquakes in the Ring of Fire Area]

ringoffire
Date Time Latitude Longitude Magnitude Depth
0 01/02/1965 13:44:18 19.2460 145.6160 6.0 131.60
1 01/04/1965 11:29:49 1.8630 127.3520 5.8 80.00
2 01/05/1965 18:05:58 -20.5790 -173.9720 6.2 20.00
4 01/09/1965 13:32:50 11.9380 126.4270 5.8 15.00
5 01/10/1965 13:36:32 -13.4050 166.6290 6.7 35.00
7 01/15/1965 23:17:42 -13.3090 166.2120 6.0 35.00
9 01/17/1965 10:43:17 -24.5630 178.4870 5.8 565.00
11 01/24/1965 00:11:17 -2.6080 125.9520 8.2 20.00
12 01/29/1965 09:35:30 54.6360 161.7030 5.5 55.00
13 02/01/1965 05:27:06 -18.6970 -177.8640 5.6 482.90
15 02/04/1965 03:25:00 -51.8400 139.7410 6.1 10.00
16 02/04/1965 05:01:22 51.2510 178.7150 8.7 30.30
17 02/04/1965 06:04:59 51.6390 175.0550 6.0 30.00
18 02/04/1965 06:37:06 52.5280 172.0070 5.7 25.00
19 02/04/1965 06:39:32 51.6260 175.7460 5.8 25.00
20 02/04/1965 07:11:23 51.0370 177.8480 5.9 25.00
21 02/04/1965 07:14:59 51.7300 173.9750 5.9 20.00
22 02/04/1965 07:23:12 51.7750 173.0580 5.7 10.00
23 02/04/1965 07:43:43 52.6110 172.5880 5.7 24.00
24 02/04/1965 08:06:17 51.8310 174.3680 5.7 31.80
25 02/04/1965 08:33:41 51.9480 173.9690 5.6 20.00
26 02/04/1965 08:40:44 51.4430 179.6050 7.3 30.00
27 02/04/1965 12:06:08 52.7730 171.9740 6.5 30.00
28 02/04/1965 12:50:59 51.7720 174.6960 5.6 20.00
29 02/04/1965 14:18:29 52.9750 171.0910 6.4 25.00
30 02/04/1965 15:51:25 52.9900 170.8740 5.8 25.00
31 02/04/1965 18:34:12 51.5360 175.0450 5.8 25.00
33 02/04/1965 22:30:03 51.8120 174.2060 5.7 10.00
34 02/05/1965 06:39:50 51.7620 174.8410 5.7 25.00
35 02/05/1965 09:32:11 52.4380 174.3210 6.3 39.50
... ... ... ... ... ... ...
23379 12/10/2016 02:45:40 -10.8829 161.2789 5.8 7.66
23380 12/10/2016 16:24:35 -5.6593 154.4734 6.0 142.58
23381 12/11/2016 14:33:13 -9.1237 -109.8492 5.8 10.00
23382 12/11/2016 17:26:10 -10.9640 161.5723 5.5 10.00
23383 12/14/2016 02:01:23 21.2897 144.4037 6.0 22.37
23384 12/14/2016 21:14:56 21.3697 144.2175 5.5 10.00
23385 12/16/2016 11:34:58 14.0882 -90.8691 5.5 71.26
23386 12/17/2016 10:51:10 -4.5049 153.5216 7.9 94.54
23387 12/17/2016 11:22:40 -4.4244 153.5419 5.6 83.36
23388 12/17/2016 11:27:39 -5.6497 153.9975 6.3 26.50
23389 12/18/2016 05:46:25 -10.2137 161.2177 5.9 37.39
23390 12/18/2016 06:15:46 -34.9886 -107.8694 5.5 10.00
23391 12/18/2016 06:39:42 -6.3046 154.3530 5.9 10.00
23392 12/18/2016 09:47:05 8.3489 137.6672 6.2 12.43
23393 12/18/2016 11:35:48 -10.1904 161.2187 5.5 57.52
23394 12/18/2016 13:30:11 -9.9640 -70.9714 6.4 622.54
23395 12/20/2016 04:21:29 -10.1773 161.2236 6.4 16.65
23397 12/20/2016 12:33:14 -10.1785 160.9149 6.0 10.00
23398 12/20/2016 20:07:53 -10.1549 160.7816 5.5 10.38
23399 12/21/2016 00:17:15 -7.5082 127.9206 6.7 152.00
23400 12/21/2016 16:43:57 21.5036 145.4172 5.9 12.05
23401 12/24/2016 01:32:16 -5.2453 153.5754 6.0 35.00
23402 12/24/2016 03:58:55 -5.1460 153.5166 5.8 30.00
23403 12/25/2016 14:22:27 -43.4029 -73.9395 7.6 38.00
23404 12/25/2016 14:32:13 -43.4810 -74.4771 5.6 14.93
23406 12/28/2016 08:18:01 38.3754 -118.8977 5.6 10.80
23407 12/28/2016 08:22:12 38.3917 -118.8941 5.6 12.30
23408 12/28/2016 09:13:47 38.3777 -118.8957 5.5 8.80
23409 12/28/2016 12:38:51 36.9179 140.4262 5.9 10.00
23411 12/30/2016 20:08:28 37.3973 141.4103 5.5 11.94

17596 rows × 6 columns

There are 17,596 earthquakes located in the Ring of Fire region, out of 23,412 significant earthquakes in the entire dataset. So, by frequency, about 75.2% of the significant earthquakes in this dataset fall in the Ring of Fire region. This is reasonably close to the 80% figure that National Geographic cites for the world's largest earthquakes, keeping in mind that our rectangle is only a rough outline of the Ring of Fire.

Magnitude Statistics

What are some basic statistics (max, min, average etc) for the magnitudes of the entire dataset and the Ring of Fire earthquake subset?

Which magnitudes occur the most frequently in both datasets?

Is there some sort of pattern in the frequency of magnitudes?

minimum = simple["Magnitude"].min()
maximum = simple["Magnitude"].max()
average = simple["Magnitude"].mean()

print("Minimum:", minimum)
print("Maximum:",maximum)
print("Mean",average)
('Minimum:', 5.5)
('Maximum:', 9.0999999999999996)
('Mean', 5.882530753460003)
minimum = ringoffire["Magnitude"].min()
maximum = ringoffire["Magnitude"].max()
average = ringoffire["Magnitude"].mean()

print("Minimum:", minimum)
print("Maximum:",maximum)
print("Mean",average)
('Minimum:', 5.5)
('Maximum:', 9.0999999999999996)
('Mean', 5.887151057058525)

The minimum, maximum and average for both datasets are strikingly close! What does that mean? For one thing, the subset (the Ring of Fire earthquakes) makes up about 75% of the total data, so statistics for the two datasets are likely to be similar. Secondly, and more importantly, the dataset contains only earthquakes of magnitude 5.5 or greater (significant ones), so the magnitudes in both sets span a narrow range. If the dataset included all earthquakes, the concentration of the world's major earthquakes in the Ring of Fire area might show up more clearly. We return to this idea later.

In the meantime, let's continue to look at some simple statistics and correlations with magnitude.

n, bins, patch = plt.hist(simple["Magnitude"], histtype = 'step', range=(5.5,9.5), bins = 10)
plt.xlabel("Earthquake Magnitudes")
plt.ylabel("Frequency")
plt.title("Frequency by Magnitude")

# One row per bin: magnitude range, count, and share of all earthquakes
# (built as a list first, since DataFrame.append was removed in pandas 2.0)
rows = []
for i in range(len(n)):
    mag = "%.1f-%.1f" % (bins[i], bins[i + 1])
    percentage = round(n[i] / float(total) * 100, 4)
    rows.append([mag, n[i], percentage])
histo = pd.DataFrame(rows, columns=['Range of Magnitude', 'Frequency', 'Percentage'])
histo
Range of Magnitude Frequency Percentage
0 5.5-5.9 14109.0 60.2640
1 5.9-6.3 5655.0 24.1543
2 6.3-6.7 2173.0 9.2816
3 6.7-7.1 905.0 3.8655
4 7.1-7.5 347.0 1.4821
5 7.5-7.9 162.0 0.6920
6 7.9-8.3 48.0 0.2050
7 8.3-8.7 9.0 0.0384
8 8.7-9.1 2.0 0.0085
9 9.1-9.5 2.0 0.0085

[Figure: Frequency by Magnitude]

About 60% of significant earthquakes had a magnitude between 5.5 and 5.9, whereas fewer than 2.5% (570 events) had a magnitude of 7.1 or greater.

An interesting pattern also appears when we plot frequency against magnitude on a logarithmic scale.

fig, ax = plt.subplots()
ax.scatter(histo.index, histo['Frequency'])
# One tick per bin: there are 10 bins, so use the 10 range labels rather than the 11 bin edges
plt.xticks(histo.index, histo['Range of Magnitude'], rotation='vertical')
plt.yscale('log', nonpositive='clip')  # Matplotlib < 3.3 spells this nonposy='clip'

plt.xlabel("Magnitude")
plt.ylabel("Frequency")
plt.title("Worldwide Earthquake Frequencies, Logarithmic Scale")
plt.show()

[Figure: Worldwide Earthquake Frequencies, Logarithmic Scale]

Now the points form an almost straight line on the graph. This pattern is described by the Gutenberg-Richter law: earthquake frequency falls off exponentially with magnitude (equivalently, as a power law in the energy released), and for every one-point increase in magnitude, an earthquake becomes roughly ten times less frequent. So, for example, magnitude 6 earthquakes occur about ten times more often than magnitude 7 earthquakes, and about one hundred times more often than magnitude 8 earthquakes.
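We can check the "ten times per magnitude unit" rule against our own histogram by fitting a straight line to log10(frequency) versus magnitude; the negative of the slope is the Gutenberg-Richter b-value, and b = 1 corresponds to exactly ten-fold. This sketch reuses the ten bin counts from the table above and places each count at its bin midpoint, which is a simplification of our own. The law is usually stated for cumulative counts, but for an exponential distribution the per-bin counts fall off with the same slope.

```python
import numpy as np

# Bin counts from the "Frequency by Magnitude" table (10 bins of width 0.4, from 5.5 to 9.5)
counts = np.array([14109, 5655, 2173, 905, 347, 162, 48, 9, 2, 2], dtype=float)
edges = np.linspace(5.5, 9.5, 11)
midpoints = (edges[:-1] + edges[1:]) / 2

# Gutenberg-Richter form: log10(N) = a - b * M, so fit a straight line to log10(N) vs M
slope, intercept = np.polyfit(midpoints, np.log10(counts), 1)
b_value = -slope
print("fitted b-value: %.2f" % b_value)
```

With all ten bins the fitted b comes out a little above 1 (about 1.1), so the rule of thumb holds reasonably well here; the two sparse bins at the top end pull on the fit, so dropping them would shift the estimate somewhat.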

We can use this relationship to estimate roughly how often earthquakes of a given size should hit a particular region, although it is impossible to know exactly when. For example, if a region had 15 earthquakes of magnitude 5.0 to 5.9 in 70 years, that works out to about one every 4.7 years (70 / 15). Following the distribution above, we can then "predict" that an earthquake of magnitude 6.0 to 6.9 should be about ten times rarer, occurring roughly once every 47 years in that region.
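The back-of-the-envelope calculation above can be written as a tiny function. It is a sketch under two simplifying assumptions of our own: events are spread evenly over the observation period, and each step of one magnitude unit makes events 10**b times rarer, with b = 1 as in the rule of thumb. The function name `recurrence_interval` is hypothetical, and the result is a long-run average, not a forecast of when the next earthquake will happen.

```python
def recurrence_interval(count, years, magnitude_step=1.0, b_value=1.0):
    """Average number of years between events, extrapolated `magnitude_step`
    magnitude units above the observed range (Gutenberg-Richter with slope b)."""
    observed_interval = years / float(count)          # years per observed event
    return observed_interval * 10 ** (b_value * magnitude_step)


# 15 events of magnitude 5.0-5.9 observed in 70 years
print(round(recurrence_interval(15, 70, magnitude_step=0), 1))  # magnitude 5.0-5.9: 4.7
print(round(recurrence_interval(15, 70), 1))                    # magnitude 6.0-6.9: 46.7
```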

Is there any correlation between depth of the earthquake and magnitude of the earthquake?

Earthquakes can occur anywhere between the Earth's surface and about 700 kilometers below it. For scientific purposes, this 0 - 700 km depth range is divided into three zones: shallow (0 - 70 km), intermediate (70 - 300 km), and deep (300 - 700 km).

# Depth zones in km: shallow [0, 70), intermediate [70, 300), deep [300, 700].
# The percentages printed below were produced by an earlier version of this cell that used
# strict inequalities on both sides (Depth > 70, Depth < 300), which skips events at exactly
# 70 or 300 km; the three printed shares sum to about 99.85% rather than 100%.
shallow = len(simple[simple.Depth < 70])
intermediate = len(simple[(simple.Depth >= 70) & (simple.Depth < 300)])
deep = len(simple[simple.Depth >= 300])

for label, count in [("shallow", shallow), ("intermediate", intermediate), ("deep", deep)]:
    print("%.4f percent of significant earthquakes are %s." % (count / float(total) * 100, label))
79.7027 percent of significant earthquakes are shallow.
14.4798 percent of significant earthquakes are intermediate.
5.6638 percent of significant earthquakes are deep.
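Another way to bucket the depths, so that every event lands in exactly one zone, is `pandas.cut` with left-closed intervals. This is a sketch on a few made-up depths rather than the real data; the helper name `depth_zone_shares` is hypothetical, and on the dataset it would be called as `depth_zone_shares(simple["Depth"])`.

```python
import numpy as np
import pandas as pd

LABELS = ["shallow", "intermediate", "deep"]


def depth_zone_shares(depths):
    """Percentage of events in each depth zone (km), using left-closed bins
    [-inf, 70), [70, 300), [300, inf) so no depth is skipped."""
    zones = pd.cut(pd.Series(depths, dtype=float),
                   bins=[-np.inf, 70, 300, np.inf],
                   labels=LABELS, right=False)
    counts = zones.value_counts()        # missing depths (NaN) are not counted
    total = float(counts.sum())
    return pd.Series([100.0 * counts[label] / total for label in LABELS], index=LABELS)


# Made-up depths, including values exactly on the 70 km and 300 km boundaries
print(depth_zone_shares([10, 69.9, 70, 299, 300, 650]))
```

Because the intervals are closed on the left, 70 km counts as intermediate and 300 km as deep, and the three shares always add up to 100%.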

This is very surprising! We had assumed that significant earthquakes would tend to be deep, but nearly 80% of the significant earthquakes in this dataset are shallow (less than 70 km deep).

What about the geographical distribution of deep earthquakes? I predict that deep earthquakes are primarily situated in the Ring of Fire.

deep_df = simple[simple.Depth > 300]
x,y = m([longs for longs in deep_df["Longitude"]],
         [lats for lats in deep_df["Latitude"]])
fig = plt.figure(figsize=(20,20))
plt.title("Geographical Distribution of Deep Earthquakes")
m.scatter(x,y, s = 60, c = "maroon")
m.drawcoastlines()
m.drawmapboundary()
m.drawcountries()
m.fillcontinents(color='lightsteelblue',lake_color='skyblue')

plt.show()

[Figure: Geographical Distribution of Deep Earthquakes]

Deep earthquakes are primarily located in the Ring of Fire area, with the exception of a few near the Italian Peninsula.

plt.scatter(simple["Magnitude"],simple["Depth"])
plt.xlabel("Magnitude")
plt.ylabel("Depth (km)")
plt.title("Magnitude vs Depth")
plt.show()

[Figure: Magnitude vs Depth]

This plot shows that earthquakes with magnitudes from 5.5 to roughly 6.5 occur across a wide range of depths, from near the surface down to about 700 km. However, the depths of larger earthquakes appear bimodal: they originate either near the surface or deep underground.

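Are the two correlated at all? The Pearson correlation coefficient below is about 0.023, so there is essentially no linear relationship between magnitude and depth. Note that a near-zero Pearson coefficient only rules out a linear trend; it would not detect a bimodal pattern like the one described above.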

np.corrcoef(simple["Magnitude"], simple["Depth"])
array([[ 1.        ,  0.02345731],
       [ 0.02345731,  1.        ]])
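Because Pearson's coefficient only measures linear association, one option is to also compute a rank-based (Spearman) correlation, which detects any monotonic relationship. The sketch below computes it as the Pearson correlation of the ranks using only pandas and NumPy; the helper name `spearman` is our own, and the toy data is invented to show the difference. Neither coefficient would pick up a bimodal pattern, which is better judged from the scatter plot itself.

```python
import numpy as np
import pandas as pd


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the (average) ranks."""
    rx = pd.Series(x, dtype=float).rank()
    ry = pd.Series(y, dtype=float).rank()
    return float(np.corrcoef(rx, ry)[0, 1])


x = np.arange(1, 11, dtype=float)
y = x ** 3                                         # monotonic but strongly curved
print(round(float(np.corrcoef(x, y)[0, 1]), 3))    # Pearson: below 1 because of the curve
print(round(spearman(x, y), 3))                    # Spearman: 1.0, since y always rises with x
```

On our data this would be `spearman(simple["Magnitude"], simple["Depth"])`; we have not computed that value here.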

Time correlations

Do some earthquakes occur more in some months than others?

Do some years have more earthquakes than others?

simple["Date"] = pd.to_datetime(simple["Date"])
simple["Month"] = simple['Date'].dt.month
simple["Year"] = simple['Date'].dt.year

freqbymonth = simple.groupby('Month').size()
freqbyyear = simple.groupby('Year').size()

fig, ax = plt.subplots(figsize = (20,10))
months = ["Jan","Feb","Mar","Apr","May","Jun","Jul","Aug","Sep","Oct","Nov","Dec"]

k = plt.bar(np.arange(len(months)), freqbymonth)
plt.xticks(np.arange(len(months)), months)

plt.xlabel('Month')
plt.ylabel('Frequency')
plt.title('Earthquakes by Month')

def autolabel(rects):
    """
    Attach a text label above each bar displaying its height
    """
    for rect in rects:
        height = rect.get_height()
        ax.text(rect.get_x() + rect.get_width()/2., 1.05*height,
                '%d' % int(height),
                ha='center', va='bottom')
        
autolabel(k)
plt.show()

[Figure: Earthquakes by Month]

The number of earthquakes appears roughly uniform across all 12 months.
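To check "roughly uniform" more formally, one option is a chi-square goodness-of-fit test on the monthly counts. This is a sketch with assumptions of our own: expected counts are proportional to the number of days in each month (February counted as 28.25 days to average over leap years), and the cutoff 19.675 is the standard 5% critical value for 12 - 1 = 11 degrees of freedom. The helper name `monthly_chi_square` is hypothetical; on our data it would be called with `freqbymonth.values`.

```python
import numpy as np

# Days per month, with February averaged over leap years
DAYS_IN_MONTH = np.array([31, 28.25, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31])
CRITICAL_VALUE_11DF_5PCT = 19.675   # chi-square critical value, 11 df, alpha = 0.05


def monthly_chi_square(counts, weight_by_days=True):
    """Chi-square statistic comparing 12 monthly counts with a uniform rate."""
    counts = np.asarray(counts, dtype=float)
    if weight_by_days:
        share = DAYS_IN_MONTH / DAYS_IN_MONTH.sum()
    else:
        share = np.full(12, 1.0 / 12)
    expected = counts.sum() * share
    return float(((counts - expected) ** 2 / expected).sum())


# Made-up counts: one month with three times the usual number of events
stat = monthly_chi_square([300] + [100] * 11, weight_by_days=False)
print(stat > CRITICAL_VALUE_11DF_5PCT)   # True: far from uniform
```

A statistic above the critical value would suggest the months are not equally likely; we have not run the test on the dataset here.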

Let's look at year.

# Use the years actually present in the data as the x values
yearly_line = plt.plot(freqbyyear.index, freqbyyear.values, color = 'steelblue')
plt.xlabel('Year')
plt.ylabel('Frequency')
plt.title('Frequencies of Significant Earthquakes by Year 1965 - 2016')
plt.show()

[Figure: Frequencies of Significant Earthquakes by Year 1965 - 2016]

Earthquakes in the USA

Part 2 by Kathryn Chiang

from __future__ import division  # __future__ imports must be the first statement in the cell
import matplotlib.pyplot as plt
import matplotlib.cm
import numpy as np
import pandas as pd
import warnings
warnings.filterwarnings('ignore')
from collections import Counter
from nltk.probability import FreqDist
from mpl_toolkits.basemap import Basemap
from matplotlib.patches import Polygon

Below are some helper functions I use for this dataset; we call them later in the project.

def title_time(df):
    """Input: dataframe
    Output: 'earliest time through latest time' of the earthquakes in df"""
    # ISO-8601 timestamps sort correctly as strings, so min/max give the time span
    title = '%s through %s' % (df['time'].min(), df['time'].max())
    return title

def get_marker_color(magnitude):
    """Input: magnitude
    Output: green for small earthquakes (< 2), yellow for moderate
    earthquakes (2 to < 4), and red for significant earthquakes (>= 4)."""
    if magnitude < 2.0:
        return ('go')
    elif magnitude < 4.0:
        return ('yo')
    else:
        return ('ro')

def get_stat(statename):
    """Input: state name, ex: 'California'
    Output: earthquake information of that state (rows of the global `us` table)"""
    # Boolean filter instead of appending rows one at a time (DataFrame.append was removed in pandas 2.0)
    ca = us[us['state'] == statename].reset_index(drop=True)
    return ca

def map_mag(df,lllon,lllat,urlon,urlat, place):
    my_map = Basemap(projection='merc', lat_0=57, lon_0=-135,
                     resolution = 'h', area_thresh = 1000.0,
                     llcrnrlon=lllon, llcrnrlat=lllat,
                     urcrnrlon=urlon, urcrnrlat=urlat)

    my_map.drawcoastlines()
    my_map.drawcountries()
    my_map.fillcontinents(color='coral')
    my_map.drawmapboundary()
    my_map.drawstates()

    lats = df['latitude']
    lons = df['longitude']
    magnitudes = df['mag']
    min_marker_size = 2.5
    for lon, lat, mag in zip(lons, lats, magnitudes):
        x,y = my_map(lon, lat)
        msize = mag * min_marker_size
        marker_string = get_marker_color(mag)
        my_map.plot(x, y, marker_string, markersize=msize)
    title = 'Earthquake Magnitude in %s\n' % place
    title += title_time(df)
    plt.title(title)
    plt.show()

def get_co_data(df):
    """Input: dataframe;
    Output: Counter of magnitude strengths ('small', 'moderate', 'significant')"""
    names = {'go': 'small', 'yo': 'moderate', 'ro': 'significant'}
    return Counter(names[get_marker_color(mag)] for mag in df['mag'])

def ratio(df):
    """Input: dataframe
    Output: percentage of earthquakes in each magnitude strength present in df,
    in the same order as get_co_data(df).values()"""
    co = get_co_data(df)
    total = float(sum(co.values()))
    # list(...) comprehension works in Python 3, where dict values cannot be indexed
    return [count / total * 100 for count in co.values()]

states = ['Alabama','Alaska','Arizona','Arkansas','California','Colorado','Connecticut','Delaware','Florida','Georgia','Hawaii','Idaho', 'Illinois','Indiana','Iowa','Kansas','Kentucky','Louisiana','Maine','Maryland','Massachusetts','Michigan','Minnesota','Mississippi', 'Missouri','Montana','Nebraska','Nevada','New Hampshire','New Jersey','New Mexico','New York','North Carolina','North Dakota','Ohio','Oklahoma','Oregon','Pennsylvania','Rhode Island','South Carolina','South Dakota','Tennessee','Texas','Utah','Vermont','Virginia','Washington','West Virginia','Wisconsin','Wyoming']

Let's look at our dataset!

The dataset is a CSV file downloaded from the USGS Earthquake Database. It contains all the earthquakes recorded throughout the world over a one-month period spanning January and February 2017: 7660 earthquakes in total. We will focus only on the earthquakes located in the USA.

data_w = pd.read_csv('all_month.csv')
print(len(data_w))
data_w.dtypes
7660





time                object
latitude           float64
longitude          float64
depth              float64
mag                float64
magType             object
nst                float64
gap                float64
dmin               float64
rms                float64
net                 object
id                  object
updated             object
place               object
type                object
horizontalError    float64
depthError         float64
magError           float64
magNst             float64
status              object
locationSource      object
magSource           object
dtype: object
def create_us_file():
    # Full state names plus two-letter abbreviations, as they appear after the last comma of 'place'
    state = ['Alabama','Alaska','Arizona','Arkansas','California','Colorado','Connecticut','Delaware','Florida','Georgia','Hawaii','Idaho', 'Illinois','Indiana','Iowa','Kansas','Kentucky','Louisiana','Maine','Maryland','Massachusetts','Michigan','Minnesota','Mississippi', 'Missouri','Montana','Nebraska','Nevada','New Hampshire','New Jersey','New Mexico','New York','North Carolina','North Dakota','Ohio','Oklahoma','Oregon','Pennsylvania','Rhode Island','South Carolina','South Dakota','Tennessee','Texas','Utah','Vermont','Virginia','Washington','West Virginia','Wisconsin','Wyoming',"AL", "AK", "AZ", "AR", "CA", "CO", "CT", "DC", "DE", "FL", "GA", "HI", "ID", "IL", "IN", "IA", "KS", "KY", "LA", "ME", "MD", "MA", "MI", "MN", "MS", "MO", "MT", "NE", "NV", "NH", "NJ", "NM", "NY", "NC", "ND", "OH", "OK", "OR", "PA", "RI", "SC", "SD", "TN", "TX", "UT", "VT", "VA", "WA", "WV", "WI", "WY"]
    states_d = {'AK': 'Alaska','AL': 'Alabama','AR': 'Arkansas','AS': 'American Samoa','AZ': 'Arizona','CA': 'California','CO': 'Colorado','CT': 'Connecticut','DC': 'District of Columbia','DE': 'Delaware','FL': 'Florida','GA': 'Georgia','GU': 'Guam','HI': 'Hawaii','IA': 'Iowa','ID': 'Idaho','IL': 'Illinois','IN': 'Indiana','KS': 'Kansas','KY': 'Kentucky','LA': 'Louisiana','MA': 'Massachusetts','MD': 'Maryland','ME': 'Maine','MI': 'Michigan','MN': 'Minnesota','MO': 'Missouri','MP': 'Northern Mariana Islands','MS': 'Mississippi','MT': 'Montana','NA': 'National','NC': 'North Carolina','ND': 'North Dakota','NE': 'Nebraska','NH': 'New Hampshire','NJ': 'New Jersey','NM': 'New Mexico','NV': 'Nevada','NY': 'New York','OH': 'Ohio','OK': 'Oklahoma','OR': 'Oregon','PA': 'Pennsylvania','PR': 'Puerto Rico','RI': 'Rhode Island','SC': 'South Carolina','SD': 'South Dakota','TN': 'Tennessee','TX': 'Texas','UT': 'Utah','VA': 'Virginia','VI': 'Virgin Islands','VT': 'Vermont','WA': 'Washington','WI': 'Wisconsin','WV': 'West Virginia','WY': 'Wyoming'}
    rows = []
    for name in state:
        for i in range(len(data_w)):
            # Substring test; note that 'Virginia' also matches 'West Virginia'
            if name in str(data_w['place'][i]).split(',')[-1]:
                row = data_w.loc[i].copy()
                # Store the full state name, expanding two-letter abbreviations
                row['state'] = states_d.get(name, name)
                rows.append(row)
    # The original row labels become the 'index' column
    usa = pd.DataFrame(rows).reset_index()
    return usa
usa = create_us_file()
#usa.to_csv('us_earthquake.csv',mode = 'w',index = False)

To make this dataset easier to work with, I extracted the earthquakes located in the USA and added a new column, "state", giving the state for each row; the result was saved as us_earthquake.csv, which is loaded below. We mainly use the columns shown below for this project.
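The extraction above loops over every state name and every row. A simpler alternative applies one function to each place string with `Series.map`: take the text after the last comma, expand two-letter abbreviations, and keep only exact state names. The sketch below uses a small hypothetical lookup (four states only) and invented place strings. Unlike the substring test in `create_us_file`, an exact match will not confuse "Virginia" with "West Virginia", but it would also miss a place string with no comma such as "Southern Alaska", so the two approaches can give different counts.

```python
import pandas as pd

# Hypothetical, shortened lookup; a real one would list every state
STATE_ABBR = {"AK": "Alaska", "CA": "California", "HI": "Hawaii", "NV": "Nevada"}
STATE_NAMES = set(STATE_ABBR.values())


def state_from_place(place):
    """Return the US state named after the last comma of a place string, else None."""
    if not isinstance(place, str):              # missing place values
        return None
    region = place.split(",")[-1].strip()
    region = STATE_ABBR.get(region, region)     # expand two-letter abbreviations
    return region if region in STATE_NAMES else None


places = pd.Series(["70km W of Healy, Alaska", "10km NE of Aguanga, CA",
                    "Fiji region", "Southern Alaska", None])
print(places.map(state_from_place).tolist())
```

With a full lookup, `data_w['place'].map(state_from_place)` could fill the "state" column, and dropping the rows where it is missing would leave the US-only table.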

data = pd.read_csv('us_earthquake.csv')
us = data[["index","time", "latitude","longitude","mag","state","place","depth"]]
us.head()
index time latitude longitude mag state place depth
0 0 2017-02-16T19:41:22.795Z 63.8717 -150.3950 1.3 Alaska 70km W of Healy, Alaska 8.1
1 7 2017-02-16T17:11:22.122Z 62.6021 -149.8518 1.8 Alaska 33km NNE of Talkeetna, Alaska 68.9
2 14 2017-02-16T16:28:06.568Z 61.4375 -151.6854 1.9 Alaska 85km NNW of Nikiski, Alaska 83.8
3 19 2017-02-16T15:27:17.594Z 61.7097 -149.6386 1.6 Alaska 9km NNW of Meadow Lakes, Alaska 30.5
4 20 2017-02-16T15:23:09.053Z 59.9683 -147.0029 2.0 Alaska 70km NNW of Middleton Island, Alaska 16.5

What is a rough geographical distribution of our earthquake list? Are some areas more "cluttered" or concentrated than others?

# USA territories
map_mag(us, -172, 15, -65.25, 71,'USA')

[Figure: Earthquake Magnitude in USA]

The majority of earthquakes appear to be concentrated on the West Coast of the United States and in southern Alaska, with some in Nevada and Hawaii. Why is this so? As discussed in Part 1, roughly 90 percent of the world's earthquakes strike along the Ring of Fire, and the U.S. West Coast and southern Alaska lie along it. The map above shows the magnitude of each earthquake that occurred in the USA. There are three categories, distinguished by color: green dots are small earthquakes (magnitude less than 2), yellow dots are moderate earthquakes (magnitude 2 to less than 4), and red dots are significant earthquakes (magnitude 4 or greater). Marker size also grows with magnitude.

Magnitude Statistics

What are some basic statistics (max, min, average, etc.) for the magnitudes of the dataset?

Magnitude is a quantitative measure of the size of an earthquake at its source. The higher the magnitude, the stronger the earthquake. We divided magnitudes into 3 levels: small, moderate, and significant. Which magnitude strength appears most often?

Which magnitude strength occurs most frequently in each state?

minimum = us["mag"].min()
maximum = us["mag"].max()
average = us["mag"].mean()

print("Minimum:", minimum)
print("Maximum:",maximum)
print("Mean",average)
('Minimum:', -0.91000000000000003)
('Maximum:', 5.2999999999999998)
('Mean', 1.2374354862224841)

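The minimum magnitude in this dataset is -0.91 and the maximum is 5.30. Negative magnitudes are possible because the magnitude scale is logarithmic, so they simply denote very small events. The average is 1.24, below the cutoff of 2 for small earthquakes, which suggests that small earthquakes occur most frequently. Is this true for every state?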
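The per-state counts in the next cell are built by looping over states and rows. As a sketch of an alternative (our own, not what produced the plots), `pandas.cut` can assign the three strength levels with the same cutoffs as `get_marker_color`, and `pandas.crosstab` can count them per state in one step. The helper name `strength_table` and the toy data are invented; on the real table it would be `strength_table(us).reindex(states, fill_value=0)`, where the reindex keeps states with no earthquakes as rows of zeros.

```python
import numpy as np
import pandas as pd


def strength_table(df):
    """Count small / moderate / significant earthquakes per state, using the
    cutoffs of get_marker_color: < 2, 2 to < 4, and >= 4."""
    strength = pd.cut(df["mag"], bins=[-np.inf, 2.0, 4.0, np.inf],
                      labels=["small", "moderate", "significant"], right=False)
    return pd.crosstab(df["state"], strength)


# Hypothetical toy data in the same layout as `us`
toy = pd.DataFrame({"state": ["Alaska", "Alaska", "California", "California", "Oklahoma"],
                    "mag":   [1.3, 4.5, 0.9, 2.0, 3.1]})
print(strength_table(toy))
```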

s = list()
m = list()
l = list()
for state in states:
    state_name = get_stat(state)
    co = get_co_data(state_name)
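    vals = co['small']        # count of small earthquakes
    s.append(vals)
    valm = co['moderate']     # count of moderate earthquakes
    m.append(valm)
    vall = co['significant']  # count of significant earthquakes
    l.append(vall)
# Fix the column order explicitly so the colors are the same in every pandas version:
# moderate = blue, significant = yellow, small = red (matching the colors described below)
sb = pd.DataFrame({'small':s, 'moderate':m, 'significant':l},
                  columns=['moderate', 'significant', 'small'])
sb.plot(kind='bar', stacked=True, color = ['blue','yellow','red'])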
plt.ylabel('Numbers of Earthquakes')
plt.title('Stacked Bar Plot for Magnitude Strength of United States')
plt.xticks(np.arange(len(states)), states, rotation = 90,size = 8)

plt.show()

[Figure: Stacked Bar Plot for Magnitude Strength of United States]

This is a stacked bar plot of the magnitude strengths. Because Alaska and California have far more earthquakes than the other states, let's break the y-axis into two panels to see the data more clearly.

s = list()
m = list()
l = list()
for state in states:
    state_name = get_stat(state)
    co = get_co_data(state_name)
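    vals = co['small']        # count of small earthquakes
    s.append(vals)
    valm = co['moderate']     # count of moderate earthquakes
    m.append(valm)
    vall = co['significant']  # count of significant earthquakes
    l.append(vall)
# Explicit column order keeps the colors fixed: moderate = blue, significant = yellow, small = red
df = pd.DataFrame({'small':s, 'moderate':m, 'significant':l},
                  columns=['moderate', 'significant', 'small'])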
f, axis = plt.subplots(2, 1, sharex=True)
df.plot(kind='bar', ax=axis[0],stacked=True, color = ['blue','yellow','red'])
df.plot(kind='bar', ax=axis[1],stacked=True, color = ['blue','yellow','red'])
plt.xticks(np.arange(len(states)), states, rotation = 90,size = 8)

axis[0].set_ylim(450, 2480)
axis[1].set_ylim(0, 100)
axis[1].legend().set_visible(False)

axis[0].spines['bottom'].set_visible(False)
axis[1].spines['top'].set_visible(False)
axis[0].xaxis.tick_top()
axis[0].tick_params(labeltop=False)
axis[1].xaxis.tick_bottom()
d = .015
kwargs = dict(transform=axis[0].transAxes, color='k', clip_on=False)
axis[0].plot((-d,+d),(-d,+d), **kwargs)
axis[0].plot((1-d,1+d),(-d,+d), **kwargs)
kwargs.update(transform=axis[1].transAxes)
axis[1].plot((-d,+d),(1-d,1+d), **kwargs)
axis[1].plot((1-d,1+d),(1-d,1+d), **kwargs)
plt.show()

[Figure: Stacked bar plot of magnitude strength by state, with a broken y-axis]

The stacked bar plot above shows the distribution of magnitude strength for each state in the USA. Small earthquakes (red) have magnitudes less than 2, moderate earthquakes (blue) have magnitudes from 2 to less than 4, and significant earthquakes (yellow) have magnitudes of 4 or greater. In terms of proportions, most states have the largest share of small earthquakes and the smallest share of significant earthquakes. However, Georgia has the highest share of significant earthquakes, and Oklahoma has the highest share of moderate earthquakes.

gr = get_stat('Georgia')
grv = get_co_data(gr)
print(grv)
print(ratio(gr))
ok = get_stat('Oklahoma')
okv = get_co_data(ok)
print(okv)
print(ratio(ok))
Counter({'significant': 11})
[100.0]
Counter({'moderate': 60, 'small': 3})
[4.761904761904762, 95.23809523809523]

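The output above shows that all 11 of Georgia's earthquakes are significant, and that about 95% of Oklahoma's (60 of 63) are moderate. However, these samples are very small, so a more reliable analysis would need more data, for example a longer time period. The Georgia result should also be treated with caution: because states are matched by searching for the name in the place string, the "Georgia" rows may include events labeled with the country of Georgia or the South Georgia Island region rather than the U.S. state.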

Which two states in the USA have the most earthquakes?

c = us['state'].value_counts()
x = c.values
y = c.index
l = np.arange(len(c))

fig,ax = plt.subplots()
rects = ax.bar(l, x, color = 'pink')  # keep the bar objects so they can be labeled below
plt.xticks(l, y, size = 9,rotation = 90)
plt.ylabel('Numbers of Earthquakes')
plt.xlabel('State Names')
plt.title('Numbers of Earthquakes by States Names')
labels = x
for rect, label in zip(rects, labels):
    height = rect.get_height()
    ax.text(rect.get_x() + rect.get_width()/2, height + 5, label, ha='center', va='bottom',fontweight='bold',rotation = 45)
plt.show()

[Figure: Numbers of Earthquakes by States Names]

This bar chart shows the number of earthquakes in each state, and the number on top of each bar gives the exact count for that state. Most of the earthquakes occurred in Alaska and California, with 2455 and 2441 earthquakes respectively during the month.

Alaska vs. California

What is a rough geographical distribution of the dataset for Alaska and California? Are some areas more "cluttered" or concentrated than others?

Are there any differences when we compare their magnitude strengths?

How about when we compare their magnitude strength ratios?

ak = get_stat('Alaska')
ca = get_stat('California')
# Alaska
map_mag(ak, -172, 48, -126, 72, 'Alaska')
# California
map_mag(ca, -125, 32, -114, 42, 'California')

[Figure: earthquake map of Alaska]

[Figure: earthquake map of California]

The earthquakes in Alaska appear more concentrated than those in California, which are spread out along the coast and along the eastern side of the state. What about their magnitudes?

# Use a fixed category order so both states and the tick labels line up
order = ['small', 'moderate', 'significant']
ak_counts = get_co_data(ak)
ca_counts = get_co_data(ca)
akv = [ak_counts[k] for k in order]
cav = [ca_counts[k] for k in order]
twobar = pd.DataFrame({'Alaska': akv, 'California': cav}, index=order)
twobar.plot(kind='bar', color=['yellow', 'cornflowerblue'], rot=0)
ind = np.arange(len(order))
plt.ylabel('Number of Earthquakes')
plt.xlabel('Strength of Earthquakes')
plt.title('Bar Plots for California vs Alaska Magnitudes')
# Annotate each bar with its count
for a, b in zip(ind, akv):
    plt.text(a, b, str(b), fontweight='bold', va='bottom', ha='right', color='darkgoldenrod')
for a, b in zip(ind, cav):
    plt.text(a, b, str(b), fontweight='bold', va='bottom', ha='left', color='darkblue')

plt.show()

[Figure: bar plots of California vs. Alaska magnitudes]

Comparing the bar plots of magnitude strength for California and Alaska, we can see that most earthquakes in both states are small, with magnitudes less than 2. However, Alaska has 379 fewer small earthquakes than California, while it has 377 more moderate and 16 more significant earthquakes. These differences agree with the totals: -379 + 377 + 16 = 14 = 2455 - 2441. What does that mean? Let's look at their proportions.

rak = ratio(ak)
rca = ratio(ca)
ind = range(len(rak))
co = get_co_data(ak)
plt.plot(rak,'blue',label = 'Alaska')
plt.plot(rca,'red',label = 'California')
plt.ylabel('Ratio in Percentage (%)')
plt.xlabel('Earthquake Strength')
plt.title('Ratio of Alaska vs. California')
plt.xticks(range(3), co.keys())

for a,b in zip(ind, rak): 
    plt.text(a, b, str(round(b,1)) + '%',fontweight='bold',color = 'midnightblue')
for a,b in zip(ind, rca): 
    plt.text(a, b, str(round(b,1)) + '%',fontweight='bold',va='top', color = 'darkred')
plt.legend(loc = 'upper right')

plt.show()

[Figure: percentage of small, moderate, and significant earthquakes, Alaska vs. California]

I divided each count by the state's total number of earthquakes and multiplied by 100 to get a percentage. Around 22% more of Alaska's earthquakes are moderate or significant compared with California's, so Alaska tends to have more moderate and significant earthquakes than California.
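The percentage comparison described above can be sketched as follows, reading the difference in percentage points. `percentages` and `share_difference` are hypothetical helpers, and the counts in the example are made up for illustration; they are not the Alaska and California totals.

```python
from collections import Counter

ORDER = ('small', 'moderate', 'significant')

def percentages(counter, order=ORDER):
    """Convert category counts into percentages of the total."""
    total = float(sum(counter[k] for k in order))
    return [100.0 * counter[k] / total for k in order]

def share_difference(counter_a, counter_b):
    """Percentage-point difference in the moderate + significant share (a minus b)."""
    pa = percentages(counter_a)
    pb = percentages(counter_b)
    return (pa[1] + pa[2]) - (pb[1] + pb[2])

# Made-up counts, for illustration only
state_a = Counter({'small': 50, 'moderate': 40, 'significant': 10})
state_b = Counter({'small': 80, 'moderate': 18, 'significant': 2})
print(percentages(state_a))                # [50.0, 40.0, 10.0]
print(share_difference(state_a, state_b))  # 30.0
```

Normalizing by each state's own total is what makes the comparison fair when the two states have different numbers of earthquakes.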

Is there any correlation between depth of the earthquake and magnitude of the earthquake?

In the previous part, we found that earthquakes with magnitudes from 5.5 to 6.5 do not have a relationship with earthquake depth. Here we test this with a different dataset, whose magnitudes range from -0.91 to 5.30.

# Scatter plot of magnitude against depth (USGS depths are in kilometers)
plt.scatter(us['mag'], us['depth'])
plt.ylabel('Depth (km)')
plt.xlabel('Magnitude')
plt.title('Magnitude vs Depth Scatter Plot in the USA')
plt.show()

[Figure: scatter plot of magnitude vs. depth for earthquakes in the USA]

This plot shows that earthquakes with magnitudes from roughly 0 to 5.3 occur at a range of depths, from 0 to around 270 km. Small-magnitude earthquakes are concentrated at shallow depths, and the range of depths widens as magnitude increases. Does that mean the two are correlated? The correlation coefficient is 0.33, which indicates a weak positive linear relationship, so there is only a weak correlation between the depth and the magnitude of an earthquake.
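By default, `DataFrame.corr()` computes Pearson's r: the sum of the products of each variable's deviations from its mean, divided by the square root of the product of their sums of squares. The sketch below computes it by hand with NumPy on made-up magnitudes and depths (not the USGS data) and checks it against pandas; `pearson_r` is a hypothetical helper.

```python
import numpy as np
import pandas as pd

def pearson_r(x, y):
    """Pearson correlation computed from deviations about the means."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()
    dy = y - y.mean()
    return (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())

# Made-up magnitudes and depths (km), for illustration only
mag = [0.5, 1.2, 2.0, 2.8, 3.5, 4.1]
depth = [2.0, 5.0, 3.0, 12.0, 8.0, 20.0]

r_manual = pearson_r(mag, depth)
r_pandas = pd.DataFrame({'mag': mag, 'depth': depth}).corr().loc['mag', 'depth']
print(round(r_manual, 3), round(r_pandas, 3))
```

Because Pearson's r only measures linear association, a spread of depths that widens with magnitude (as in the scatter plot) can coexist with a modest r.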

df = us[['mag','depth']]
df.corr()
mag depth
mag 1.000000 0.327762
depth 0.327762 1.000000

Conclusion

The majority of earthquakes in the USA are concentrated along the West Coast and in southern Alaska, with some in the Nevada and Hawaii areas. In most states, small earthquakes make up the largest share and significant earthquakes the smallest. Alaska and California are the two states with the most earthquakes, and comparing the two, we found that Alaska tends to have more moderate and significant earthquakes than California. Finally, comparing depth with magnitude, we found only a weak correlation between the two.

Part 3: The Relationship Between Earthquakes and Tsunamis

In this part of the project, I used a dataset of significant earthquakes from 1965-2016. The tsunami dataset comes from the NOAA website, which provides many types of information related to tsunamis, and contains observations of tsunamis that have occurred throughout history. Large earthquakes are known to cause tsunamis, so I will analyze how earthquakes and tsunamis are related and how large the earthquakes that cause tsunamis usually are.

import pandas as pd
import matplotlib
from matplotlib import pyplot as plt
import numpy as np
plt.style.use('ggplot')
# earthquakes dataframe
earthquakes = pd.read_csv('world_eq.csv')
earthquakes.head(10)
Date Time Latitude Longitude Type Depth Depth Error Depth Seismic Stations Magnitude Magnitude Type ... Magnitude Seismic Stations Azimuthal Gap Horizontal Distance Horizontal Error Root Mean Square ID Source Location Source Magnitude Source Status
0 1/2/1965 13:44:18 19.246 145.616 Earthquake 131.6 NaN NaN 6.0 MW ... NaN NaN NaN NaN NaN ISCGEM860706 ISCGEM ISCGEM ISCGEM Automatic
1 1/4/1965 11:29:49 1.863 127.352 Earthquake 80.0 NaN NaN 5.8 MW ... NaN NaN NaN NaN NaN ISCGEM860737 ISCGEM ISCGEM ISCGEM Automatic
2 1/5/1965 18:05:58 -20.579 -173.972 Earthquake 20.0 NaN NaN 6.2 MW ... NaN NaN NaN NaN NaN ISCGEM860762 ISCGEM ISCGEM ISCGEM Automatic
3 1/8/1965 18:49:43 -59.076 -23.557 Earthquake 15.0 NaN NaN 5.8 MW ... NaN NaN NaN NaN NaN ISCGEM860856 ISCGEM ISCGEM ISCGEM Automatic
4 1/9/1965 13:32:50 11.938 126.427 Earthquake 15.0 NaN NaN 5.8 MW ... NaN NaN NaN NaN NaN ISCGEM860890 ISCGEM ISCGEM ISCGEM Automatic
5 1/10/1965 13:36:32 -13.405 166.629 Earthquake 35.0 NaN NaN 6.7 MW ... NaN NaN NaN NaN NaN ISCGEM860922 ISCGEM ISCGEM ISCGEM Automatic
6 1/12/1965 13:32:25 27.357 87.867 Earthquake 20.0 NaN NaN 5.9 MW ... NaN NaN NaN NaN NaN ISCGEM861007 ISCGEM ISCGEM ISCGEM Automatic
7 1/15/1965 23:17:42 -13.309 166.212 Earthquake 35.0 NaN NaN 6.0 MW ... NaN NaN NaN NaN NaN ISCGEM861111 ISCGEM ISCGEM ISCGEM Automatic
8 1/16/1965 11:32:37 -56.452 -27.043 Earthquake 95.0 NaN NaN 6.0 MW ... NaN NaN NaN NaN NaN ISCGEMSUP861125 ISCGEMSUP ISCGEM ISCGEM Automatic
9 1/17/1965 10:43:17 -24.563 178.487 Earthquake 565.0 NaN NaN 5.8 MW ... NaN NaN NaN NaN NaN ISCGEM861148 ISCGEM ISCGEM ISCGEM Automatic

10 rows × 21 columns

len(earthquakes.index)
23412
earthquakes = earthquakes[["Date", "Time", "Latitude","Longitude","Magnitude", "Depth"]]
earthquakes.head()
Date Time Latitude Longitude Magnitude Depth
0 1/2/1965 13:44:18 19.246 145.616 6.0 131.6
1 1/4/1965 11:29:49 1.863 127.352 5.8 80.0
2 1/5/1965 18:05:58 -20.579 -173.972 6.2 20.0
3 1/8/1965 18:49:43 -59.076 -23.557 5.8 15.0
4 1/9/1965 13:32:50 11.938 126.427 5.8 15.0
# tsunamis dataframe
tsunamis = pd.read_excel('tsevent.xlsx')
tsunamis.head()
ID YEAR MONTH DAY HOUR MINUTE SECOND EVENT_VALIDITY CAUSE_CODE FOCAL_DEPTH ... TOTAL_MISSING TOTAL_MISSING_DESCRIPTION TOTAL_INJURIES TOTAL_INJURIES_DESCRIPTION TOTAL_DAMAGE_MILLIONS_DOLLARS TOTAL_DAMAGE_DESCRIPTION TOTAL_HOUSES_DESTROYED TOTAL_HOUSES_DESTROYED_DESCRIPTION TOTAL_HOUSES_DAMAGED TOTAL_HOUSES_DAMAGED_DESCRIPTION
0 1 -2000 NaN NaN NaN NaN NaN 1.0 1.0 NaN ... NaN NaN NaN NaN NaN 4.0 NaN NaN NaN NaN
1 3 -1610 NaN NaN NaN NaN NaN 4.0 6.0 NaN ... NaN NaN NaN NaN NaN 3.0 NaN NaN NaN NaN
2 4 -1365 NaN NaN NaN NaN NaN 1.0 1.0 NaN ... NaN NaN NaN NaN NaN 3.0 NaN NaN NaN NaN
3 5 -1300 NaN NaN NaN NaN NaN 2.0 0.0 NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
4 6 -760 NaN NaN NaN NaN NaN 2.0 0.0 NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN

5 rows × 46 columns

from mpl_toolkits.basemap import Basemap
from matplotlib.colors import rgb2hex
from matplotlib.patches import Polygon
# Make sure every column name is a plain string
tsunamis.columns = [str(col) for col in tsunamis.columns]
# Delete unnecessary columns: keep only the identification, date/time, and location columns
tsunamis = tsunamis[["ID", "YEAR", "MONTH", "DAY", "HOUR", "MINUTE", "COUNTRY", "STATE", "LOCATION_NAME", "LATITUDE", "LONGITUDE"]]
tsunamis.head()
ID YEAR MONTH DAY HOUR MINUTE COUNTRY STATE LOCATION_NAME LATITUDE LONGITUDE
0 1 -2000 NaN NaN NaN NaN SYRIA NaN SYRIAN COASTS 35.683 35.80
1 3 -1610 NaN NaN NaN NaN GREECE NaN THERA ISLAND (SANTORINI) 36.400 25.40
2 4 -1365 NaN NaN NaN NaN SYRIA NaN SYRIAN COASTS 35.683 35.80
3 5 -1300 NaN NaN NaN NaN TURKEY NaN IONIAN COASTS, TROAD 39.960 26.24
4 6 -760 NaN NaN NaN NaN ISRAEL NaN ISRAEL AND LEBANON COASTS NaN NaN

Most of the variables in the tsunami dataset, such as the numbers of missing people, injuries, damage, and destroyed or damaged houses, are not needed in this part of the project, so I removed them from the dataset.

# Drop N/A lon/lat values for tsunami
# Filtering on longitude is enough: whenever longitude is N/A, the latitude is N/A too
tsu = tsunamis.loc[tsunamis['LONGITUDE'].notna()]
tsu.head()
ID YEAR MONTH DAY HOUR MINUTE COUNTRY STATE LOCATION_NAME LATITUDE LONGITUDE
0 1 -2000 NaN NaN NaN NaN SYRIA NaN SYRIAN COASTS 35.683 35.80
1 3 -1610 NaN NaN NaN NaN GREECE NaN THERA ISLAND (SANTORINI) 36.400 25.40
2 4 -1365 NaN NaN NaN NaN SYRIA NaN SYRIAN COASTS 35.683 35.80
3 5 -1300 NaN NaN NaN NaN TURKEY NaN IONIAN COASTS, TROAD 39.960 26.24
5 7 -590 NaN NaN NaN NaN LEBANON NaN LEBANON COASTS 33.270 35.22

Some tsunami observations have missing coordinates. Those observations cannot be plotted on a map, so I dropped them.
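A more compact way to drop rows with missing coordinates is `DataFrame.dropna` with `subset`, which also covers the case where only the latitude is missing. A sketch on a small frame whose rows are adapted from the tsunami table above:

```python
import numpy as np
import pandas as pd

# Rows adapted from the tsunami table above (same column names)
df = pd.DataFrame({
    'LOCATION_NAME': ['SYRIAN COASTS', 'ISRAEL AND LEBANON COASTS', 'LEBANON COASTS'],
    'LATITUDE': [35.683, np.nan, 33.270],
    'LONGITUDE': [35.80, np.nan, 35.22],
})

# Keep only rows where both coordinates are present
located = df.dropna(subset=['LATITUDE', 'LONGITUDE'])
print(located['LOCATION_NAME'].tolist())  # ['SYRIAN COASTS', 'LEBANON COASTS']
```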

recenttsu = tsu.loc[tsu['YEAR'] > 1964]
recenttsu.head()
ID YEAR MONTH DAY HOUR MINUTE COUNTRY STATE LOCATION_NAME LATITUDE LONGITUDE
2026 1963 1965 1.0 24.0 0.0 11.0 INDONESIA NaN SANANA ISLAND -2.400 126.100
2027 1964 1965 2.0 4.0 5.0 1.0 USA AK RAT ISLANDS, ALEUTIAN ISLANDS, AK 51.290 178.550
2028 5470 1965 2.0 19.0 NaN NaN CHILE NaN SOUTHERN CHILE -41.755 -72.396
2029 1965 1965 2.0 23.0 22.0 11.0 CHILE NaN NORTHERN CHILE -25.670 -70.630
2030 3042 1965 3.0 9.0 17.0 57.0 GREECE NaN AEGEAN SEA 39.400 24.000
len(recenttsu.index)
546

The filtered dataset above lists the 546 tsunamis with known coordinates from 1965-2017, which roughly matches the time frame of the earthquake dataset.

What is the geographical distribution of the earthquakes and tsunamis list and how much do they overlap?

# draw world map

plt.figure(figsize=(15,10))
displaymap = Basemap(llcrnrlon=-180,llcrnrlat=-90,urcrnrlon=180,urcrnrlat=90)
displaymap.drawmapboundary()
displaymap.drawcountries()
displaymap.drawcoastlines()
# Convert longitudes and latitudes to lists of floats
longitude = earthquakes['Longitude'].astype(float).tolist()
latitude = earthquakes['Latitude'].astype(float).tolist()
tlongitude = recenttsu['LONGITUDE'].astype(float).tolist()
tlatitude = recenttsu['LATITUDE'].astype(float).tolist()
# Project to map coordinates, then plot earthquakes (blue) and tsunamis (red)
lons, lats = displaymap(longitude, latitude)
tlons, tlats = displaymap(tlongitude, tlatitude)
displaymap.plot(lons, lats, 'o', color="blue")
displaymap.plot(tlons, tlats, 'o', color="red")
plt.title("Earthquakes and Tsunamis around the World from 1965-2017")
plt.show()

[Figure: world map of earthquakes (blue) and tsunamis (red), 1965-2017]

First, I converted the longitude and latitude observations in both datasets to lists of floats. Then I drew a world map and plotted every earthquake (blue) and every tsunami (red) with known coordinates. Many of the points overlap in the North American and East Asian regions and along the Ring of Fire, where a large number of earthquakes and volcanic eruptions occur. Europe, on the other hand, appears to have relatively more tsunami points than earthquake points.

dates = earthquakes[['Date']].values.tolist()
years = []
months = []
days = []
for i in range(0, len(dates)):
    dates[i] = dates[i][0].split("/")
    try:
        years.append(dates[i][2])
    except IndexError:
        years.append('NaN')
    try:
        months.append(dates[i][0])
    except IndexError:
        months.append('NaN')
    try:
        days.append(dates[i][1])
    except IndexError:
        days.append('NaN')
idlist = []
for i in range(0, len(earthquakes.index)):
    idlist.append(i)
earthquakes['Year'] = years
earthquakes['Month'] = months
earthquakes['Days'] = days
earthquakes['ID'] = idlist
earthquakes.head()
Date Time Latitude Longitude Magnitude Depth Year Month Days ID
0 1/2/1965 13:44:18 19.246 145.616 6.0 131.6 1965 1 2 0
1 1/4/1965 11:29:49 1.863 127.352 5.8 80.0 1965 1 4 1
2 1/5/1965 18:05:58 -20.579 -173.972 6.2 20.0 1965 1 5 2
3 1/8/1965 18:49:43 -59.076 -23.557 5.8 15.0 1965 1 8 3
4 1/9/1965 13:32:50 11.938 126.427 5.8 15.0 1965 1 9 4

I split each date into day, month, and year and added them to the dataset as new columns so that I can analyze it more flexibly. I also gave each observation an ID so that specific earthquakes can be referred to later.
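An alternative to splitting the strings by hand is `pd.to_datetime` with an explicit format, which produces real dates and exposes year, month, and day through the `.dt` accessor. With `errors='coerce'`, any date that does not match the format becomes `NaT`, playing the role of the `'NaN'` fallback above. Note one difference: the loop above stores the parts as strings (e.g. `'2012'`), which is why later filters compare against `'2012'`; this sketch produces numbers instead. The frame below is made up for illustration.

```python
import pandas as pd

# Small made-up frame using the same month/day/year 'Date' format
eq = pd.DataFrame({'Date': ['1/2/1965', '3/14/2012', 'not a date']})

parsed = pd.to_datetime(eq['Date'], format='%m/%d/%Y', errors='coerce')
# Columns are numeric; they become floats with NaN where a date failed to parse
eq['Year'] = parsed.dt.year
eq['Month'] = parsed.dt.month
eq['Days'] = parsed.dt.day
print(eq)
```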

How often do earthquakes cause tsunamis? How much of the tsunamis in the dataset are caused by earthquakes?

I am interested in how many earthquakes cause tsunamis each year and how strong those earthquakes are, so I will pick two years and analyze the earthquakes and tsunamis in each of them.

float(len(recenttsu.index))/float(len(earthquakes.index))
0.023321373654536137

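Earthquakes are often said to cause tsunamis. Dividing the number of tsunamis since 1965 (546) by the number of earthquakes (23412) gives about 2.3%. Because not every tsunami in the list was caused by an earthquake in this dataset, this is only a rough upper estimate of the share of earthquakes that cause tsunamis.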

eq2012 = earthquakes.loc[(earthquakes['Year'] == '2012')]
tsu2012 = tsu.loc[tsu[u'YEAR'] == 2012]
tsu2012
ID YEAR MONTH DAY HOUR MINUTE COUNTRY STATE LOCATION_NAME LATITUDE LONGITUDE
2515 5442 2012 2.0 2.0 13.0 34.0 VANUATU NaN VANUATU ISLANDS -17.827 167.133
2516 5446 2012 3.0 14.0 9.0 8.0 JAPAN NaN HOKKAIDO ISLAND 40.887 144.944
2517 5447 2012 3.0 20.0 18.0 2.0 MEXICO NaN S. MEXICO 16.493 -98.231
2518 5449 2012 4.0 11.0 8.0 38.0 INDONESIA NaN OFF W. COAST OF N SUMATRA 2.327 93.063
2519 5450 2012 4.0 11.0 10.0 43.0 INDONESIA NaN OFF W. COAST OF N SUMATRA 0.802 92.463
2520 5451 2012 4.0 14.0 22.0 5.0 VANUATU NaN VANUATU ISLANDS -18.972 168.741
2521 5460 2012 7.0 15.0 NaN NaN GREENLAND NaN ILULISSAT ICEFJORD 69.200 -51.300
2522 5462 2012 8.0 27.0 4.0 37.0 NICARAGUA NaN OFF THE COAST 12.139 -88.590
2523 5463 2012 8.0 31.0 12.0 47.0 PHILIPPINES NaN PHILIPPINE ISLANDS 10.811 126.638
2524 5464 2012 9.0 5.0 14.0 42.0 COSTA RICA NaN COSTA RICA 10.085 -85.315
2525 5467 2012 10.0 28.0 3.0 4.0 CANADA BC BRITISH COLUMBIA 52.788 -132.101
2526 5468 2012 11.0 7.0 16.0 35.0 GUATEMALA NaN GUATEMALA 13.988 -91.895
2527 5469 2012 12.0 7.0 8.0 18.0 JAPAN NaN OFF EAST COAST OF HONSHU ISLAND 37.890 143.949
2528 5471 2012 12.0 28.0 NaN NaN CHINA NaN ZHAOJUN BRIDGE, HUBEI PROVINCE 31.256 110.733
print len(tsu2012), len(eq2012)
14 445

In 2012, there was 1 tsunami in February, 2 in March, 3 in April, 1 in July, 2 in August, 1 in September, 1 in October, 1 in November, and 2 in December, for a total of 14 tsunamis. There were 445 earthquakes in 2012.

tsu2012.loc[tsu2012[u'MONTH'] == 2]
ID YEAR MONTH DAY HOUR MINUTE COUNTRY STATE LOCATION_NAME LATITUDE LONGITUDE
2515 5442 2012 2.0 2.0 13.0 34.0 VANUATU NaN VANUATU ISLANDS -17.827 167.133
eq2012.loc[(eq2012['Month'] == '2') & (eq2012['Days'] == '2')]
Date Time Latitude Longitude Magnitude Depth Year Month Days ID
21142 2/2/2012 6:46:30 -6.563 149.774 5.6 51.3 2012 2 2 21142
21143 2/2/2012 9:32:17 -6.586 149.718 5.6 38.6 2012 2 2 21143
21144 2/2/2012 13:34:41 -17.827 167.133 7.1 23.0 2012 2 2 21144
21145 2/2/2012 17:27:07 -17.954 167.179 5.5 20.6 2012 2 2 21145

For each tsunami, I compare its time, longitude, and latitude with those of the earthquakes on the same day; if an earthquake matches, I assume that this earthquake caused the tsunami. For the February tsunami, the matching earthquake is the third one on February 2, 2012 (ID 21144), which has the same time (13:34) and coordinates as the tsunami.
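The month-by-month matching below can also be sketched as a single `merge`: join the two tables on date and on coordinates rounded to a tolerance, and treat each joined pair as a candidate cause. This is an assumption-laden sketch, not the procedure used in this section: `match_tsunamis` is a hypothetical helper, rounding to 2 decimals is an arbitrary tolerance, the time of day is ignored, and the toy rows are copied from the February 2012 tables in this section.

```python
import pandas as pd

def match_tsunamis(eq, tsu, decimals=2):
    """Pair tsunamis with earthquakes on the same date at the same rounded coordinates.

    eq needs columns ID, Year, Month, Days, Latitude, Longitude, Magnitude;
    tsu needs columns ID, YEAR, MONTH, DAY, LATITUDE, LONGITUDE.
    Year/Month/Days may be strings such as '2012'; they are converted to floats.
    """
    keys = ['y', 'm', 'd', 'lat', 'lon']
    left = pd.DataFrame({
        'eq_id': eq['ID'],
        'Magnitude': eq['Magnitude'],
        'y': eq['Year'].astype(float),
        'm': eq['Month'].astype(float),
        'd': eq['Days'].astype(float),
        'lat': eq['Latitude'].round(decimals),
        'lon': eq['Longitude'].round(decimals),
    })
    right = pd.DataFrame({
        'tsu_id': tsu['ID'],
        'y': tsu['YEAR'].astype(float),
        'm': tsu['MONTH'].astype(float),
        'd': tsu['DAY'].astype(float),
        'lat': tsu['LATITUDE'].round(decimals),
        'lon': tsu['LONGITUDE'].round(decimals),
    })
    # Rows with a missing key cannot be matched reliably, so drop them first
    left = left.dropna(subset=keys)
    right = right.dropna(subset=keys)
    return left.merge(right, on=keys, how='inner')[['eq_id', 'tsu_id', 'Magnitude']]

# Rows copied from the February 2012 tables in this section
eq = pd.DataFrame({
    'ID': [21142, 21143, 21144, 21145],
    'Year': ['2012'] * 4, 'Month': ['2'] * 4, 'Days': ['2'] * 4,
    'Latitude': [-6.563, -6.586, -17.827, -17.954],
    'Longitude': [149.774, 149.718, 167.133, 167.179],
    'Magnitude': [5.6, 5.6, 7.1, 5.5],
})
tsu = pd.DataFrame({
    'ID': [5442], 'YEAR': [2012], 'MONTH': [2.0], 'DAY': [2.0],
    'LATITUDE': [-17.827], 'LONGITUDE': [167.133],
})
matches = match_tsunamis(eq, tsu)
print(matches)
```

Exact matching on rounded coordinates only works here because both catalogs record identical epicenters for these events; events whose recorded positions differ slightly would need a distance-based tolerance instead.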

earthquakes.loc[earthquakes['ID'] == 21144]
Date Time Latitude Longitude Magnitude Depth Year Month Days ID
21144 2/2/2012 13:34:41 -17.827 167.133 7.1 23.0 2012 2 2 21144

Now I will do the same for March and the remaining months.

tsu2012.loc[tsu2012[u'MONTH'] == 3]
ID YEAR MONTH DAY HOUR MINUTE COUNTRY STATE LOCATION_NAME LATITUDE LONGITUDE
2516 5446 2012 3.0 14.0 9.0 8.0 JAPAN NaN HOKKAIDO ISLAND 40.887 144.944
2517 5447 2012 3.0 20.0 18.0 2.0 MEXICO NaN S. MEXICO 16.493 -98.231
eq2012.loc[(eq2012['Month'] == '3') & ((eq2012['Days'] == '14') | (eq2012['Days'] == '20'))]
Date Time Latitude Longitude Magnitude Depth Year Month Days ID
21192 3/14/2012 9:08:35 40.887 144.944 6.9 12.0 2012 3 14 21192
21193 3/14/2012 10:49:25 40.781 144.761 6.1 10.0 2012 3 14 21193
21194 3/14/2012 10:57:40 40.755 144.806 5.6 12.0 2012 3 14 21194
21195 3/14/2012 12:05:05 35.687 140.695 6.0 10.0 2012 3 14 21195
21196 3/14/2012 21:13:08 -5.595 151.042 6.2 28.0 2012 3 14 21196
21202 3/20/2012 17:56:19 -3.812 140.266 6.1 66.0 2012 3 20 21202
21203 3/20/2012 18:02:47 16.493 -98.231 7.4 20.0 2012 3 20 21203
earthquakes.loc[(earthquakes['ID'] == 21192) | (earthquakes['ID'] == 21203)]
Date Time Latitude Longitude Magnitude Depth Year Month Days ID
21192 3/14/2012 9:08:35 40.887 144.944 6.9 12.0 2012 3 14 21192
21203 3/20/2012 18:02:47 16.493 -98.231 7.4 20.0 2012 3 20 21203
tsu2012.loc[tsu2012[u'MONTH'] == 4]
ID YEAR MONTH DAY HOUR MINUTE COUNTRY STATE LOCATION_NAME LATITUDE LONGITUDE
2518 5449 2012 4.0 11.0 8.0 38.0 INDONESIA NaN OFF W. COAST OF N SUMATRA 2.327 93.063
2519 5450 2012 4.0 11.0 10.0 43.0 INDONESIA NaN OFF W. COAST OF N SUMATRA 0.802 92.463
2520 5451 2012 4.0 14.0 22.0 5.0 VANUATU NaN VANUATU ISLANDS -18.972 168.741
eq2012.loc[(eq2012['Month'] == '4') & ((eq2012['Days'] == '11') | (eq2012['Days'] == '14'))]
Date Time Latitude Longitude Magnitude Depth Year Month Days ID
21219 4/11/2012 8:38:37 2.327 93.063 8.6 20.0 2012 4 11 21219
21220 4/11/2012 8:55:47 1.271 91.748 5.8 10.0 2012 4 11 21220
21221 4/11/2012 9:00:10 51.364 -176.097 5.5 20.8 2012 4 11 21221
21222 4/11/2012 9:01:07 2.199 89.441 5.9 10.0 2012 4 11 21222
21223 4/11/2012 9:27:57 1.254 91.735 6.0 10.0 2012 4 11 21223
21224 4/11/2012 10:43:11 0.802 92.463 8.2 25.1 2012 4 11 21224
21225 4/11/2012 11:53:36 2.913 89.544 5.7 10.0 2012 4 11 21225
21226 4/11/2012 13:58:05 1.495 90.854 5.5 5.0 2012 4 11 21226
21227 4/11/2012 19:04:20 1.190 92.092 5.5 14.5 2012 4 11 21227
21228 4/11/2012 22:41:46 43.584 -127.638 6.0 8.0 2012 4 11 21228
21229 4/11/2012 22:55:10 18.229 -102.689 6.5 20.0 2012 4 11 21229
21230 4/11/2012 23:56:33 1.841 89.685 5.8 10.0 2012 4 11 21230
21235 4/14/2012 10:56:19 -57.679 -65.308 6.2 15.0 2012 4 14 21235
21236 4/14/2012 15:13:14 49.380 155.651 5.6 90.3 2012 4 14 21236
21237 4/14/2012 19:26:43 -6.810 105.457 5.8 62.7 2012 4 14 21237
21238 4/14/2012 22:05:26 -18.972 168.741 6.2 11.0 2012 4 14 21238
earthquakes.loc[(earthquakes['ID'] == 21219) | (earthquakes['ID'] == 21224) | (earthquakes['ID'] == 21238)]
Date Time Latitude Longitude Magnitude Depth Year Month Days ID
21219 4/11/2012 8:38:37 2.327 93.063 8.6 20.0 2012 4 11 21219
21224 4/11/2012 10:43:11 0.802 92.463 8.2 25.1 2012 4 11 21224
21238 4/14/2012 22:05:26 -18.972 168.741 6.2 11.0 2012 4 14 21238
tsu2012.loc[tsu2012[u'MONTH'] == 7]
ID YEAR MONTH DAY HOUR MINUTE COUNTRY STATE LOCATION_NAME LATITUDE LONGITUDE
2521 5460 2012 7.0 15.0 NaN NaN GREENLAND NaN ILULISSAT ICEFJORD 69.2 -51.3
eq2012.loc[(eq2012['Month'] == '7') & (eq2012['Days'] == '15')]
Date Time Latitude Longitude Magnitude Depth Year Month Days ID
tsu2012.loc[tsu2012[u'MONTH'] == 8]
ID YEAR MONTH DAY HOUR MINUTE COUNTRY STATE LOCATION_NAME LATITUDE LONGITUDE
2522 5462 2012 8.0 27.0 4.0 37.0 NICARAGUA NaN OFF THE COAST 12.139 -88.590
2523 5463 2012 8.0 31.0 12.0 47.0 PHILIPPINES NaN PHILIPPINE ISLANDS 10.811 126.638
eq2012.loc[(eq2012['Month'] == '8') & ((eq2012['Days'] == '27') | (eq2012['Days'] == '31'))]
Date Time Latitude Longitude Magnitude Depth Year Month Days ID
21405 8/27/2012 4:37:19 12.139 -88.590 7.3 28.0 2012 8 27 21405
21406 8/27/2012 5:38:04 12.297 -88.612 5.5 35.0 2012 8 27 21406
21411 8/31/2012 12:47:33 10.811 126.638 7.6 28.0 2012 8 31 21411
21412 8/31/2012 23:37:58 10.388 126.719 5.6 40.3 2012 8 31 21412
earthquakes.loc[(earthquakes['ID'] == 21405) | (earthquakes['ID'] == 21411)]
Date Time Latitude Longitude Magnitude Depth Year Month Days ID
21405 8/27/2012 4:37:19 12.139 -88.590 7.3 28.0 2012 8 27 21405
21411 8/31/2012 12:47:33 10.811 126.638 7.6 28.0 2012 8 31 21411
tsu2012.loc[tsu2012[u'MONTH'] == 9]
ID YEAR MONTH DAY HOUR MINUTE COUNTRY STATE LOCATION_NAME LATITUDE LONGITUDE
2524 5464 2012 9.0 5.0 14.0 42.0 COSTA RICA NaN COSTA RICA 10.085 -85.315
eq2012.loc[(eq2012['Month'] == '9') & (eq2012['Days'] == '5')]
Date Time Latitude Longitude Magnitude Depth Year Month Days ID
21417 9/5/2012 13:09:10 -12.476 166.513 6.0 27.0 2012 9 5 21417
21418 9/5/2012 14:42:08 10.085 -85.315 7.6 35.0 2012 9 5 21418
earthquakes.loc[(earthquakes['ID'] == 21418)]
Date Time Latitude Longitude Magnitude Depth Year Month Days ID
21418 9/5/2012 14:42:08 10.085 -85.315 7.6 35.0 2012 9 5 21418
tsu2012.loc[tsu2012[u'MONTH'] == 10]
ID YEAR MONTH DAY HOUR MINUTE COUNTRY STATE LOCATION_NAME LATITUDE LONGITUDE
2525 5467 2012 10.0 28.0 3.0 4.0 CANADA BC BRITISH COLUMBIA 52.788 -132.101
eq2012.loc[(eq2012['Month'] == '10') & (eq2012['Days'] == '28')]
Date Time Latitude Longitude Magnitude Depth Year Month Days ID
21477 10/28/2012 3:04:09 52.788 -132.101 7.8 14.0 2012 10 28 21477
21478 10/28/2012 3:52:20 52.576 -131.962 5.5 10.0 2012 10 28 21478
21479 10/28/2012 18:54:21 52.674 -132.602 6.3 9.0 2012 10 28 21479
21480 10/28/2012 19:09:54 52.294 -132.082 5.6 10.0 2012 10 28 21480
earthquakes.loc[(earthquakes['ID'] == 21477)]
Date Time Latitude Longitude Magnitude Depth Year Month Days ID
21477 10/28/2012 3:04:09 52.788 -132.101 7.8 14.0 2012 10 28 21477
tsu2012.loc[tsu2012[u'MONTH'] == 11]
ID YEAR MONTH DAY HOUR MINUTE COUNTRY STATE LOCATION_NAME LATITUDE LONGITUDE
2526 5468 2012 11.0 7.0 16.0 35.0 GUATEMALA NaN GUATEMALA 13.988 -91.895
eq2012.loc[(eq2012['Month'] == '11') & (eq2012['Days'] == '7')]
Date Time Latitude Longitude Magnitude Depth Year Month Days ID
21493 11/7/2012 16:35:47 13.988 -91.895 7.4 24.0 2012 11 7 21493
21494 11/7/2012 22:42:48 13.849 -92.156 5.7 35.0 2012 11 7 21494
21495 11/7/2012 23:42:19 -8.652 148.034 5.6 118.4 2012 11 7 21495
earthquakes.loc[(earthquakes['ID'] == 21493)]
Date Time Latitude Longitude Magnitude Depth Year Month Days ID
21493 11/7/2012 16:35:47 13.988 -91.895 7.4 24.0 2012 11 7 21493
tsu2012.loc[tsu2012[u'MONTH'] == 12]
ID YEAR MONTH DAY HOUR MINUTE COUNTRY STATE LOCATION_NAME LATITUDE LONGITUDE
2527 5469 2012 12.0 7.0 8.0 18.0 JAPAN NaN OFF EAST COAST OF HONSHU ISLAND 37.890 143.949
2528 5471 2012 12.0 28.0 NaN NaN CHINA NaN ZHAOJUN BRIDGE, HUBEI PROVINCE 31.256 110.733
eq2012.loc[(eq2012['Month'] == '12') & ((eq2012['Days'] == '7') | (eq2012['Days'] == '28'))]
Date Time Latitude Longitude Magnitude Depth Year Month Days ID
21530 12/7/2012 8:18:23 37.890 143.949 7.3 31.0 2012 12 7 21530
21531 12/7/2012 8:31:15 37.914 143.764 6.2 32.0 2012 12 7 21531
21532 12/7/2012 8:48:13 37.828 143.607 5.5 20.2 2012 12 7 21532
21533 12/7/2012 18:19:06 -38.428 176.067 6.3 163.0 2012 12 7 21533
21534 12/7/2012 19:50:23 -7.661 146.954 5.7 139.8 2012 12 7 21534
21553 12/28/2012 17:32:18 -0.145 122.918 5.5 112.1 2012 12 28 21553
earthquakes.loc[(earthquakes['ID'] == 21530)]
Date Time Latitude Longitude Magnitude Depth Year Month Days ID
21530 12/7/2012 8:18:23 37.89 143.949 7.3 31.0 2012 12 7 21530
eqtsu2012 = earthquakes.loc[earthquakes['ID'].isin([21144, 21192, 21203, 21219, 21224, 21238,
                                                      21405, 21411, 21418, 21477, 21493, 21530])]
eqtsu2012
Date Time Latitude Longitude Magnitude Depth Year Month Days ID
21144 2/2/2012 13:34:41 -17.827 167.133 7.1 23.0 2012 2 2 21144
21192 3/14/2012 9:08:35 40.887 144.944 6.9 12.0 2012 3 14 21192
21203 3/20/2012 18:02:47 16.493 -98.231 7.4 20.0 2012 3 20 21203
21219 4/11/2012 8:38:37 2.327 93.063 8.6 20.0 2012 4 11 21219
21224 4/11/2012 10:43:11 0.802 92.463 8.2 25.1 2012 4 11 21224
21238 4/14/2012 22:05:26 -18.972 168.741 6.2 11.0 2012 4 14 21238
21405 8/27/2012 4:37:19 12.139 -88.590 7.3 28.0 2012 8 27 21405
21411 8/31/2012 12:47:33 10.811 126.638 7.6 28.0 2012 8 31 21411
21418 9/5/2012 14:42:08 10.085 -85.315 7.6 35.0 2012 9 5 21418
21477 10/28/2012 3:04:09 52.788 -132.101 7.8 14.0 2012 10 28 21477
21493 11/7/2012 16:35:47 13.988 -91.895 7.4 24.0 2012 11 7 21493
21530 12/7/2012 8:18:23 37.890 143.949 7.3 31.0 2012 12 7 21530
print float(len(eqtsu2012))/float(len(tsu2012)), float(len(eqtsu2012))/float(len(eq2012))
0.857142857143 0.0269662921348

About 86% (12 of 14) of the tsunamis in 2012 were caused by earthquakes in the dataset, and about 2.7% (12 of 445) of the earthquakes in 2012 caused tsunamis.

plt.figure(figsize=(15,10))
displaymap2012 = Basemap(llcrnrlon=-180,llcrnrlat=-90,urcrnrlon=180,urcrnrlat=90)
displaymap2012.drawmapboundary()
displaymap2012.drawcountries()
displaymap2012.drawcoastlines()
# Convert the matched earthquakes' coordinates to floats and project them onto this map
longitude2012 = eqtsu2012['Longitude'].astype(float).tolist()
latitude2012 = eqtsu2012['Latitude'].astype(float).tolist()
lons2012, lats2012 = displaymap2012(longitude2012, latitude2012)
displaymap2012.plot(lons2012, lats2012, 'o', color="blue")
plt.title("Earthquakes that Caused Tsunamis in 2012")
plt.show()

The world map shows that all of the earthquakes that caused tsunamis in 2012 were located in areas near bodies of water.

min2012 = eqtsu2012['Magnitude'].min()
max2012 = eqtsu2012['Magnitude'].max()
print min2012, max2012
6.2 8.6

The magnitudes of the earthquakes that caused tsunamis in 2012 range from 6.2 to 8.6.

plt.figure(figsize=(10,10))
plt.hist(eqtsu2012['Magnitude'], bins = 5, alpha = 0.4)
plt.xlabel('Magnitude')
plt.ylabel('Frequency')
plt.title("Frequencies of Earthquakes that Caused Tsunamis in 2012")
plt.show()

[Figure: histogram of the magnitudes of earthquakes that caused tsunamis in 2012]

From the histogram, half of the earthquakes that caused tsunamis (6 of 12) fall in the tallest bin, which spans magnitudes of about 7.2 to 7.6.
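The bin counts behind the histogram can be checked with `np.histogram`, which `plt.hist` uses to compute its bins. This uses the 12 magnitudes listed in the `eqtsu2012` table above and the same `bins=5`:

```python
import numpy as np

# Magnitudes of the 12 earthquakes matched to tsunamis in 2012 (from eqtsu2012)
mags_2012 = [7.1, 6.9, 7.4, 8.6, 8.2, 6.2, 7.3, 7.6, 7.6, 7.8, 7.4, 7.3]

# Five equal-width bins between the minimum (6.2) and maximum (8.6)
counts, edges = np.histogram(mags_2012, bins=5)
print(counts.tolist())  # [1, 2, 6, 1, 2]
for lo, hi, n in zip(edges[:-1], edges[1:], counts):
    print('%.2f-%.2f: %d' % (lo, hi, n))
```

With five bins over 6.2 to 8.6, each bin is 0.48 wide, so the tallest bin runs from 7.16 to 7.64.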

Next, I pick another year, 1997, to see how many earthquakes caused tsunamis and how strong they were, and whether the results are consistent with 2012.

eq1997 = earthquakes.loc[(earthquakes['Year'] == '1997')]
tsu1997 = tsu.loc[tsu[u'YEAR'] == 1997]
tsu1997
ID YEAR MONTH DAY HOUR MINUTE COUNTRY STATE LOCATION_NAME LATITUDE LONGITUDE
2362 5416 1997 4.0 10.0 NaN NaN HONDURAS NaN GULF OF FONSECA 13.100 -87.600
2364 2273 1997 4.0 21.0 12.0 2.0 SOLOMON ISLANDS NaN SANTA CRUZ IS. VANUATU -12.584 166.676
2365 2274 1997 7.0 9.0 19.0 24.0 VENEZUELA NaN CARIACO-CUMANA 10.598 -63.486
2366 3034 1997 9.0 30.0 6.0 27.0 JAPAN NaN S. OF HONSHU ISLAND 31.959 141.878
2367 2275 1997 10.0 14.0 9.0 53.0 TONGA NaN TONGA ISLANDS -22.100 -176.770
2368 2277 1997 12.0 5.0 11.0 26.0 RUSSIA NaN KAMCHATKA 54.841 162.035
2369 2278 1997 12.0 14.0 3.0 30.0 RUSSIA NaN KAMCHATKA 54.841 162.035
2370 2279 1997 12.0 26.0 8.0 NaN MONTSERRAT NaN WHITE RIVER VALLEY 16.720 -62.180
print len(tsu1997.index), len(eq1997.index)
8 456

In the year 1997, it looks like there are 2 tsunamis in April, 1 in July, 1 in September, 1 in October, and 3 in December with a total of 8 tsunamis. There are 456 earthquakes total in the year 1997.

tsu1997.loc[tsu1997[u'MONTH'] == 4]
ID YEAR MONTH DAY HOUR MINUTE COUNTRY STATE LOCATION_NAME LATITUDE LONGITUDE
2362 5416 1997 4.0 10.0 NaN NaN HONDURAS NaN GULF OF FONSECA 13.100 -87.600
2364 2273 1997 4.0 21.0 12.0 2.0 SOLOMON ISLANDS NaN SANTA CRUZ IS. VANUATU -12.584 166.676
eq1997.loc[(eq1997['Month'] == '4') & ((eq1997['Days'] == '10') | (eq1997['Days'] == '21'))]
Date Time Latitude Longitude Magnitude Depth Year Month Days ID
13495 4/21/1997 2:42:45 -0.149 124.073 5.5 50.0 1997 4 21 13495
13496 4/21/1997 12:02:26 -12.584 166.676 7.7 33.0 1997 4 21 13496
13497 4/21/1997 12:06:34 -12.881 166.464 6.1 33.0 1997 4 21 13497
13498 4/21/1997 12:11:28 -13.500 166.541 6.2 33.0 1997 4 21 13498
13499 4/21/1997 12:15:57 -13.406 166.344 6.0 33.0 1997 4 21 13499
13500 4/21/1997 12:20:50 -13.602 166.832 5.7 33.0 1997 4 21 13500
13501 4/21/1997 12:23:46 -13.673 166.455 5.5 33.0 1997 4 21 13501
13502 4/21/1997 12:28:28 -13.541 166.426 5.5 33.0 1997 4 21 13502
13503 4/21/1997 14:01:24 -7.382 125.715 5.9 432.3 1997 4 21 13503
13504 4/21/1997 21:23:54 -13.158 166.522 5.5 33.0 1997 4 21 13504
eq1997.loc[(eq1997['ID'] == 13496)]
Date Time Latitude Longitude Magnitude Depth Year Month Days ID
13496 4/21/1997 12:02:26 -12.584 166.676 7.7 33.0 1997 4 21 13496
tsu1997.loc[tsu1997[u'MONTH'] == 7]
ID YEAR MONTH DAY HOUR MINUTE COUNTRY STATE LOCATION_NAME LATITUDE LONGITUDE
2365 2274 1997 7.0 9.0 19.0 24.0 VENEZUELA NaN CARIACO-CUMANA 10.598 -63.486
eq1997.loc[(eq1997['Month'] == '7') & (eq1997['Days'] == '9')]
Date Time Latitude Longitude Magnitude Depth Year Month Days ID
13600 7/9/1997 19:24:13 10.598 -63.486 7.0 19.9 1997 7 9 13600
eq1997.loc[(eq1997['ID'] == 13600)]
Date Time Latitude Longitude Magnitude Depth Year Month Days ID
13600 7/9/1997 19:24:13 10.598 -63.486 7.0 19.9 1997 7 9 13600
tsu1997.loc[tsu1997[u'MONTH'] == 9]
ID YEAR MONTH DAY HOUR MINUTE COUNTRY STATE LOCATION_NAME LATITUDE LONGITUDE
2366 3034 1997 9.0 30.0 6.0 27.0 JAPAN NaN S. OF HONSHU ISLAND 31.959 141.878
eq1997.loc[(eq1997['Month'] == '9') & (eq1997['Days'] == '30')]
Date Time Latitude Longitude Magnitude Depth Year Month Days ID
13688 9/30/1997 6:27:25 31.959 141.878 6.2 10.0 1997 9 30 13688
eq1997.loc[(eq1997['ID'] == 13688)]
Date Time Latitude Longitude Magnitude Depth Year Month Days ID
13688 9/30/1997 6:27:25 31.959 141.878 6.2 10.0 1997 9 30 13688
tsu1997.loc[tsu1997[u'MONTH'] == 10]
ID YEAR MONTH DAY HOUR MINUTE COUNTRY STATE LOCATION_NAME LATITUDE LONGITUDE
2367 2275 1997 10.0 14.0 9.0 53.0 TONGA NaN TONGA ISLANDS -22.1 -176.77
eq1997.loc[(eq1997['Month'] == '10') & (eq1997['Days'] == '14')]
Date Time Latitude Longitude Magnitude Depth Year Month Days ID
13711 10/14/1997 9:53:18 -22.101 -176.772 7.8 167.3 1997 10 14 13711
13712 10/14/1997 15:23:10 42.962 12.892 5.5 10.0 1997 10 14 13712
eq1997.loc[(eq1997['ID'] == 13711)]
Date Time Latitude Longitude Magnitude Depth Year Month Days ID
13711 10/14/1997 9:53:18 -22.101 -176.772 7.8 167.3 1997 10 14 13711
tsu1997.loc[tsu1997[u'MONTH'] == 12]
ID YEAR MONTH DAY HOUR MINUTE COUNTRY STATE LOCATION_NAME LATITUDE LONGITUDE
2368 2277 1997 12.0 5.0 11.0 26.0 RUSSIA NaN KAMCHATKA 54.841 162.035
2369 2278 1997 12.0 14.0 3.0 30.0 RUSSIA NaN KAMCHATKA 54.841 162.035
2370 2279 1997 12.0 26.0 8.0 NaN MONTSERRAT NaN WHITE RIVER VALLEY 16.720 -62.180
eq1997.loc[(eq1997['Month'] == '12') & ((eq1997['Days'] == '5') | (eq1997['Days'] == '14') | (eq1997['Days'] == '26'))]
Date Time Latitude Longitude Magnitude Depth Year Month Days ID
13784 12/5/1997 8:08:50 55.281 162.444 5.5 33.0 1997 12 5 13784
13785 12/5/1997 11:26:55 54.841 162.035 7.8 33.0 1997 12 5 13785
13786 12/5/1997 11:35:20 53.909 161.550 5.7 33.0 1997 12 5 13786
13787 12/5/1997 11:37:09 54.512 162.318 5.6 33.0 1997 12 5 13787
13788 12/5/1997 13:56:12 0.656 125.114 5.5 89.4 1997 12 5 13788
13789 12/5/1997 18:48:23 53.752 161.746 6.4 33.0 1997 12 5 13789
13790 12/5/1997 19:04:07 53.792 161.596 5.5 33.0 1997 12 5 13790
13806 12/14/1997 2:39:17 -59.574 -26.186 5.7 33.0 1997 12 14 13806
13807 12/14/1997 8:48:36 -3.081 136.106 5.6 33.0 1997 12 14 13807
13808 12/14/1997 23:10:04 -15.571 -173.173 5.6 33.0 1997 12 14 13808
13829 12/26/1997 5:34:25 -22.338 -179.690 5.9 588.4 1997 12 26 13829
13830 12/26/1997 21:18:18 51.310 178.802 5.6 33.0 1997 12 26 13830
eq1997.loc[(eq1997['ID'] == 13785)]
Date Time Latitude Longitude Magnitude Depth Year Month Days ID
13785 12/5/1997 11:26:55 54.841 162.035 7.8 33.0 1997 12 5 13785
eqtsu1997 = earthquakes.loc[(earthquakes['ID'] == 13469) | (earthquakes['ID'] == 13600) | (earthquakes['ID'] == 13688) | 
                (earthquakes['ID'] == 13711) | (earthquakes['ID'] == 23785)]
eqtsu1997
Date Time Latitude Longitude Magnitude Depth Year Month Days ID
13469 4/2/1997 19:33:22 31.824 130.089 5.5 10.0 1997 4 2 13469
13600 7/9/1997 19:24:13 10.598 -63.486 7.0 19.9 1997 7 9 13600
13688 9/30/1997 6:27:25 31.959 141.878 6.2 10.0 1997 9 30 13688
13711 10/14/1997 9:53:18 -22.101 -176.772 7.8 167.3 1997 10 14 13711
print float(len(eqtsu1997))/float(len(tsu1997)), float(len(eqtsu1997))/float(len(eq1997))
0.5 0.00877192982456

In 1997, 50% of tsunamis (4 of 8) were caused by an earthquake in our dataset, and just under 1% of that year's earthquakes (4 of 456) caused a tsunami.
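The tsunami-to-earthquake matches above were found by hand: filter by month and day, then pick the earthquake whose coordinates match. A minimal sketch of automating that step with `pd.merge` follows. It assumes both tables have first been given the same numeric `Year`, `Month`, `Day`, `Latitude` and `Longitude` columns (the real tables use `Days`, string values, and upper-case tsunami column names), and it treats "same date and coordinates within a hypothetical tolerance `tol` of 0.01 degrees" as a match. `match_tsunamis` is an illustrative helper name; the rows are copied from the 1997 outputs above.

```python
import pandas as pd

# Earthquake rows copied from the 1997 outputs above (columns renamed/converted).
eq = pd.DataFrame({
    'ID': [13600, 13711, 13712, 13784, 13785, 13829],
    'Year': [1997] * 6,
    'Month': [7, 10, 10, 12, 12, 12],
    'Day': [9, 14, 14, 5, 5, 26],
    'Latitude': [10.598, -22.101, 42.962, 55.281, 54.841, -22.338],
    'Longitude': [-63.486, -176.772, 12.892, 162.444, 162.035, -179.690],
})
# Tsunami rows from tsu1997 above (July, October and December only).
tsu = pd.DataFrame({
    'Year': [1997] * 5,
    'Month': [7, 10, 12, 12, 12],
    'Day': [9, 14, 5, 14, 26],
    'Latitude': [10.598, -22.1, 54.841, 54.841, 16.720],
    'Longitude': [-63.486, -176.77, 162.035, 162.035, -62.180],
})


def match_tsunamis(eq, tsu, tol=0.01):
    """Return earthquake/tsunami pairs on the same date whose coordinates
    differ by at most `tol` degrees in both latitude and longitude."""
    pairs = pd.merge(eq, tsu, on=['Year', 'Month', 'Day'], suffixes=('_eq', '_tsu'))
    close = (((pairs['Latitude_eq'] - pairs['Latitude_tsu']).abs() <= tol)
             & ((pairs['Longitude_eq'] - pairs['Longitude_tsu']).abs() <= tol))
    return pairs.loc[close]


matches = match_tsunamis(eq, tsu)
print(sorted(matches['ID'].tolist()))  # [13600, 13711, 13785]
```

On these sample rows the sketch recovers the IDs identified by hand for the same dates (13600, 13711 and 13785). The small tolerance absorbs rounding differences between the two catalogues, such as -22.1 versus -22.101.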

plt.figure(figsize=(15,10))
displaymap1997 = Basemap(llcrnrlon=-180,llcrnrlat=-90,urcrnrlon=180,urcrnrlat=90)
displaymap1997.drawmapboundary()
displaymap1997.drawcountries()
displaymap1997.drawcoastlines()
# Convert the epicentre coordinates to plain float arrays
longitude1997 = eqtsu1997['Longitude'].astype(float).values
latitude1997 = eqtsu1997['Latitude'].astype(float).values
# Project lon/lat into this map's coordinates and plot each epicentre as a blue dot
lons1997, lats1997 = displaymap1997(longitude1997, latitude1997)
displaymap1997.plot(lons1997, lats1997, 'bo')
plt.title("Earthquakes that Caused Tsunamis in 1997")
plt.show()

png

Again, all the earthquakes that caused tsunamis happened in or near bodies of water, which is consistent with the 2012 world map. I will not plot a histogram for this year, since only 4 earthquakes caused tsunamis.

min1997 = eqtsu1997['Magnitude'].min()
max1997 = eqtsu1997['Magnitude'].max()
print min1997, max1997
5.5 7.8

The earthquakes that caused tsunamis in 1997 ranged in magnitude from 5.5 to 7.8.

Next, I combine the tsunami-causing earthquakes from both years (1997 and 2012) to see how the pooled data fits with the observations so far.

eqcom = earthquakes.loc[(earthquakes['Year'] == '1997') | (earthquakes['Year'] == '2012')]
tsucom = tsu.loc[(tsu[u'YEAR'] == 1997) | (tsu[u'YEAR'] == 2012)]
frames = [eqtsu1997, eqtsu2012]
eqtsucom = pd.concat(frames)
eqtsucom
Date Time Latitude Longitude Magnitude Depth Year Month Days ID
13469 4/2/1997 19:33:22 31.824 130.089 5.5 10.0 1997 4 2 13469
13600 7/9/1997 19:24:13 10.598 -63.486 7.0 19.9 1997 7 9 13600
13688 9/30/1997 6:27:25 31.959 141.878 6.2 10.0 1997 9 30 13688
13711 10/14/1997 9:53:18 -22.101 -176.772 7.8 167.3 1997 10 14 13711
21144 2/2/2012 13:34:41 -17.827 167.133 7.1 23.0 2012 2 2 21144
21192 3/14/2012 9:08:35 40.887 144.944 6.9 12.0 2012 3 14 21192
21203 3/20/2012 18:02:47 16.493 -98.231 7.4 20.0 2012 3 20 21203
21219 4/11/2012 8:38:37 2.327 93.063 8.6 20.0 2012 4 11 21219
21224 4/11/2012 10:43:11 0.802 92.463 8.2 25.1 2012 4 11 21224
21238 4/14/2012 22:05:26 -18.972 168.741 6.2 11.0 2012 4 14 21238
21405 8/27/2012 4:37:19 12.139 -88.590 7.3 28.0 2012 8 27 21405
21411 8/31/2012 12:47:33 10.811 126.638 7.6 28.0 2012 8 31 21411
21418 9/5/2012 14:42:08 10.085 -85.315 7.6 35.0 2012 9 5 21418
21477 10/28/2012 3:04:09 52.788 -132.101 7.8 14.0 2012 10 28 21477
21493 11/7/2012 16:35:47 13.988 -91.895 7.4 24.0 2012 11 7 21493
21530 12/7/2012 8:18:23 37.890 143.949 7.3 31.0 2012 12 7 21530
print float(len(eqtsucom))/float(len(tsucom)), float(len(eqtsucom))/float(len(eqcom))
0.727272727273 0.0177580466149

Pooling both years, about 73% of tsunamis (16 of 22) were caused by an earthquake in our dataset, and about 1.8% of the earthquakes (16 of 901) caused a tsunami.
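The 72.7% figure is a pooled ratio (all matched tsunamis over all tsunamis), which is not the same as averaging the two yearly percentages. A small worked check, using counts implied by the printed outputs: 4 of 8 tsunamis matched in 1997 and, assuming the rest belong to 2012, 12 of the remaining 14.

```python
# Counts implied by the outputs above: 16 matched of 22 tsunamis in total,
# of which 4 of 8 were in 1997 (so 12 of 14 in 2012).
matched = {1997: 4, 2012: 12}
tsunamis = {1997: 8, 2012: 14}

# Pooled: total matched / total tsunamis (what the notebook computed)
pooled = sum(matched.values()) / float(sum(tsunamis.values()))
# Averaged: plain mean of the two yearly ratios
averaged = sum(matched[y] / float(tsunamis[y]) for y in matched) / float(len(matched))

print('pooled %.3f, averaged %.3f' % (pooled, averaged))  # pooled 0.727, averaged 0.679
```

The pooled figure weights each year by its number of tsunamis, which is why it differs from a simple average of 50% and about 86%.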

plt.figure(figsize=(15,10))
displaymapcom = Basemap(llcrnrlon=-180,llcrnrlat=-90,urcrnrlon=180,urcrnrlat=90)
displaymapcom.drawmapboundary()
displaymapcom.drawcountries()
displaymapcom.drawcoastlines()
# Convert the epicentre coordinates to plain float arrays
longitudecom = eqtsucom['Longitude'].astype(float).values
latitudecom = eqtsucom['Latitude'].astype(float).values
# Project lon/lat into this map's coordinates and plot each epicentre as a blue dot
lonscom, latscom = displaymapcom(longitudecom, latitudecom)
displaymapcom.plot(lonscom, latscom, 'bo')
plt.title("Earthquakes that Caused Tsunamis in Both Years")
plt.show()

png

All the earthquakes that cause tsunamis are located near or in bodies of water. This has been a consistent observation so far.

mincom = eqtsucom['Magnitude'].min()
maxcom = eqtsucom['Magnitude'].max()
print mincom, maxcom
5.5 8.6

For this combined set, the magnitudes of the earthquakes that caused tsunamis range from 5.5 to 8.6.

plt.figure(figsize=(10,10))
plt.hist(eqtsucom['Magnitude'], bins = 5, alpha = 0.4)
plt.xlabel('Magnitude')
plt.ylabel('Frequency')
plt.title("Frequencies of Earthquakes that Caused Tsunamis in Combined Dataset")
plt.show()

png

The histogram shows that most of these tsunamis were caused by earthquakes of magnitude 7 to 8, which is consistent with what I observed in the 2012 dataset.
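With `bins = 5`, the histogram's bin edges are spread evenly between the smallest and largest magnitude, so they do not line up with whole magnitudes. A short check, with the 16 magnitudes copied from the `eqtsucom` table above, shows where the bars fall and counts the 7 to 8 range directly.

```python
import numpy as np

# Magnitudes of the 16 earthquakes in eqtsucom, copied from the table above
mags = np.array([5.5, 7.0, 6.2, 7.8, 7.1, 6.9, 7.4, 8.6,
                 8.2, 6.2, 7.3, 7.6, 7.6, 7.8, 7.4, 7.3])

# plt.hist(..., bins=5) uses the same binning as np.histogram(..., bins=5)
counts, edges = np.histogram(mags, bins=5)
print(edges.round(2).tolist())  # [5.5, 6.12, 6.74, 7.36, 7.98, 8.6]
print(counts.tolist())          # [1, 2, 5, 6, 2]

# Count magnitudes from 7 up to (but not including) 8 directly
in_7_to_8 = int(((mags >= 7) & (mags < 8)).sum())
print(in_7_to_8)                # 10
```

So 10 of the 16 magnitudes fall between 7 and 8, while the two tallest bars actually span roughly 6.74 to 7.98.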

Conclusion

In conclusion, most tsunamis in our samples were caused by earthquakes, all located in or near bodies of water on the world map, yet fewer than 2% of earthquakes actually caused a tsunami. The earthquakes that caused tsunamis all had magnitudes between 5.5 and 8.6, i.e. they were large earthquakes. My samples (two years) are not representative of the whole dataset, because the earthquake and tsunami datasets could not be merged; I believe the results would be more accurate if more data, and therefore a larger sample, were analyzed.

Part 4: The Relationship Between Earthquakes and Volcanos

Natalie Marcom

Adding to the discussion of how earthquakes affect tsunamis, we will also discuss how earthquakes may affect volcanic eruptions. There are approximately 1,500 active volcanos on Earth. Since I am not a geophysicist, I will stay within the scope of the class and focus on connecting earthquakes and volcanic eruptions through the data.

I used data from NOAA; a list of volcanos with their latitudes and longitudes from an Oregon State University website (oregonstate.edu); volcano and plate-boundary shapefiles from ArcMap (Esri); and data from volcanodiscovery.org on recent earthquakes near volcanos.

import requests
from lxml import html
from mpl_toolkits.basemap import Basemap

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

Let's plot all 1,500 volcanos on a map to see where most of them are located. Because it was difficult to acquire a suitable volcano dataset other than an ArcMap shapefile, we will scrape a website that lists the latitude and longitude of every volcano, which makes plotting easy. We will also size each volcano's marker on the Basemap by its elevation in meters.

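# Download the Oregon State alphabetical volcano list and parse the HTML
page = requests.get('http://volcano.oregonstate.edu/oldroot/volcanoes/alpha.html')
tree = html.fromstring(page.content)
tables = tree.xpath('//table')

# Skip the first four tables on the page and read each remaining one into a DataFrame
volcano_data = []
for volc in range(4, len(tables)):
    df = pd.read_html(html.tostring(tables[volc]), header=0)[0]
    volcano_data.append(df)
# Stack the per-table DataFrames into a single volcano table
df_volc = pd.concat(volcano_data, ignore_index=True)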

Let's look at a small snippet of the scraped volcano dataset and note its main variables: name, location, type, latitude, longitude, and elevation (m).

df_volc.head(10)
Name Location Type Latitude Longitude Elevation (m)
0 Abu Honshu-Japan Shield volcanoes 34.50 131.60 641.0
1 Acamarachi Chile-N Stratovolcano -23.30 -67.62 6046.0
2 Acatenango Guatemala Stratovolcano 14.50 -90.88 3976.0
3 Acigöl-Nevsehir Turkey Caldera 38.57 34.52 1689.0
4 Adams US-Washington Stratovolcano 46.21 -121.49 3742.0
5 Adams Seamount Pacific-C Submarine volcano -25.37 -129.27 -39.0
6 Adatara Honshu-Japan Stratovolcanoes 37.64 140.29 1718.0
7 Adwa Ethiopia Stratovolcano 10.07 40.84 1733.0
8 Afderà Ethiopia Stratovolcano 13.08 40.85 1295.0
9 Agrigan Mariana Is-C Pacific Stratovolcano 18.77 145.67 965.0

Where are the volcanos located? Are they near tectonic plate boundaries?

import os
import pandas as pd
import matplotlib.pyplot as plt
import geopandas as gp
from mpl_toolkits.basemap import Basemap

# Folder holding the volcano and plate-boundary shapefiles
os.chdir(r'C:\Users\jenat\Documents\ringoffire\new')

# Volcano point locations (LATX / LONGX columns of the shapefile)
volc = gp.GeoDataFrame.from_file('volcs.shp')
y = volc.LATX
x = volc.LONGX

plt.figure(figsize = (20, 12))
map1 = Basemap()
# Draw the plate boundaries stored in plate.shp
map1.readshapefile('plate', 'plate')
map1.drawmapboundary(fill_color = 'lightskyblue')
map1.fillcontinents(color = 'lavender', lake_color = 'aqua')
map1.drawcountries()
map1.drawcoastlines()
volc_info = map1.readshapefile('volc1', 'volcs')

# Project lon/lat to map coordinates and plot the volcanos as red dots
x1, y1 = map1(x, y)
map1.scatter(x1, y1, c = 'red', marker = "o", alpha = 1.0)
plt.title("Map of Volcanos and Plate Boundaries", fontsize = 25)
plt.show()

png

Using two shapefiles (one of plate boundaries, the other of the world's volcanos), we see that the majority of the volcanos lie along or very close to tectonic plate boundaries.
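To put a number on "very close", one could measure each volcano's great-circle distance to the nearest plate-boundary point. Below is a minimal sketch using the haversine formula with a mean Earth radius of 6371 km. The boundary points are hypothetical placeholders. In practice they would be pulled from the plate shapefile. With the default cylindrical `Basemap()`, the coordinates that `readshapefile` stores in `map1.plate` should already be longitude/latitude in degrees, but that is an assumption worth checking. `haversine_km` and `nearest_distance_km` are illustrative helper names.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between points given in degrees (arrays broadcast)."""
    lat1, lon1, lat2, lon2 = (np.radians(v) for v in (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2.0) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2.0) ** 2)
    # clip guards against tiny floating-point overshoots outside [0, 1]
    return 2.0 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0)))


def nearest_distance_km(lats, lons, ref_lats, ref_lons):
    """For each (lat, lon) point, the distance to the closest reference point."""
    lats = np.asarray(lats, dtype=float)[:, None]
    lons = np.asarray(lons, dtype=float)[:, None]
    ref_lats = np.asarray(ref_lats, dtype=float)[None, :]
    ref_lons = np.asarray(ref_lons, dtype=float)[None, :]
    # rows = points, columns = reference points; take the minimum per row
    return haversine_km(lats, lons, ref_lats, ref_lons).min(axis=1)


# Abu and Adams from the scraped table, plus hypothetical boundary points
volc_lats, volc_lons = [34.50, 46.21], [131.60, -121.49]
boundary_lats, boundary_lons = [34.0, 46.0, 0.0], [132.0, -125.0, 0.0]
print(nearest_distance_km(volc_lats, volc_lons, boundary_lats, boundary_lons).round(1))
```

Comparing every volcano with every boundary vertex builds a full distance matrix, which is fine for a few thousand points; a spatial index would be the next step for much larger inputs.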


Beyond plotting the volcanos on a map, let us go a step further and mark which volcanos have had an eruption associated with an earthquake. We will use two datasets to answer this question. The second dataset, which contains the earthquake information, mainly covers volcanic eruptions from 1790 to the present. I chose to look at volcanos worldwide rather than focus on a particular region.


How many of the volcanos have had eruptions that were associated with earthquakes?

import os
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from mpl_toolkits.basemap import Basemap

os.chdir(r'C:\Users\jenat\Documents')
# Second dataset: NOAA volcanic eruptions associated with earthquakes
data = pd.read_csv("new_world_data_results_up1.csv")
data
Year Month Day TSU EQ Name Location Country Latitude Longitude Elevation Type Status
0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
1 -1610.0 NaN NaN TSU EQ Santorini Greece Greece 36.404 25.396 329.0 Shield volcano Historical
2 766.0 7.0 20.0 TSU EQ Sakura-jima Kyushu-Japan Japan 31.580 130.670 1117.0 Stratovolcano Historical
3 1169.0 2.0 4.0 TSU EQ Etna Italy Italy 37.734 15.004 3350.0 Stratovolcano Historical
4 1565.0 8.0 NaN NaN EQ Pacaya Guatemala Guatemala 14.381 -90.601 2552.0 Complex volcano Historical
5 1600.0 2.0 19.0 NaN EQ Huaynaputina Peru Peru -16.608 -70.850 4850.0 Stratovolcano Historical
6 1631.0 2.0 14.0 NaN EQ Dama Ali Africa-NE Ethiopia 11.280 41.630 1068.0 Shield volcano Historical
7 1631.0 12.0 16.0 TSU EQ Vesuvius Italy Italy 40.821 14.426 1281.0 Complex volcano Historical
8 1640.0 7.0 31.0 TSU EQ Komaga-take Hokkaido-Japan Japan 42.070 140.680 1140.0 Stratovolcano Historical
9 1659.0 9.0 30.0 NaN EQ San Salvador El Salvador El Salvador 13.736 -89.286 1893.0 Stratovolcano Historical
10 1669.0 3.0 11.0 NaN EQ Etna Italy Italy 37.734 15.004 3350.0 Stratovolcano Historical
11 1679.0 9.0 21.0 NaN EQ Zukur Red Sea Yemen 14.020 42.750 624.0 Shield volcano Holocene
12 1693.0 1.0 9.0 NaN EQ Etna Italy Italy 37.734 15.004 3350.0 Stratovolcano Historical
13 1707.0 12.0 16.0 NaN EQ Fuji Honshu-Japan Japan 35.350 138.730 3776.0 Stratovolcano Historical
14 1716.0 9.0 24.0 TSU EQ Taal Luzon-Philippines Philippines 14.002 120.993 400.0 Stratovolcano Historical
15 1741.0 8.0 23.0 TSU EQ Oshima-Oshima Hokkaido-Japan Japan 41.500 139.370 737.0 Stratovolcano Historical
16 1749.0 8.0 11.0 TSU EQ Taal Luzon-Philippines Philippines 14.002 120.993 400.0 Stratovolcano Historical
17 1754.0 5.0 13.0 TSU EQ Taal Luzon-Philippines Philippines 14.002 120.993 400.0 Stratovolcano Historical
18 1757.0 7.0 9.0 NaN EQ San Jorge Azores Portugal 38.650 -28.080 1053.0 Fissure vent Historical
19 1792.0 5.0 21.0 TSU EQ Unzen Kyushu-Japan Japan 32.750 130.300 1500.0 Complex volcano Historical
20 1820.0 3.0 1.0 TSU EQ Westdahl Aleutian Is United States 54.520 -164.650 1654.0 Stratovolcano Historical
21 1827.0 6.0 27.0 TSU EQ Avachinsky Kamchatka Russia 53.255 158.830 2741.0 Stratovolcano Historical
22 1837.0 9.0 NaN TSU EQ Peuet Sague Sumatra Indonesia 4.914 96.329 2801.0 Complex volcano Historical
23 1840.0 2.0 2.0 TSU EQ Gamalama Halmahera-Indonesia Indonesia 0.800 127.325 1715.0 Stratovolcano Historical
24 1845.0 2.0 8.0 TSU EQ Soputan Sulawesi-Indonesia Indonesia 1.108 124.725 1784.0 Stratovolcano Historical
25 1857.0 4.0 17.0 TSU EQ Umboi New Guinea-NE of Papua New Guinea -5.589 147.875 1548.0 Complex volcano Holocene
26 1863.0 8.0 17.0 TSU EQ Yasur Vanuatu-SW Pacific Vanuatu -19.520 169.425 361.0 Stratovolcano Historical
27 1868.0 4.0 3.0 TSU EQ Mauna Loa Hawaiian Is United States 19.475 -155.608 4170.0 Shield volcano Historical
28 1868.0 9.0 5.0 TSU EQ Etna Italy Italy 37.734 15.004 3350.0 Stratovolcano Historical
29 1871.0 4.0 30.0 TSU EQ Camiguin Mindanao-Philippines Philippines 9.203 124.673 1332.0 Stratovolcano Historical
30 1877.0 2.0 14.0 TSU EQ Mauna Loa Hawaiian Is United States 19.475 -155.608 4170.0 Shield volcano Historical
31 1878.0 2.0 11.0 TSU EQ Yasur Vanuatu-SW Pacific Vanuatu -19.520 169.425 361.0 Stratovolcano Historical
32 1878.0 8.0 29.0 TSU EQ Okmok Aleutian Is United States 53.420 -168.130 1073.0 Shield volcano Historical
33 1885.0 5.0 25.0 NaN EQ Purace Colombia Colombia 2.320 -76.400 4650.0 Stratovolcano Historical
34 1889.0 9.0 6.0 TSU EQ Banua Wuhu Sangihe Is-Indonesia Indonesia 3.138 125.491 -5.0 Submarine volcano Historical
35 1901.0 8.0 9.0 TSU EQ Epi Vanuatu-SW Pacific Vanuatu -16.680 168.370 833.0 Stratovolcano Historical
36 1909.0 4.0 28.0 NaN EQ Cameroon, Mt. Africa-W Cameroon 4.203 9.170 4095.0 Stratovolcano Historical
37 1911.0 1.0 30.0 TSU EQ Taal Luzon-Philippines Philippines 14.002 120.993 400.0 Stratovolcano Historical
38 1913.0 3.0 14.0 TSU EQ Awu Sangihe Is-Indonesia Indonesia 3.670 125.500 1320.0 Stratovolcano Historical
39 1914.0 1.0 12.0 TSU EQ Sakura-jima Kyushu-Japan Japan 31.580 130.670 1117.0 Stratovolcano Historical
40 1917.0 6.0 7.0 NaN EQ San Salvador El Salvador El Salvador 13.736 -89.286 1893.0 Stratovolcano Historical
41 1933.0 1.0 8.0 TSU EQ Kharimkotan Kuril Is Russia 49.120 154.508 1145.0 Stratovolcano Historical
42 1937.0 5.0 29.0 TSU EQ Rabaul New Britain-SW Pac Papua New Guinea -4.271 152.203 688.0 Pyroclastic shield Historical
43 1951.0 8.0 3.0 TSU EQ Cosiguina Nicaragua Nicaragua 12.980 -87.570 872.0 Stratovolcano Historical
44 1957.0 3.0 11.0 NaN EQ Vsevidof Aleutian Is United States 53.130 -168.680 2149.0 Stratovolcano Historical
45 1960.0 5.0 25.0 TSU EQ Puyehue Chile-C Chile -40.590 -72.117 2236.0 Stratovolcano Holocene
46 1963.0 5.0 16.0 NaN EQ Agung Lesser Sunda Is Indonesia -8.342 115.508 3142.0 Stratovolcano Historical
47 1975.0 11.0 29.0 TSU EQ Kilauea Hawaiian Is United States 19.425 -155.292 1222.0 Shield volcano Historical
48 1980.0 5.0 18.0 TSU EQ St. Helens US-Washington United States 46.200 -122.180 2549.0 Stratovolcano Historical
49 1982.0 3.0 28.0 NaN EQ Chichon, El Mexico Mexico 17.360 -93.228 1150.0 Tuff cone Historical
50 1983.0 10.0 3.0 NaN EQ Miyake-jima Izu Is-Japan Japan 34.080 139.530 815.0 Stratovolcano Historical
51 1987.0 12.0 1.0 NaN EQ Sirung Lesser Sunda Is Indonesia -8.510 124.148 862.0 Complex volcano Historical
52 1991.0 6.0 15.0 NaN EQ Pinatubo Luzon-Philippines Philippines 15.130 120.350 1486.0 Stratovolcano Historical
53 2000.0 6.0 27.0 TSU EQ Miyake-jima Izu Is-Japan Japan 34.080 139.530 815.0 Stratovolcano Historical
54 2002.0 8.0 28.0 NaN EQ Etna Italy Italy 37.734 15.004 3350.0 Stratovolcano Historical
55 2010.0 5.0 29.0 TSU EQ Sarigan Mariana Is-C Pacific United States 16.708 145.780 538.0 Stratovolcano Holocene
def plot_map(lons, lats, elevations, marker, min_marker_size=2,
             llcrnrlat=-80, urcrnrlat=90, llcrnrlon=-180, urcrnrlon=180,
             resolution='i', projection='mill'):
    """Draw a world Basemap and plot one marker per volcano, sized by elevation."""
    # 10 evenly spaced bin edges from 0 to the highest elevation; taller volcanos get bigger markers
    bins = np.linspace(0, elevations.max(), 10)
    marker_sizes = np.digitize(elevations, bins) + min_marker_size
    m = Basemap(projection=projection, llcrnrlat=llcrnrlat, urcrnrlat=urcrnrlat,
                llcrnrlon=llcrnrlon, urcrnrlon=urcrnrlon, resolution=resolution)
    m.drawcountries()
    m.drawmapboundary(fill_color='lightskyblue')
    m.fillcontinents(color='#ddaa66', lake_color='aqua')
    m.drawcoastlines()

    for lon, lat, msize in zip(lons, lats, marker_sizes):
        x, y = m(lon, lat)
        m.plot(x, y, marker, markersize=msize, alpha=.7, zorder=4)

    return m

plt.figure(figsize=(60, 30))
# Blue squares: volcanos whose eruptions were associated with an earthquake (NOAA data)
m2 = plot_map(data['Longitude'], data['Latitude'], data['Elevation'], 'bs', min_marker_size=35)
# Red triangles: all volcanos scraped from the Oregon State list
m = plot_map(df_volc['Longitude'], df_volc['Latitude'], df_volc['Elevation (m)'], '^r', min_marker_size=10)

plt.title('Volcano Eruptions with Associated Earthquakes', color='#000000', fontsize=50)
plt.show()

png


The original NOAA dataset contains 797 volcanic eruption records, 55 of which are eruptions associated with earthquakes. In other words, 6.9% of the volcanic eruptions in this dataset (eruptions from 1790 to 2016) had an association with an earthquake.
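The 6.9% is the number of EQ-flagged eruptions divided by the total number of eruption records. A minimal sketch follows. It assumes the `EQ` column holds the string 'EQ' for associated eruptions, as in the table above. It also assumes the total of 797 comes from the full NOAA list rather than from this CSV. `eq_associated_share` is a hypothetical helper name, and the four-row frame is a stand-in for the real file.

```python
import numpy as np
import pandas as pd

TOTAL_ERUPTIONS = 797  # eruption records in the full NOAA dataset (stated above)


def eq_associated_share(df, total=TOTAL_ERUPTIONS):
    """Count rows flagged 'EQ' and return (count, share of all eruptions)."""
    n_eq = int((df['EQ'] == 'EQ').sum())  # the blank first row compares False
    return n_eq, n_eq / float(total)


# Tiny stand-in for new_world_data_results_up1.csv: blank first row, then flagged rows
sample = pd.DataFrame({'EQ': [np.nan, 'EQ', 'EQ', 'EQ'],
                       'Name': [np.nan, 'Santorini', 'Sakura-jima', 'Etna']})
print(eq_associated_share(sample, total=10))      # (3, 0.3)

# With the counts reported above: 55 of 797
print('%.1f%%' % (100.0 * 55 / TOTAL_ERUPTIONS))  # 6.9%
```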


The red triangles mark all volcanos, and the blue squares mark volcanos whose eruptions were associated with an earthquake. Out of roughly 1,500 volcanos, there were 55 volcanic eruptions with this association, many of them in the 20th century. The majority of these associations occurred along the Ring of Fire, which stretches along the eastern edge of Asia down to New Zealand, and from Alaska down to South America.


Closer Examination of Volcano Eruptions with Associated Earthquakes

Let's examine the different types of volcanos as well as the top 10 countries that had the most volcanic eruptions with associated earthquakes. Is there a particular region that had the most volcano eruptions?

import numpy as np
import matplotlib.pyplot as plt
plt.rcdefaults()  # reset to matplotlib's default style, undoing the ggplot style set earlier

Is there a type of volcano whose eruptions are more often associated with earthquakes?

# Number of earthquake-associated eruptions for each volcano type, counted from `data`
objects = ('Stratovolcano', 'Shield Volcano', 'Complex Volcano', 'Pyroclastic shield', 'Tuff cone', 'Fissure vent', 'Submarine volcano')
y_pos = np.arange(len(objects))
counts = [38, 7, 6, 1, 1, 1, 1]
plt.barh(y_pos, counts, align='center', alpha=0.5)
plt.yticks(y_pos, objects)
plt.xlabel('Number of eruptions')
plt.title('Variation of Volcano Types with Associated Earthquakes')
plt.show()

png

Stratovolcanos (Mount St. Helens is one) had by far the most eruptions associated with earthquakes: 38 of the 55.
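The bar heights in the chart above were typed in by hand. A sketch of deriving them from the data instead, assuming the `Type` column of `data` as shown in the NOAA table. Only the first eleven rows are used here as sample input: the blank row 0, then rows 1 to 10. On the full table, `data['Type'].value_counts()` should reproduce the 38, 7, 6 and four 1s. Its result can be passed straight to `.plot(kind='barh')`.

```python
import numpy as np
import pandas as pd

# Type column of the first 11 rows of `data` (row 0 is the blank row in the CSV)
types = pd.Series([np.nan, 'Shield volcano', 'Stratovolcano', 'Stratovolcano',
                   'Complex volcano', 'Stratovolcano', 'Shield volcano',
                   'Complex volcano', 'Stratovolcano', 'Stratovolcano',
                   'Stratovolcano'])

# value_counts() skips NaN by default, so the blank row is ignored
type_counts = types.value_counts()
print(type_counts.to_dict())
```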

Which countries have had the most volcanic eruptions associated with earthquakes?

data['Country'].value_counts()[:10].plot(kind = 'barh', title = 'Top 10 Countries with Volcanic Eruptions with Associated Earthquakes')
plt.show()

png

The United States and Japan are tied for the most volcanic eruptions associated with earthquakes, with 8 each.

Because the NOAA data (which offered the option of selecting volcanic eruptions associated with earthquakes) lacks earthquake magnitudes, a goal is to obtain more detailed variables that would help establish a correlation between earthquakes and volcanic eruptions. However, because scientists are still debating this topic, and many do not see a direct correlation between the two, we will take a different approach that may still lead us toward our goal of relating earthquakes and volcanos.

Using data concerning earthquakes occurring close to volcanos

Examining link between Earthquakes and Volcanic eruptions

As stated before, scientists are still debating whether earthquakes and volcanic eruptions are connected, and little available evidence shows that the two are substantially linked. However, I have found data showing that earthquakes do occur near volcanos, which suggests that earthquakes and volcanos may be somewhat linked.

Is it possible for earthquakes to occur close to volcanos?

import os
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from mpl_toolkits.basemap import Basemap

os.chdir(r'C:\Users\jenat\Documents\ringoffire')
eqdata = pd.read_csv('earthquakesdata.csv')  # dataset

# Convert each column to numbers where possible (invalid entries become NaN);
# columns with no numeric values at all, such as Time and Location, stay as text.
eqdata1 = eqdata.copy()
for col in eqdata1.columns:
    converted = pd.to_numeric(eqdata1[col], errors='coerce')
    if converted.notnull().any():
        eqdata1[col] = converted

Earthquakes near volcanos, February and March 2017:

eqdata1
Time Mag Depth Location Latitude Longitude
0 Sat, 18 Mar 19:47 UTC 2.3 13.2 - 3 SSW of Volcano, Hawaii 19.4000 -155.2500
1 Sat, 18 Mar 14:48 UTC 1.9 17.6 11 SSW from Corinth 19.3975 -155.2522
2 Sat, 18 Mar 13:57 UTC 1.6 2.2 4.6 SSW of Herðubreið 37.7902 14.9158
3 Sat, 18 Mar 13:13 UTC 2.3 1 016 S 66° W of Wao (Lanao Del Sur) 37.8527 22.8490
4 Sat, 18 Mar 12:57 UTC 2.2 27.7 - 5 NNW of Volcano, Hawaii 65.1360 -16.3860
5 Sat, 18 Mar 12:29 UTC 1.8 1.8 - 5 WSW of Volcano, Hawaii 7.5900 124.6200
6 Sat, 18 Mar 12:08 UTC 1.6 0.7 2.7 ESE of Goðabunga 19.4768 -155.2662
7 Sat, 18 Mar 11:41 UTC 1.8 4 3.7 SW of Herðubreið 19.4047 -155.2835
8 Sat, 18 Mar 11:27 UTC 3 17 012 S 87° W of Wao (Lanao Del Sur)I FELT IT 19.4372 -155.6165
9 Sat, 18 Mar 11:20 UTC 1.5 4 4.1 SW of Herðubreið 63.6350 -19.1960
10 Sat, 18 Mar 10:51 UTC 4.7 10 Northern Sumatra, IndonesiaI FELT IT 65.1510 -16.4060
11 Sat, 18 Mar 10:43 UTC 1.6 3.3 3.8 SW of Herðubreið 7.6400 124.6500
12 Sat, 18 Mar 10:07 UTC 2.3 3.1 3.1 SW of Herðubreið 65.1460 -16.4070
13 Sat, 18 Mar 10:07 UTC 1.6 6.4 4.4 SW of Herðubreið 3.4200 98.4800
14 Sat, 18 Mar 09:51 UTC 1.6 7.7 5.7 SW of Herðubreið 65.1500 -16.4070
15 Sat, 18 Mar 09:28 UTC 1.6 4.8 5.3 SW of Herðubreið 65.1570 -16.4030
16 Sat, 18 Mar 09:25 UTC 2 4.8 5.3 SW of Herðubreið 65.1460 -16.4140
17 Sat, 18 Mar 09:25 UTC 1.5 9.2 5.1 N of Herðubreiðartögl 65.1370 -16.4340
18 Sat, 18 Mar 08:59 UTC 1.5 7.6 3.1 N of Bárðarbunga 65.1370 -16.4200
19 Sat, 18 Mar 08:44 UTC 1.6 3.8 - 11 WNW of Calipatria, CA 65.1350 -16.4140
20 Sat, 18 Mar 08:40 UTC 2.1 3.4 4.0 SW of Herðubreið 65.1330 -16.3990
21 Sat, 18 Mar 08:26 UTC 2.5 4 SOUTHERN CALIFORNIA 64.6680 -17.5160
22 Sat, 18 Mar 08:26 UTC 1.5 4.8 4.6 SW of Herðubreið 33.1607 -115.6203
23 Sat, 18 Mar 06:47 UTC 1.5 6.6 4.8 SW of Herðubreið 65.1480 -16.4070
24 Sat, 18 Mar 06:22 UTC 2.9 7.1 5.1 SW of HerðubreiðI FELT IT 33.1500 -115.6300
25 Sat, 18 Mar 06:22 UTC 2.1 3.3 5.0 SW of Herðubreið 65.1440 -16.4170
26 Sat, 18 Mar 05:28 UTC 2.4 5.1 4.9 SW of Herðubreið 65.1450 -16.4230
27 Sat, 18 Mar 05:28 UTC 2.2 5.1 4.9 SW of Herðubreið 65.1410 -16.4240
28 Sat, 18 Mar 05:05 UTC 2 26.4 - 5 NW of Volcano, Hawaii 65.1420 -16.4240
29 Sat, 18 Mar 04:37 UTC 1.5 3.6 3.8 SW of Herðubreið 65.1440 -16.4250
... ... ... ... ... ... ...
826 Thu, 2 Feb 18:04 UTC 3 7 SOUTHERN GREECE -39.2588 173.9287
827 Thu, 2 Feb 12:47 UTC 3.3 8 16 al Norte de Cascajal, V. de Coronado. 19.3812 -155.2410
828 Thu, 2 Feb 06:52 UTC 2.1 1.2 - 128 NNW of Kodiak Station, Alaska 58.3636 -154.7016
829 Thu, 2 Feb 04:50 UTC 1.9 3 Alaska 19.3073 -155.2138
830 Thu, 2 Feb 01:37 UTC 2.4 5.9 - 123 NNW of Kodiak Station, Alaska -39.4653 175.7146
831 Wed, 1 Feb 21:37 UTC 1.9 9.3 - 119 SE of Old Iliamna, Alaska 55.6660 160.3470
832 Wed, 1 Feb 21:33 UTC 2.3 8.5 - 127 SE of Old Iliamna, Alaska 38.8077 -122.7707
833 Wed, 1 Feb 21:29 UTC 1.9 14.8 Catania 55.6980 160.4760
834 Wed, 1 Feb 18:52 UTC 2.1 3.1 Avellino 37.5500 23.5900
835 Wed, 1 Feb 18:24 UTC 2 1.2 Catania 10.1290 -83.9620
836 Wed, 1 Feb 17:58 UTC 2.3 23.5 14.4 SW from Leni (E) 58.7621 -153.6923
837 Wed, 1 Feb 16:43 UTC 2.4 1 058 N 45° E of Davao City 58.8027 -153.8385
838 Wed, 1 Feb 16:21 UTC 2 2 NORTHERN CALIFORNIA 58.7243 -153.6634
839 Wed, 1 Feb 14:23 UTC 2.1 2.8 - 7 SW of Volcano, Hawaii 58.9080 -153.6289
840 Wed, 1 Feb 14:19 UTC 2.3 1.9 - 2 SSW of Cobb, California 58.8481 -153.5432
841 Wed, 1 Feb 13:25 UTC 1.9 11.5 21 SSE from Aigina 37.6653 14.9807
842 Wed, 1 Feb 13:18 UTC 2.6 3 ISLAND OF HAWAII, HAWAII 40.8987 14.6692
843 Wed, 1 Feb 12:33 UTC 2.1 5 1.5 ENE of Krýsuvík 37.7540 15.0060
844 Wed, 1 Feb 11:24 UTC 2.4 12 SOUTHERN GREECE 38.4690 14.7060
845 Wed, 1 Feb 10:40 UTC 3 4 8 al Norte de Capellades, Alvarado. 7.4800 125.9900
846 Wed, 1 Feb 09:59 UTC 2.2 5.2 New Zealand 38.7600 -122.7300
847 Wed, 1 Feb 09:47 UTC 2.3 15 Alaska 19.3827 -155.2812
848 Wed, 1 Feb 09:20 UTC 2.7 1 SOUTHERN GREECE 38.8025 -122.7377
849 Wed, 1 Feb 08:16 UTC 2.8 0.2 - 96 NNW of Nikiski, Alaska 37.5725 23.5370
850 Wed, 1 Feb 07:29 UTC 2.1 3 NORTHERN CALIFORNIA 19.3900 -155.2800
851 Wed, 1 Feb 05:56 UTC 1.9 17 Catania 63.8930 -22.0380
852 Wed, 1 Feb 02:32 UTC 2.3 3 NORTHERN CALIFORNIA 37.6000 23.5100
853 Wed, 1 Feb 00:41 UTC 2.3 3 ISLAND OF HAWAII, HAWAII 9.9900 -83.8030
854 Wed, 1 Feb 00:39 UTC 2.1 2 NORTHERN CALIFORNIA -37.6903 177.2383
855 Wed, 1 Feb 00:39 UTC 2.8 3 ISLAND OF HAWAII, HAWAII 61.4317 -152.2931

856 rows × 6 columns

These are two small datasets of earthquakes that occurred near volcanos between February 1 and March 18, 2017. The distances (km) of these earthquakes from the volcanos show that earthquakes can occur very close to volcanos, so volcanic eruptions and earthquakes occurring together is plausible, as the first dataset suggests. The questions that remain are how frequently this happens and what causes it (two questions for geologists!).


latlong = pd.read_csv('latlong.csv')
eqdata = pd.read_csv('earthquakesdata.csv')


def earth_near(lons, lats, magnitude, min_marker_size=2):
    """Draw plate boundaries on a Blue Marble background and plot earthquakes as stars sized by magnitude."""
    # 10 evenly spaced bin edges from 0 to the largest magnitude; bigger quakes get bigger stars
    bins = np.linspace(0, magnitude.max(), 10)
    marker_sizes = np.digitize(magnitude, bins) + min_marker_size

    m = Basemap()
    m.readshapefile(r'C:\Users\jenat\Documents\ringoffire\new\plate', 'plate')
    m.bluemarble(alpha=0.42)

    for lon, lat, msize in zip(lons, lats, marker_sizes):
        x, y = m(lon, lat)
        m.plot(x, y, '*', c='#fff8dc', markersize=msize, alpha=1.0, zorder=10)

    return m


Legend for Plot:

Symbol    Meaning
*         Earthquake
o         Volcano
Line      Plate boundary

plt.figure(figsize=(15, 12))
map1.scatter(x1,y1,c='red',marker="o",alpha=0.7)
m = earth_near(eqdata1['Longitude'], eqdata1['Latitude'], eqdata1['Mag'], min_marker_size=2)
plt.title('Earthquakes near Volcanos Since Feb 1', color='#000000', fontsize=40)
plt.show()

png


The earthquakes lie quite close to tectonic plate boundaries. The white stars are the earthquakes, sized by magnitude, and the red circles are the volcanos; the earthquakes all fall quite close to the volcanos.
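`earth_near` and the volcano map function size their markers the same way: `np.linspace(0, max, 10)` gives 10 evenly spaced bin edges, `np.digitize` returns each value's bin index, and `min_marker_size` is added. A stand-alone sketch of that mapping follows. `marker_sizes` is a hypothetical helper; it uses `np.nanmax` so a missing value does not break the maximum, similar to how pandas' `.max()` skips NaN.

```python
import numpy as np


def marker_sizes(values, min_marker_size=2, n_edges=10):
    """Marker size = index of the bin the value falls in + a minimum size."""
    values = np.asarray(values, dtype=float)
    bins = np.linspace(0, np.nanmax(values), n_edges)  # edges from 0 to the maximum
    return np.digitize(values, bins) + min_marker_size


# Magnitudes 0.5 and 2.3, plus 4.7 (the largest quake in the sample rows above)
print(marker_sizes([0.5, 2.3, 4.7]).tolist())  # [3, 7, 12]
```

The maximum itself sits on the last edge and lands in bin 10, so it gets the largest marker. Values below 0, such as the -39 m elevation of Adams Seamount, land in bin 0 and get the minimum size.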


Where are these earthquakes happening the most?

import matplotlib.pyplot as plt
plt.rcdefaults()  # reset to matplotlib's default style
eqdata1['Location'].value_counts()[:10].plot(kind = 'barh', title = 'Top 10 Locations with Earthquakes near Volcanos since Feb 1')
plt.show()

png

Among the top 3 of these 10 locations, New Zealand has had the most earthquakes, followed by the Big Island of Hawai'i, then Russia. Central, Southern, and Northern California (the last of which should include The Geysers) also show a lot of activity.


Is there a specific magnitude that is happening more frequently?

plt.figure()
plt.hist(eqdata1['Mag'].dropna(), bins = 20)
plt.xlabel('Magnitude')
plt.ylabel('Amount')
plt.title("Variation and Amount of Earthquake Magnitudes Since Feb 1")
plt.show()

png

Most of the earthquake magnitudes are small, 2.5 or below.
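To put a number on "most", one can take the share of magnitudes at or below 2.5. A minimal sketch follows, using only the 30 magnitudes visible in the first rows of `eqdata1` above as sample input. On the full data, the equivalent would be `(eqdata1['Mag'].dropna() <= 2.5).mean()`.

```python
import pandas as pd

# Magnitudes of the first 30 rows of eqdata1 shown above (sample only)
mags = pd.Series([2.3, 1.9, 1.6, 2.3, 2.2, 1.8, 1.6, 1.8, 3.0, 1.5,
                  4.7, 1.6, 2.3, 1.6, 1.6, 1.6, 2.0, 1.5, 1.5, 1.6,
                  2.1, 2.5, 1.5, 1.5, 2.9, 2.1, 2.4, 2.2, 2.0, 1.5])

share_small = (mags <= 2.5).mean()  # fraction of quakes with magnitude <= 2.5
print('%.0f%% of these quakes are magnitude 2.5 or below' % (100 * share_small))
```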


Is there a correlation between the depth of the earthquake and the magnitude of the earthquake?

import os
import pandas as pd
import matplotlib.pyplot as plt

os.chdir(r'C:\Users\jenat\Documents\ringoffire')
# Columns 1 and 2 of the CSV hold each earthquake's magnitude and depth
tes3 = pd.read_csv('earthquakesdata.csv', usecols = [1, 2])  # dataset
# The depth header has a leading space in the CSV; remove it
data1 = tes3.rename(columns={' Depth': 'Depth'})
# Coerce both columns to numbers; entries that cannot be parsed become NaN
data1 = data1.apply(pd.to_numeric, errors='coerce')

plt.scatter(data1.Mag, data1.Depth)
plt.title('Scatter Plot of Magnitudes and Depths of Earthquakes')
plt.xlabel("Magnitude")
plt.ylabel("Depth (km)")
plt.show()

png


The scatter plot shows no strong correlation between the magnitudes and the depths of the earthquakes: earthquakes of small and large magnitude mostly fall within the same range of depths, which suggests that magnitude and depth are likely not correlated.


Let's compute Spearman's rank correlation, a non-parametric measure, between magnitude and depth.

# Pearson correlation (the default method)
data1.corr()

# Spearman rank correlation between Mag and Depth
data1.corr(method='spearman', min_periods=1)
Mag Depth
Mag 1.000000 0.139534
Depth 0.139534 1.000000

The Spearman correlation between magnitude and depth is about 0.14, which also indicates that there is no strong correlation between the two.
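`DataFrame.corr` reports Spearman's rho but no p-value, so it does not by itself say whether a rho of about 0.14 could arise by chance. One way to check without SciPy is a permutation test: shuffle the depths many times and see how often the shuffled rho is at least as extreme as the observed one. A minimal sketch follows. The 30 magnitude/depth pairs are copied from the first rows of `eqdata1` above purely as sample input; on the full data one would pass the `Mag` and `Depth` columns of `data1.dropna()`. The number of shuffles (`n_perm`) and the seed are arbitrary choices.

```python
import numpy as np
import pandas as pd


def spearman_permutation_test(x, y, n_perm=2000, seed=0):
    """Return (rho, two-sided permutation p-value) for Spearman's rank correlation."""
    rx = pd.Series(x, dtype=float).rank().values   # average ranks for ties
    ry = pd.Series(y, dtype=float).rank().values
    rho = np.corrcoef(rx, ry)[0, 1]                # Spearman = Pearson on ranks
    rng = np.random.RandomState(seed)
    # Shuffling the ranks of y is equivalent to ranking a shuffled y
    null = np.array([np.corrcoef(rx, rng.permutation(ry))[0, 1] for _ in range(n_perm)])
    # Adding 1 to numerator and denominator keeps the p-value above 0
    p_value = (np.sum(np.abs(null) >= abs(rho)) + 1.0) / (n_perm + 1.0)
    return rho, p_value


mag = [2.3, 1.9, 1.6, 2.3, 2.2, 1.8, 1.6, 1.8, 3.0, 1.5,
       4.7, 1.6, 2.3, 1.6, 1.6, 1.6, 2.0, 1.5, 1.5, 1.6,
       2.1, 2.5, 1.5, 1.5, 2.9, 2.1, 2.4, 2.2, 2.0, 1.5]
depth = [13.2, 17.6, 2.2, 1.0, 27.7, 1.8, 0.7, 4.0, 17.0, 4.0,
         10.0, 3.3, 3.1, 6.4, 7.7, 4.8, 4.8, 9.2, 7.6, 3.8,
         3.4, 4.0, 4.8, 6.6, 7.1, 3.3, 5.1, 5.1, 26.4, 3.6]

rho, p = spearman_permutation_test(mag, depth)
print('rho = %.3f, permutation p = %.3f' % (rho, p))
```

A small p-value would suggest that the association, however weak, is unlikely to be due to chance alone; a large one means the data are consistent with no association.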



How far (in km) from the volcanos are the earthquakes happening?

When earthquakes occurr before a volcanic eruption, as seen in the 1980 eruption of Mount St.Helens, these earthquakes are caused by the movement of magma, from the earth's crust towards the mouth of the volcano. General earthquakes are caused by movement between two or more tetonic plates rubbing against each other.

with this in mind, we can speculate that the closer the earthquake occurs towards the volcano itself, the more we can speculate possible volcanic activity (should also keep in mind the history of the volcano itself and the last time it erupted and if it is in fact active)

We therefore look at the earthquakes recorded near volcanos in March 2017 and at how far from a volcano each one occurred.
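The notebook does not say how the distances in `march_near_volc.csv` were computed. One common way to measure such a distance is the great-circle (haversine) distance between an earthquake's epicenter and a volcano's coordinates, assuming a spherical Earth with a mean radius of 6371 km. The sketch below illustrates that assumption only; it is not the method used to build the file, and the function name `haversine_km` is hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (spherical-Earth assumption)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (latitude, longitude) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    # min() guards against rounding pushing a slightly above 1
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))
```

For example, `haversine_km(0, 0, 0, 1)` is about 111.2 km, the length of one degree of longitude at the equator. This is an epicentral distance: it ignores the earthquake's depth.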

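import os
import pandas as pd
import matplotlib.pyplot as plt

# Working directory holding the project's CSV files (user-specific path;
# the raw string keeps the backslashes from being read as escape sequences)
os.chdir(r'C:\Users\jenat\Documents\ringoffire')
# Earthquakes near volcanos in March 2017: one row per earthquake
nv = pd.read_csv("march_near_volc.csv")
nv.columns = ['Volcano', 'Distance']  # volcano name, distance from it (km)
nv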
     Volcano                 Distance
0    Kilauea                 3
1    Bardarbunga             5
2    Bardarbunga             7
3    Bardarbunga             10
4    Bardarbunga             7
5    Bardarbunga             7
6    Santo Tomas             10
7    Nisyros                 22
8    Tongariro               2
9    Paco                    21
10   Kilauea                 2
11   Katla                   4
12   Katla                   3
13   Grímsvötn               13
14   Süphan                  20
15   Clear Lake              20
16   Unzen                   22
17   Vesuvius                8
18   Askja                   23
19   Kilauea                 23
20   Panay                   7
21   Panay                   6
22   Clear Lake              20
23   Akyarlar                13
24   Lassen                  2
25   Trident                 2
26   Sabancaya               8
27   Ragang                  12
28   Long Valley             9
29   Hakkoda                 10
...  ...                     ...
366  Etna                    14
367  Katla                   7
368  Mauna                   15
369  Kilauea                 23
370  Katla                   3
371  Kilauea                 18
372  Ruapehu                 22
373  Sabancaya               12
374  Kilauea                 18
375  Clear                   20
376  Hrómundartindur         5
377  Salton                  24
378  Salton                  23
379  Reykjanes               7
380  Taranaki                22
381  Clear Lake              17
382  Abu                     12
383  Bardarbunga             7
384  Bardarbunga             7
385  Bardarbunga             9
386  Bardarbunga             8
387  Bardarbunga             8
388  Reykjanes               13
389  Bardarbunga             10
390  Bardarbunga             7
391  Bardarbunga             8
392  Askja                   13
393  Reykjanes               13
394  Baru                    6
395  Tjörnes Fracture Zone   16

396 rows × 2 columns

Let's see how many of the earthquakes happened less than 2.0 km from a volcano:

# Count the earthquakes closer than 2.0 km to a volcano
(nv['Distance'] < 2.0).sum()
7

This means that, so far in March 2017, 7 earthquakes occurred less than 2.0 km from a volcano.

The summary statistics of the distances give a more detailed picture:

nv['Distance'].describe()
count    396.000000
mean      13.212121
std        6.906644
min        0.000000
25%        7.000000
50%       14.000000
75%       20.000000
max       24.000000
Name: Distance, dtype: float64

Of the 396 earthquakes reported near volcanos in March 2017, the mean distance from a volcano was 13.2 km (standard deviation 6.9 km). A quarter of the earthquakes occurred within 7 km of a volcano, half within 14 km (the median), and three quarters within 20 km; the remaining quarter occurred between 20 and 24 km away.
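The quartiles can be read as points on the empirical cumulative distribution of the distances, that is, the share of earthquakes within d km of a volcano. A small helper (the name `share_within` is hypothetical) makes that reading explicit for any set of cut-offs. Note that it uses "at most d km", so unlike the strict `< 2.0` count above, a cut-off of 2 includes earthquakes exactly 2 km away.

```python
import pandas as pd


def share_within(distances, cutoffs):
    """Fraction of earthquakes whose distance (km) is at most each cutoff.

    Missing distances are ignored. Returns a Series indexed by cutoff.
    """
    d = pd.Series(distances, dtype="float64").dropna()
    return pd.Series({c: float((d <= c).mean()) for c in cutoffs}, name="share_within_km")
```

Calling `share_within(nv['Distance'], [7, 14, 20])` should give shares close to 0.25, 0.50 and 0.75, matching the quartiles in the summary above.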

The maximum distance was 24 km and the minimum 0 km. The earthquake at 0 km was located at Mammoth Mountain, in eastern California.

plt.figure()
# One bar per whole kilometre (0 to 24 km), since the distances are recorded in whole km
v_plot = nv['Distance'].hist(bins=range(0, 26))
v_plot.set_title("Distribution of Earthquakes by their Distances from Volcanos")
v_plot.set_xlabel("Distance from Volcano (km)")
v_plot.set_ylabel("Number of Earthquakes")
plt.show()

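[Figure: Distribution of Earthquakes by their Distances from Volcanos; x-axis: Distance from Volcano (km), y-axis: Number of Earthquakes]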

Since we do not know whether these earthquakes were caused by magma movement or are ordinary tectonic earthquakes, we cannot say whether they are related to volcanic eruptions. Earthquakes that precede a volcanic eruption typically occur in clusters, and in my view earthquakes very close to a volcano would warrant a closer look.
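One simple way to look for such clustering in this table is to count, for each volcano, how many of the month's earthquakes fell within some radius of it. The sketch below assumes the `Volcano` and `Distance` columns of `nv`; the 10 km radius, the minimum of 3 events, and the function name `nearby_counts` are hypothetical choices for illustration, not values from volcanology. A real swarm analysis would also need each earthquake's time, which the two-column table does not include.

```python
import pandas as pd


def nearby_counts(nv, radius_km=10, min_events=3):
    """Count earthquakes within radius_km of each volcano and keep the busiest ones.

    nv must have the columns 'Volcano' and 'Distance' (km).
    radius_km and min_events are hypothetical illustration thresholds.
    """
    close = nv[nv["Distance"] <= radius_km]
    counts = close.groupby("Volcano").size().sort_values(ascending=False)
    return counts[counts >= min_events]
```

Because the counts are keyed on the volcano name exactly as written, inconsistent labels in the table (for example "Clear" and "Clear Lake") would be counted separately unless they are cleaned first.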

CONCLUSION

There is not enough scientific evidence or data to establish a statistically significant link between earthquakes and volcanic eruptions, and in particular to show whether an earthquake can cause a volcanic eruption. While scientists are still debating the connection between the two, there is evidence that earthquakes occur, and rather frequently, near volcanos. This leaves open the possibility that earthquakes and volcanic activity are related.

Another aspect worth looking into is determining which earthquakes are aftershocks and which are not.