Earthquake Prediction¶
Earthquakes are unpredictable natural disasters driven by complex geological processes. While some regions experience frequent seismic activity, predicting the exact time and location remains challenging. This project explores the use of machine learning to analyze historical earthquake data, aiming to identify patterns that could improve prediction. Although precise forecasting is still uncertain, machine learning offers a promising approach to better understand and prepare for seismic events.
Phase 1: Data Collection and Preparation¶
Step 1: Retrieve data from the Significant Earthquake dataset (1900-2023)¶
# Importing necessary libraries
import numpy as np
import pandas as pd
from google.colab import files
from sklearn.model_selection import train_test_split, GridSearchCV, cross_val_score
from sklearn.preprocessing import StandardScaler, MinMaxScaler, LabelEncoder, OneHotEncoder
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score, accuracy_score, classification_report, confusion_matrix
from sklearn.tree import DecisionTreeClassifier, plot_tree
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from sklearn.utils import resample
import matplotlib.pyplot as plt
import folium
import seaborn as sns
import os
from xgboost import XGBClassifier
# This loop walks through the directory "/kaggle/input" and prints the full path of each file found
for dirname, _, filenames in os.walk("/kaggle/input"):
    for filename in filenames:
        print(os.path.join(dirname, filename))  # Join and print the full file path
# Read the CSV file and load it into a pandas DataFrame
#uploaded = files.upload()
#data = pd.read_csv(next(iter(uploaded.keys())))
data = pd.read_csv("/content/sample_data/Significant Earthquake Dataset 1900-2023.csv", sep=',')
data.head()
| | Time | Place | Latitude | Longitude | Depth | Mag | MagType | nst | gap | dmin | ... | Updated | Unnamed: 14 | Type | horizontalError | depthError | magError | magNst | status | locationSource | magSource |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2023-02-17T09:37:34.868Z | 130 km SW of Tual, Indonesia | -6.5986 | 132.0763 | 38.615 | 6.1 | mww | 119.0 | 51.0 | 2.988 | ... | 2023-02-17T17:58:24.040Z | NaN | earthquake | 6.41 | 5.595 | 0.065 | 23.0 | reviewed | us | us |
| 1 | 2023-02-16T05:37:05.138Z | 7 km SW of Port-Olry, Vanuatu | -15.0912 | 167.0294 | 36.029 | 5.6 | mww | 81.0 | 26.0 | 0.392 | ... | 2023-02-17T05:41:32.448Z | NaN | earthquake | 5.99 | 6.080 | 0.073 | 18.0 | reviewed | us | us |
| 2 | 2023-02-15T18:10:10.060Z | Masbate region, Philippines | 12.3238 | 123.8662 | 20.088 | 6.1 | mww | 148.0 | 47.0 | 5.487 | ... | 2023-02-16T20:12:32.595Z | NaN | earthquake | 8.61 | 4.399 | 0.037 | 71.0 | reviewed | us | us |
| 3 | 2023-02-15T06:38:09.034Z | 54 km WNW of Otaki, New Zealand | -40.5465 | 174.5709 | 74.320 | 5.7 | mww | 81.0 | 40.0 | 0.768 | ... | 2023-02-16T06:42:09.738Z | NaN | earthquake | 3.68 | 4.922 | 0.065 | 23.0 | reviewed | us | us |
| 4 | 2023-02-14T13:16:51.072Z | 2 km NW of Lelești, Romania | 45.1126 | 23.1781 | 10.000 | 5.6 | mww | 132.0 | 28.0 | 1.197 | ... | 2023-02-17T09:15:18.586Z | NaN | earthquake | 4.85 | 1.794 | 0.032 | 95.0 | reviewed | us | us |
5 rows × 23 columns
We had some issues importing the dataset directly from Kaggle, so we ended up uploading it to Colab's sample_data folder instead. The file must be re-uploaded every time the notebook is reopened.
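If importing straight from Kaggle is worth retrying, one possible workaround (a sketch, not what this notebook uses; the dataset slug below is an assumption) is the kagglehub package, which downloads the files and caches them locally:

# Sketch only: fetch the dataset with kagglehub instead of a manual upload.
# The slug "owner/significant-earthquake-dataset-1900-2023" is hypothetical
# and would need to be replaced with the dataset's real Kaggle identifier.
import kagglehub
path = kagglehub.dataset_download("owner/significant-earthquake-dataset-1900-2023")
print("Dataset downloaded to:", path)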
print("Name of each column")
data.columns
Name of each column
Index(['Time', 'Place', 'Latitude', 'Longitude', 'Depth', 'Mag', 'MagType',
'nst', 'gap', 'dmin', 'rms', 'net', 'ID', 'Updated', 'Unnamed: 14',
'Type', 'horizontalError', 'depthError', 'magError', 'magNst', 'status',
'locationSource', 'magSource'],
dtype='object')
print("Data types of each column")
data.dtypes
Data types of each column
| Column | Dtype |
|---|---|
| Time | object |
| Place | object |
| Latitude | float64 |
| Longitude | float64 |
| Depth | float64 |
| Mag | float64 |
| MagType | object |
| nst | float64 |
| gap | float64 |
| dmin | float64 |
| rms | float64 |
| net | object |
| ID | object |
| Updated | object |
| Unnamed: 14 | float64 |
| Type | object |
| horizontalError | float64 |
| depthError | float64 |
| magError | float64 |
| magNst | float64 |
| status | object |
| locationSource | object |
| magSource | object |
We can see that we only have two types of variables: float64 and object. The object columns are in fact strings. In the next part, we will make some visualizations of our data.
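Before moving on, it is worth confirming which columns actually carry data. Below is a minimal sketch, assuming (based on the preview above) that Unnamed: 14 is an empty artifact column and that Time holds ISO 8601 timestamps:

# Count missing values per column to spot empty or sparse fields
print(data.isna().sum().sort_values(ascending=False))

# "Unnamed: 14" looks entirely empty in the preview above; drop it if that holds
if data["Unnamed: 14"].isna().all():
    data = data.drop(columns=["Unnamed: 14"])

# Parse the ISO 8601 timestamps so Time becomes datetime64 instead of object
data["Time"] = pd.to_datetime(data["Time"], errors="coerce")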
Step 2: Visualization¶
Here, every earthquake in the dataset is plotted on a world map, giving a clear picture of the locations where seismic activity is most frequent.
# Create a world map centered at (0, 0) and add one small marker per earthquake
m = folium.Map(location=[0, 0], zoom_start=2)
for _, row in data.iterrows():
    folium.CircleMarker(location=[row["Latitude"], row["Longitude"]],
                        radius=2, color='blue', fill=True,
                        fill_color='blue', fill_opacity=0.5).add_to(m)
m.save("earthquake_map.html")
m
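Adding one CircleMarker per row can make the map slow to render once the dataset reaches tens of thousands of events. A possible alternative (a sketch, not part of the original notebook) is folium's FastMarkerCluster plugin, which groups nearby points client-side:

# Sketch: cluster markers for faster rendering on large datasets
from folium.plugins import FastMarkerCluster

m_fast = folium.Map(location=[0, 0], zoom_start=2)
# Drop rows with missing coordinates before building the marker list
coords = data[["Latitude", "Longitude"]].dropna().values.tolist()
FastMarkerCluster(coords).add_to(m_fast)
m_fast.save("earthquake_map_clustered.html")
m_fast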