This project applies machine learning models to predict earthquake magnitudes and occurrences using real seismic data from 1900 to 2023. The dataset, provided by the National Earthquake Information Center (NEIC), includes information on magnitude, depth, and location. The goal was to help anticipate high-risk zones and improve preparedness.
The map displays the global earthquake distribution, most notably along tectonic plate boundaries such as the Pacific Ring of Fire and the Mediterranean region. These insights helped us understand the spatial concentration of seismic activity.
We tested multiple algorithms to predict whether an earthquake's magnitude would exceed 6.0 on the Richter scale. Models included Logistic Regression, Random Forest, XGBoost, and CatBoost; among the base models, CatBoost combined with Random Forest performed best, reaching ~75% accuracy.
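The classification task can be sketched as follows. This is a minimal illustration on synthetic stand-in data, not the NEIC catalog: the features (depth, latitude, longitude), the label-generating rule, and the Random Forest settings are all assumptions for demonstration.

```python
# Sketch of the "magnitude > 6.0" binary classification task.
# Synthetic data stands in for the NEIC catalog; features and the
# label rule below are illustrative assumptions.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(42)
n = 2000
# Hypothetical features: depth (km), latitude, longitude
X = np.column_stack([
    rng.uniform(0, 700, n),      # depth
    rng.uniform(-90, 90, n),     # latitude
    rng.uniform(-180, 180, n),   # longitude
])
# Synthetic label: deeper events are made slightly more likely to be "major"
p_major = 1.0 / (1.0 + np.exp(-(X[:, 0] - 350.0) / 100.0))
y = (rng.random(n) < p_major).astype(int)  # 1 = magnitude > 6.0

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)
print(f"accuracy: {accuracy_score(y_test, clf.predict(X_test)):.2f}")
```

On the real catalog, the same pipeline would load magnitude, depth, and location columns from the NEIC data and threshold the magnitude at 6.0 to build the label.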
The chart below compares the accuracy of different machine learning models tested on the earthquake dataset.
Each acronym represents a model or an improvement technique:
• RF: Random Forest — a model that combines multiple decision trees.
• XGB: XGBoost — an optimized gradient boosting model.
• CB: CatBoost — a boosting model that handles categorical data efficiently.
• Grid Search: A method used to find the best model parameters.
• Threshold: Adjusting the classification decision threshold to trade off precision and recall.
• Stacked Model: A combination of the best models for stronger predictions.
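The tuning and stacking steps listed above can be sketched as below. This is an illustrative setup on synthetic data: the parameter grid is an assumption, and Random Forest plus Logistic Regression stand in for the project's XGBoost/CatBoost base learners to keep the example dependency-free.

```python
# Sketch of Grid Search and a Stacked Model with scikit-learn.
# Estimators and the parameter grid are illustrative assumptions.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = make_classification(n_samples=800, n_features=6, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Grid Search: cross-validate every parameter combination, keep the best
grid = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid={"n_estimators": [50, 100], "max_depth": [5, None]},
    cv=3,
)
grid.fit(X_tr, y_tr)

# Stacked Model: base learners feed a meta-estimator that learns
# how to combine their predictions
stack = StackingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(random_state=0)),
        ("lr", LogisticRegression(max_iter=1000)),
    ],
    final_estimator=LogisticRegression(max_iter=1000),
)
stack.fit(X_tr, y_tr)

print("best grid params:", grid.best_params_)
print("stacked accuracy:", stack.score(X_te, y_te))
```

Stacking tends to help when the base models make different kinds of errors, which is plausibly why the stacked model edged out the individual ones here.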
The results show that XGB + Grid Search and the Stacked Model performed the best,
achieving an accuracy of around 76%, which indicates reliable performance in identifying
patterns related to significant earthquakes.
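The threshold adjustment mentioned in the list above can be sketched as follows. Instead of the default 0.5 probability cutoff, lowering the threshold flags more events as potential major quakes, trading precision for recall; the data and the 0.3 cutoff here are illustrative assumptions.

```python
# Sketch of decision-threshold adjustment on a classifier's
# predicted probabilities. Synthetic, imbalanced stand-in data.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, weights=[0.8, 0.2], random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=1)
clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)

proba = clf.predict_proba(X_te)[:, 1]        # probability of the "major" class
default_pred = (proba >= 0.5).astype(int)    # standard cutoff
lowered_pred = (proba >= 0.3).astype(int)    # more sensitive cutoff

print("recall @0.5:", recall_score(y_te, default_pred))
print("recall @0.3:", recall_score(y_te, lowered_pred))
```

A lower threshold never misses more true positives than the default, so recall can only stay equal or improve, at the cost of more false alarms.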
By analyzing over a century of earthquake data, we demonstrated that machine learning can help identify risk zones and support disaster prevention strategies. Although prediction precision remains limited, these models contribute to understanding seismic trends and improving preparedness for future events.