Projects
A selection of my work across data engineering, machine learning, and analytics.
California Early Warning System
Built a school-level Early Warning System (EWS) to identify CA public high schools at risk of low graduation outcomes using only public, non-PII datasets aligned with the ABC framework (Attendance, Behavior, Course performance).
View on GitHub →
Flight Delay & Cancellation Prediction (SAN / KSAN)
Predicts delays and cancellations for flights departing San Diego International Airport by integrating 2 years of BTS on-time performance data with NOAA weather observations (NCEI ISD) from KSAN.
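A minimal sketch of the kind of join this integration implies: each flight is matched to the most recent weather observation at or before its scheduled departure, so no future weather leaks into the features. The field names (`sched_dep`, `time`, `wind_kt`), the flight codes, and the sample values are hypothetical and are not taken from the project's code or data.

```python
from bisect import bisect_right
from datetime import datetime


def attach_latest_weather(flights, observations):
    """Attach to each flight the latest observation at or before departure."""
    # Sort observations once so each flight can be matched by binary search.
    obs = sorted(observations, key=lambda o: o["time"])
    times = [o["time"] for o in obs]
    joined = []
    for flight in flights:
        # Index of the last observation whose time is <= scheduled departure.
        i = bisect_right(times, flight["sched_dep"]) - 1
        weather = obs[i] if i >= 0 else None  # None: no earlier observation
        joined.append({**flight, "weather": weather})
    return joined


# Hypothetical sample rows, for illustration only.
flights = [
    {"flight": "XX100", "sched_dep": datetime(2023, 1, 1, 8, 15)},
    {"flight": "XX200", "sched_dep": datetime(2023, 1, 1, 6, 30)},
]
observations = [
    {"time": datetime(2023, 1, 1, 6, 51), "wind_kt": 5},
    {"time": datetime(2023, 1, 1, 7, 51), "wind_kt": 12},
]

joined = attach_latest_weather(flights, observations)
```

Matching backward in time is one reasonable design choice here; a nearest-in-time match would be simpler but could use observations recorded after the flight was scheduled to leave.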
View on GitHub →
School Sentiment NLP
Analyzes how people talk about schools in high-performing vs. low-performing districts using sentiment analysis and topic modeling on Reddit discussions to compare themes and perceptions.
View on GitHub →
Cervical Cancer Risk Prediction
Modeled cervical cancer risk using the Cervical Cancer (Risk Factors) dataset (858 records, 36 variables) with mixed binary/categorical/numerical predictors. Compared multiple models and selected the final model based on sensitivity and clinical relevance.
View on GitHub →
Seattle Airbnb ETL Pipeline
End-to-end ETL pipeline integrating Seattle Airbnb listings, Seattle weather, and booking trends using MySQL and Jupyter Notebook for efficient reporting and analysis.
View on GitHub →
Bike-Sharing Demand Forecasting (Time Series)
Time series forecasting in R to model bike-sharing rental demand for operational planning (redistribution, staffing, maintenance). Includes cleaning, exploratory time series analysis, feature engineering, model building, and forecast evaluation.
View on GitHub →
Bank Term Deposit Conversion Prediction
Predicts which customers will subscribe to a term deposit, to help target telemarketing efforts (Bank of Portugal dataset from Kaggle). Built Random Forest, Logistic Regression, and KNN models and applied SMOTE to address class imbalance. Logistic Regression was chosen for its highest recall and balanced performance.
View on GitHub →
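Recall-based model selection, as described for the term deposit project, can be sketched in a few lines. This is an illustrative example only: the labels, the predictions, and the model names `model_a` and `model_b` are toy values, not the project's results.

```python
def recall(y_true, y_pred):
    """Recall = true positives / (true positives + false negatives)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn) if (tp + fn) else 0.0


def pick_by_recall(y_true, predictions):
    """Score each candidate model by recall and return the best one."""
    scores = {name: recall(y_true, y_pred) for name, y_pred in predictions.items()}
    best = max(scores, key=scores.get)
    return best, scores


# Toy labels (1 = subscribed) and toy predictions from two hypothetical models.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
predictions = {
    "model_a": [1, 0, 0, 1, 0, 0, 0, 0],
    "model_b": [1, 1, 1, 1, 0, 0, 0, 0],
}

best, scores = pick_by_recall(y_true, predictions)
```

Ranking by recall alone favors models that flag more positives, so in practice it is usually checked alongside precision, which is consistent with the "balanced performance" criterion mentioned above.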