An end-to-end data science portfolio covering churn, pricing, forecasting, NLP, computer vision, and classical analytics. Full code and notebooks are in the TripleTenProgram repo.
Customer Retention Prediction (Telecom)
Predicted contract cancellations so marketing could reach at-risk customers before they churned.
AUC-ROC ≥ 0.75
Random Forest · Logistic Regression · CNN · TensorFlow
Car Price Prediction
Gradient-boosted model that estimates used-car market value, tuned for prediction speed as well as accuracy.
Fast, tuned GBM outperformed the linear-regression baseline
CatBoost · LightGBM · XGBoost · scikit-learn
Movie Review Sentiment
NLP pipeline flagging negative movie reviews for sentiment tracking.
Multi-model NLP with transformer features
NLTK · spaCy · transformers · LightGBM
Taxi Orders Forecasting
Hourly forecast of taxi demand to help optimize fleet scheduling.
RMSE ≤ 48 on hourly forecast
statsmodels · scikit-learn · LightGBM
Cell Phone Plan Recommender
Classification model that picks the best cell plan for a customer from usage patterns.
Accuracy ≥ 0.75
scikit-learn · Decision Tree · Random Forest · Logistic Regression
Bank Client Retention
Predicted which bank customers were likely to leave so retention could be targeted.
F1 ≥ 0.59
scikit-learn · Random Forest · Logistic Regression
Credit Scoring Analysis
Evaluated borrower metrics to predict the likelihood of loan default.
Identified drivers of default risk
Python · pandas
Oil Well Location Selection
Validated reserve volume models and calculated profit/risk trade-offs across candidate drilling regions.
Best risk-adjusted region selected
scikit-learn · Linear Regression · Bootstrap
Gold Recovery Modeling
Modeled the gold-mining recovery process to surface efficiency improvements.
Prototype model for industrial use
scikit-learn · Linear Regression · Decision Tree · Random Forest
Insurance Benefit Modeling
Identified similar customers and predicted insurance benefit amounts while preserving data privacy.
Privacy-aware similarity + benefit model
scikit-learn · KNN · seaborn
Vehicle Price Analysis
Investigated which features drive used-vehicle prices in classified ads.
Feature-importance report for pricing
pandas · NumPy · Matplotlib
Video Game Sales Hypothesis Testing
Tested hypotheses about user vs. critic scores to identify promising platforms and advertising directions.
Statistically validated recommendations
pandas · SciPy · Matplotlib
Taxi Trips vs. Weather
Tested whether weather conditions meaningfully shift taxi trip duration.
Hypothesis test on real trip data
pandas · SciPy
Cell Plan Revenue Analysis
Analyzed client behavior across telecom packages to identify which plans generate the most revenue.
Revenue-driver segmentation
pandas · SciPy · Matplotlib
Music Preferences Across Cities
Compared listening habits between two cities to surface patterns in music preference.
First end-to-end EDA project
Python · pandas
Learning Coach Effectiveness Analysis
End-to-end DS workflow (cleaning, modeling, and evaluation) focused on performance metrics and outcome prediction.
Full workflow, performance metrics reported
Python · scikit-learn · pandas