Projects

Systems I'm shipping at work, capstones from SavvyCoders and TripleTen, and the full 16-project data science portfolio. Each card is honest about what it tried to do, what came out, and the stack that got it there.

Systems I'm building at Day & Night Solar

The internal operating system behind a 170-project, 15-state commercial solar portfolio. Each system below replaces a spreadsheet, a folder of PDFs, or a manual QuickBooks workflow.

Bank Reconciliation Engine

Progressive fuzzy-match engine that reconciles 5 bank accounts in a high-volume environment. Tier-priority confidence scoring sends ambiguous matches to review queues, so discrepancies surface before month-end, not after.

Result: Discrepancies caught mid-cycle instead of at close
Python · pandas · rule-based matching
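
A minimal sketch of tiered confidence matching, to show the shape of the idea rather than the production engine. The `Txn` fields, the tier thresholds (0.9 / 0.6 memo similarity, 3- and 7-day windows), and the `auto_threshold` default are all illustrative assumptions, and it uses only the standard library instead of pandas.

```python
from dataclasses import dataclass
from datetime import date
from difflib import SequenceMatcher


@dataclass(frozen=True)
class Txn:
    """Hypothetical transaction record; real feeds carry more fields."""
    txn_id: str
    day: date
    amount: float  # a real ledger would use Decimal; float keeps the sketch short
    memo: str


def score(bank: Txn, book: Txn) -> float:
    """Return a confidence in [0, 1], trying the strictest tier first."""
    if round(bank.amount, 2) != round(book.amount, 2):
        return 0.0  # amounts must agree in every tier
    gap = abs((bank.day - book.day).days)
    memo_sim = SequenceMatcher(None, bank.memo.lower(), book.memo.lower()).ratio()
    if gap == 0 and memo_sim >= 0.9:
        return 1.0  # tier 1: same day, near-identical memo
    if gap <= 3 and memo_sim >= 0.6:
        return 0.8  # tier 2: close date, similar memo
    if gap <= 7:
        return 0.5  # tier 3: amount and rough date only
    return 0.0


def reconcile(bank_txns, book_txns, auto_threshold=0.8):
    """Split bank lines into auto-matched, needs-review, and unmatched."""
    matched, review, unmatched = [], [], []
    available = list(book_txns)
    for bank in bank_txns:
        best = max(available, key=lambda book: score(bank, book), default=None)
        conf = score(bank, best) if best is not None else 0.0
        if conf >= auto_threshold:
            matched.append((bank.txn_id, best.txn_id, conf))
            available.remove(best)  # a book entry can be auto-matched only once
        elif conf > 0:
            review.append((bank.txn_id, best.txn_id, conf))  # a human decides
        else:
            unmatched.append(bank.txn_id)
    return matched, review, unmatched
```

Candidates sent to review stay in the pool, so a later bank line can still claim them with a stronger match.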

QuickBooks Automation

Programmatic IIF generation for customers, POs, invoices, and deposits with anti-duplicate safety checks against tens of thousands of historical transactions.

Result: Eliminated 90%+ of manual QB entry
Python · QuickBooks Enterprise · IIF
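
A rough sketch of the duplicate guard in front of IIF output, under stated assumptions. IIF is a tab-delimited format with `!TRNS` / `!SPL` / `!ENDTRNS` header rows, but the column subset, the account names (`Accounts Receivable`, `Sales`), the input dict shape, and the choice of dedupe key are illustrative here, not the actual production rules.

```python
import csv
import io
from datetime import date

# Illustrative column subset; a real import needs whatever your company file expects.
COLUMNS = ["TRNSTYPE", "DATE", "ACCNT", "NAME", "AMOUNT", "DOCNUM"]


def dedupe_key(row):
    """Normalize the fields that identify a transaction (an assumed key choice)."""
    return (
        row["TRNSTYPE"],
        row["DOCNUM"].strip().upper(),
        row["NAME"].strip().upper(),
        f'{float(row["AMOUNT"]):.2f}',
    )


def build_iif(invoices, seen_keys, ar_account="Accounts Receivable", income_account="Sales"):
    """Return (iif_text, skipped_docnums); seen_keys holds historical dedupe keys."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(["!TRNS"] + COLUMNS)
    writer.writerow(["!SPL"] + COLUMNS)
    writer.writerow(["!ENDTRNS"])
    skipped = []
    for inv in invoices:
        row = {
            "TRNSTYPE": "INVOICE",
            "DATE": inv["date"].strftime("%m/%d/%Y"),
            "ACCNT": ar_account,
            "NAME": inv["customer"],
            "AMOUNT": f'{inv["amount"]:.2f}',
            "DOCNUM": inv["docnum"],
        }
        key = dedupe_key(row)
        if key in seen_keys:
            skipped.append(inv["docnum"])  # already in history or earlier in this batch
            continue
        seen_keys.add(key)
        # The split line offsets the header line so the transaction balances to zero.
        split = dict(row, ACCNT=income_account, AMOUNT=f'{-inv["amount"]:.2f}')
        writer.writerow(["TRNS"] + [row[c] for c in COLUMNS])
        writer.writerow(["SPL"] + [split[c] for c in COLUMNS])
        writer.writerow(["ENDTRNS"])
    return buf.getvalue(), skipped
```

Adding each new key to `seen_keys` as it is written also catches duplicates inside a single batch, not only against history.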

Executive Dashboards (DNS)

A pandas + openpyxl pipeline covering P&L, cash flow, billing, margin, and schedule. The pipeline regenerates its reports automatically and produces static HTML operations hubs with collapsible sections and clickable file links.

Result: CFO and leadership read the same data
Python · pandas · openpyxl · HTML
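
One way a static hub page with collapsible sections and file links could be generated, as a sketch only. The `render_hub` function, its input shape, and the example paths are hypothetical; it relies on the browser's native `<details>` element for collapsing, which may or may not be how the real hubs do it.

```python
from html import escape
from urllib.parse import quote


def render_hub(title, sections):
    """Build a static page; sections maps a heading to (label, relative_path) pairs."""
    parts = [
        "<!doctype html>",
        f"<title>{escape(title)}</title>",
        f"<h1>{escape(title)}</h1>",
    ]
    for name, links in sections.items():
        # <details> gives a collapsible section with no JavaScript.
        parts.append("<details>")
        parts.append(f"  <summary>{escape(name)} ({len(links)})</summary>")
        parts.append("  <ul>")
        for label, path in links:
            # Percent-encode the path for the URL, HTML-escape the visible label.
            parts.append(f'    <li><a href="{quote(path)}">{escape(label)}</a></li>')
        parts.append("  </ul>")
        parts.append("</details>")
    return "\n".join(parts) + "\n"
```

Usage: `render_hub("Ops Hub", {"Finance": [("P&L 2024", "reports/P&L 2024.xlsx")]})` returns a string that can be written straight to an `.html` file.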

Document Intelligence

OCR + classification + auto-routing for hundreds of incoming vendor docs, bank statements, and internal cost models. An Outlook .msg / .eml forensics pipeline builds defensive timelines for dispute preparation.

Result: Hundreds of documents auto-classified and routed
EasyOCR · PyPDF2 · Python
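
A small sketch of the timeline step for .eml files only, using Python's standard `email` package. Outlook .msg files are a binary format and would need a separate parser, which is not shown. The `build_timeline` name and the tuple layout are assumptions, and the sort assumes every kept message has a `Date` header with an explicit UTC offset.

```python
from email import message_from_string, policy
from email.utils import parsedate_to_datetime


def build_timeline(raw_messages):
    """Return (sent_datetime, sender, subject) tuples in chronological order."""
    events = []
    for raw in raw_messages:
        msg = message_from_string(raw, policy=policy.default)
        if msg["Date"] is None:
            continue  # undated messages need manual review, so leave them out here
        sent = parsedate_to_datetime(str(msg["Date"]))
        events.append((sent, str(msg["From"]), str(msg["Subject"])))
    # Timezone-aware datetimes compare correctly across different offsets.
    return sorted(events, key=lambda event: event[0])
```

Sorting on the parsed datetime instead of the header text matters because senders in different time zones can otherwise appear out of order.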

Solar Compliance Reference System

Working reference system for tracking federal incentive eligibility, manufacturer sourcing rules, and domestic content thresholds across product categories. Built to help a commercial solar team make faster, more defensible procurement decisions.

Result: Faster, defensible procurement decisions
Excel · Python · Research Synthesis

Capstones & standouts

No-Whey

Personal Project

A dairy-free ingredient checker built as a Progressive Web App. Lets you scan or type ingredients to spot dairy hiding in product labels — with a flashlight for reading small text, confetti for clean hits, and PWA install.

Result: Shipped and live
PWA · JavaScript · REST API · Render
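
The core check can be sketched in a few lines. This version is in Python for illustration even though the app itself is JavaScript, and both word lists are small hypothetical samples, not the app's actual data.

```python
import re

# Sample terms only; a real checker needs a much longer, reviewed list.
DAIRY_TERMS = {"milk", "whey", "casein", "caseinate", "lactose",
               "butter", "cream", "cheese", "ghee", "yogurt"}
# Phrases that contain a dairy word but are not dairy.
NON_DAIRY_PHRASES = {"cocoa butter", "coconut milk", "almond milk",
                     "oat milk", "peanut butter", "cream of tartar"}


def find_dairy(label_text):
    """Return the ingredients on a label that contain a dairy term."""
    flagged = []
    for raw in re.split(r"[,;()\[\]]", label_text.lower()):
        ingredient = raw.strip(" .")
        if not ingredient:
            continue
        cleaned = ingredient
        for phrase in NON_DAIRY_PHRASES:
            cleaned = cleaned.replace(phrase, " ")  # mask known false positives
        words = set(re.findall(r"[a-z]+", cleaned))
        if words & DAIRY_TERMS:
            flagged.append(ingredient)
    return flagged
```

Matching whole words avoids substring false hits, at the cost of missing plurals and misspellings, which a fuller version would have to handle.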

Age Detection

TripleTen Program

Computer-vision regression model for alcohol-sales compliance — estimates age from a photograph using a CNN trained on a labeled image dataset. Built for the supermarket chain Good Seed to help prevent sales to underage customers.

Result: Regression CNN trained end-to-end
Python · TensorFlow · Keras · CNN · PIL

jamesnguyen.netlify.app

Personal Site

This site — an Astro static build deployed on Netlify. A place to write, log what I'm learning, and ship small web work.

Result: Shipped and live
Astro · TypeScript · HTML · CSS · Netlify

TripleTen Data Science — 16 projects

The end-to-end DS portfolio: churn, pricing, forecasting, NLP, computer vision, and classical analytics. Full code and notebooks in the TripleTenProgram repo.

Customer Retention Prediction (Telecom)

Predicted contract cancellations so marketing could reach at-risk customers before they churned.

AUC-ROC ≥ 0.75

Random Forest · Logistic Regression · CNN · TensorFlow

Car Price Prediction

Gradient-boosted model estimating used-car market value with a focus on prediction speed as well as accuracy.

Fast, tuned GBM beat the linear baseline

CatBoost · LightGBM · XGBoost · scikit-learn

Movie Review Sentiment

NLP pipeline flagging negative movie reviews for sentiment tracking.

Multi-model NLP with transformer features

NLTK · spaCy · transformers · LightGBM

Taxi Orders Forecasting

Hourly forecast of taxi demand to help optimize fleet scheduling.

RMSE ≤ 48 on hourly forecast

statsmodels · scikit-learn · LightGBM
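
For readers unfamiliar with the metric, here is how RMSE is computed and how a naive previous-hour baseline can be scored with it. This is a generic illustration with made-up numbers, not the project's model or data.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and the same length")
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def previous_hour_baseline(series):
    """Predict each hour as the value of the hour before it."""
    return series[:-1]


orders = [100, 110, 90, 120]  # made-up hourly order counts
baseline_error = rmse(orders[1:], previous_hour_baseline(orders))
```

A trained model is only worth keeping if its RMSE on held-out hours beats a baseline like this one.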

Cell Phone Plan Recommender

Classification model that picks the best cell plan for a customer from usage patterns.

Accuracy ≥ 0.75

scikit-learn · Decision Tree · Random Forest · Logistic Regression

Bank Client Retention

Predicted which bank customers were likely to leave so retention could be targeted.

F1 ≥ 0.59

scikit-learn · Random Forest · Logistic Regression

Credit Scoring Analysis

Evaluated borrower metrics to predict the likelihood of loan default.

Identified drivers of default risk

Python · pandas

Oil Well Location Selection

Validated reserve volume models and calculated profit/risk trade-offs across candidate drilling regions.

Best risk-adjusted region selected

scikit-learn · Linear Regression · Bootstrap
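
A generic sketch of using the bootstrap to weigh profit against risk for one region. The function name, the "develop the top `n_select` wells per resample" rule, and the 95% interval are assumptions for illustration, not the project's exact procedure.

```python
import random
import statistics


def bootstrap_profit(well_profits, n_select, n_rounds=1000, seed=0):
    """Return (mean_profit, (lower, upper), loss_risk) over bootstrap resamples."""
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    totals = []
    for _ in range(n_rounds):
        # Resample the candidate wells with replacement.
        sample = rng.choices(well_profits, k=len(well_profits))
        # Assume only the most profitable wells in the sample get developed.
        best = sorted(sample, reverse=True)[:n_select]
        totals.append(sum(best))
    totals.sort()
    lower = totals[int(0.025 * n_rounds)]      # 2.5th percentile
    upper = totals[int(0.975 * n_rounds) - 1]  # 97.5th percentile
    loss_risk = sum(t < 0 for t in totals) / n_rounds  # share of losing resamples
    return statistics.mean(totals), (lower, upper), loss_risk
```

Comparing regions on both the mean and the loss risk is what makes the final pick "risk-adjusted" instead of simply the highest average.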

Gold Recovery Modeling

Modeled the gold-mining recovery process to surface efficiency improvements.

Prototype model for industrial use

scikit-learn · Linear Regression · Decision Tree · Random Forest

Insurance Benefit Modeling

Identified similar customers and predicted insurance benefit amounts while preserving data privacy.

Privacy-aware similarity + benefit model

scikit-learn · KNN · seaborn

Vehicle Price Analysis

Investigated which features drive used-vehicle prices in classified ads.

Feature-importance report for pricing

pandas · NumPy · Matplotlib

Video Game Sales Hypothesis Testing

Tested hypotheses on user vs. critic scores to pick promising platforms and ad directions.

Statistically validated recommendations

pandas · SciPy · Matplotlib

Taxi Trips vs. Weather

Tested whether weather conditions meaningfully shift taxi trip duration.

Hypothesis test on real trip data

pandas · SciPy

Cell Plan Revenue Analysis

Analyzed client behavior across telecom packages to identify which plans generate the most revenue.

Revenue-driver segmentation

pandas · SciPy · Matplotlib

Music Preferences Across Cities

Compared listening habits between two cities to surface patterns in music preference.

First end-to-end EDA project

Python · pandas

Learning Coach Effectiveness Analysis

End-to-end DS workflow — cleaning, modeling, evaluation — focused on performance metrics and outcome prediction.

Full workflow, performance metrics reported

Python · scikit-learn · pandas