Financial Risk / ML · 2024
Bankruptcy Prediction Ensemble
Stacked gradient-boosting model for financial distress
Python 3.12, LightGBM, XGBoost, CatBoost, scikit-learn, Pandas, NumPy, Plotly
Problem
Predict whether a company will go bankrupt from 64 anonymized financial ratios. This is a high-stakes binary classification problem where both error types are expensive: false negatives mean missed defaults, and false positives mean over-conservative lending.
Approach
- Built a missing-value-aware pipeline with binary null flags, median imputation, and IQR-based robust scaling.
- Added 5 PCA components and 10 interaction terms to capture non-linear feature relationships.
- Bayesian-tuned three base learners (LightGBM, XGBoost, CatBoost) with 5-fold stratified CV.
- Stacked the base learners through a regularized Logistic Regression meta-learner (C=0.1), trained only on out-of-fold predictions to prevent leakage.
Insights
- Stacked ensemble (0.9056 AUC) beat the best single base learner (LightGBM, 0.9042 AUC) by 0.0014. Small gain, but real.
- Engineered missingness flags ranked among the top features. Informative missingness mattered more than I expected.
- Robust scaling outperformed standard scaling on this dataset's heavy-tailed ratios.
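The two preprocessing observations above can be illustrated with a short sketch. It assumes hypothetical column names (`ratio_a`, `ratio_b`, `ratio_c`) and synthetic log-normal data rather than the real ratios, and the `preprocess` helper is an illustrative name, not the project's function. The second half only shows the mechanism behind robust scaling on heavy tails; it does not reproduce the reported comparison.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import RobustScaler, StandardScaler

rng = np.random.default_rng(0)

# Hypothetical heavy-tailed "ratios" (log-normal) with roughly 10% of values missing.
raw = pd.DataFrame(
    rng.lognormal(mean=0.0, sigma=1.5, size=(500, 3)),
    columns=["ratio_a", "ratio_b", "ratio_c"],
)
raw = raw.mask(rng.random(raw.shape) < 0.10)


def preprocess(df):
    """Binary null flags + median imputation + IQR-based scaling, as one DataFrame."""
    flags = df.isna().astype(int).add_suffix("_missing")
    imputed = SimpleImputer(strategy="median").fit_transform(df)
    scaled = RobustScaler().fit_transform(imputed)  # centers on median, divides by IQR
    scaled_df = pd.DataFrame(scaled, columns=df.columns, index=df.index)
    return pd.concat([scaled_df, flags], axis=1)


features = preprocess(raw)

# Mechanism check: inject one extreme value and compare the two scalers.
col = SimpleImputer(strategy="median").fit_transform(raw[["ratio_a"]])
col[0, 0] = 1e4  # single extreme outlier
robust = RobustScaler().fit_transform(col)
standard = StandardScaler().fit_transform(col)


def bulk_iqr(values):
    """Interquartile range of every row except the injected outlier (row 0)."""
    q75, q25 = np.percentile(values[1:], [75, 25])
    return q75 - q25


robust_iqr = bulk_iqr(robust)
standard_iqr = bulk_iqr(standard)

# The outlier inflates the standard deviation, so standard scaling squeezes the
# bulk of the data into a narrow band; the IQR used by robust scaling barely moves.
print(f"bulk IQR after robust scaling:   {robust_iqr:.4f}")
print(f"bulk IQR after standard scaling: {standard_iqr:.4f}")
```

Computing the flags before imputation is what keeps the missingness signal available to the model: once a gap is filled with the median, the fact that it was missing is otherwise lost.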
Impact
Showed that ensemble stacking adds value even when the base learners are already strong, and that careful feature engineering (especially around missingness) often beats marginal model tuning.