Harshil Panchal

Financial Risk / ML · 2024

Bankruptcy Prediction Ensemble

Stacked gradient-boosting model for financial distress

Python 3.12 · LightGBM · XGBoost · CatBoost · scikit-learn · Pandas · NumPy · Plotly

Problem

Predict whether a company will go bankrupt from 64 anonymized financial ratios. This is a high-stakes binary classification task in which both error types are costly: false negatives mean missed defaults, and false positives mean over-conservative lending.

Approach

  • Built a missing-value-aware pipeline with binary null flags, median imputation, and IQR-based robust scaling.
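  • Added 5 PCA components and 10 interaction terms. The interaction terms capture non-linear feature relationships, while the PCA components add compact linear summaries of the ratios.
  • Tuned three base learners (LightGBM, XGBoost, CatBoost) with Bayesian optimization under 5-fold stratified CV.
  • Stacked their out-of-fold predictions with a regularized Logistic Regression meta-learner (C=0.1). Training the meta-learner only on out-of-fold predictions is what prevents leakage.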
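The preprocessing and feature-engineering steps can be sketched as a scikit-learn pipeline. This is a minimal sketch under stated assumptions, not the project's actual code: the flags come from `MissingIndicator` on the raw columns, the remaining steps follow the bullets (median imputation, IQR-based `RobustScaler`, 5 PCA components), and `INTERACTION_PAIRS` is a hypothetical placeholder because the page does not say which 10 interactions were used.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.impute import MissingIndicator, SimpleImputer
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.preprocessing import FunctionTransformer, RobustScaler

# Hypothetical placeholder: the real 10 interaction pairs are not documented.
INTERACTION_PAIRS = [(i, i + 1) for i in range(0, 20, 2)]


def add_interactions(X, pairs):
    """Return one product column per (i, j) pair of scaled ratio columns."""
    return np.column_stack([X[:, i] * X[:, j] for i, j in pairs])


def build_features(n_pca=5, pairs=INTERACTION_PAIRS):
    # Branch 1: one binary flag per raw column, computed before imputation,
    # so the model can learn from where values were missing.
    null_flags = MissingIndicator(features="all")

    # Branch 2: median-impute, robust-scale (center on the median, divide by
    # the 25th-75th percentile IQR), then expand the scaled ratios.
    values = Pipeline([
        ("impute", SimpleImputer(strategy="median")),
        ("scale", RobustScaler()),
        ("expand", FeatureUnion([
            ("ratios", FunctionTransformer()),  # identity: keep the scaled ratios
            ("pca", PCA(n_components=n_pca)),
            ("interactions", FunctionTransformer(add_interactions,
                                                 kw_args={"pairs": pairs})),
        ])),
    ])
    return FeatureUnion([("null_flags", null_flags), ("values", values)])
```

On a 64-column ratio matrix, `build_features().fit_transform(X)` returns 64 flags + 64 scaled ratios + 5 PCA components + 10 interactions = 143 columns. In a cross-validated setup the whole pipeline should be fit inside each training fold, so that medians, IQRs and PCA loadings never see validation rows.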

Insights

  • Stacked ensemble (0.9056 AUC) beat the best single base learner (LightGBM, 0.9042 AUC). Small gain (+0.0014 AUC), but real.
  • Engineered missingness flags ranked among the most important features. Informative missingness mattered more than I expected.
  • Robust scaling outperformed standard scaling on this dataset's heavy-tailed ratios.
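A small synthetic illustration of why robust scaling suits heavy-tailed ratios. The data is hypothetical (a bulk of ordinary values plus five extremes), not the project's dataset. `StandardScaler` divides by the standard deviation, which the extremes inflate, so the ordinary rows get squeezed together; `RobustScaler` divides by the IQR, which the extremes barely move. Tree splits depend only on the order of values, so rescaling mainly affects scale-sensitive steps such as the PCA components and interaction terms.

```python
import numpy as np
from sklearn.preprocessing import RobustScaler, StandardScaler


def bulk_iqr(scaled, n_bulk):
    """Interquartile range of the first n_bulk (non-outlier) rows after scaling."""
    q75, q25 = np.percentile(scaled[:n_bulk], [75, 25])
    return q75 - q25


rng = np.random.default_rng(42)
n_bulk = 995
# Hypothetical heavy-tailed ratio: typical values near 1.0 plus five extremes.
x = np.concatenate([rng.normal(1.0, 0.2, size=n_bulk),
                    [250.0, -300.0, 500.0, 800.0, -1000.0]]).reshape(-1, 1)

# Spread of the ordinary rows after each scaler.
std_spread = bulk_iqr(StandardScaler().fit_transform(x), n_bulk)
rob_spread = bulk_iqr(RobustScaler().fit_transform(x), n_bulk)
print(f"bulk IQR after StandardScaler: {std_spread:.4f}")
print(f"bulk IQR after RobustScaler:   {rob_spread:.4f}")
```

Under `RobustScaler` the ordinary rows keep roughly unit spread. Under `StandardScaler` they collapse into a sliver a few thousandths wide, so downstream scale-sensitive features are dominated by a handful of extreme companies.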

Impact

Showed that ensemble stacking adds value even when the base learners are already strong, and that careful feature engineering (especially around missingness) can matter more than marginal model tuning.

