The Problem
A debt-settlement program works only if clients stay in it long enough to settle their debts. Clients who leave early cost the firm and rarely help themselves. The firm wanted to know which of its clients were likely to leave, early enough to do something about it.
The data made that hard. For many former clients the record showed that they had left but not whether they had graduated or dropped out. Any model had to be built and validated on incomplete history, and it had to be honest about where that history was thin.
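One way to keep incomplete exit records from contaminating the target is to let the label take three values instead of two. The sketch below is illustrative only: the record fields, the `label_attrition` helper, and the 180-day window are assumptions, not the engagement's actual definition. A client who left inside the window with no recorded reason gets no label at all and is not guessed into either class.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class ClientRecord:
    # Hypothetical record layout, for illustration only.
    client_id: str
    enrolled: date
    exited: Optional[date]       # None if the client is still enrolled
    exit_reason: Optional[str]   # "graduated", "dropped", or None if unrecorded


def label_attrition(rec: ClientRecord, as_of: date, window_days: int = 180) -> Optional[int]:
    """Return 1 (dropped out within the window), 0 (did not), or None (unknown)."""
    if rec.exited is None:
        # Still enrolled: a usable negative only if observed for the full window.
        observed_days = (as_of - rec.enrolled).days
        return 0 if observed_days >= window_days else None

    tenure_days = (rec.exited - rec.enrolled).days
    if tenure_days > window_days:
        # Stayed past the window, so the later exit reason does not matter here.
        return 0
    if rec.exit_reason == "dropped":
        return 1
    if rec.exit_reason == "graduated":
        return 0
    # Left inside the window but the reason was never recorded.
    return None
```

Rows labeled `None` can be counted and reported separately, which is one concrete way of showing where the history is thin.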
The Results
The engagement delivered a predictive attrition model that flags at-risk clients early and ships a plain-language reason with every score.
Logistic regression was chosen over more complex alternatives because it performed well and could be explained to the people who had to act on it. The model drew on several years of client data and dozens of variables, combining the firm's own records with external demographic data. It was validated on held-out and out-of-time samples, and its accuracy improved as the prediction window lengthened. The work later became an ongoing engagement: rollout across the firm's clients, scheduled retraining, and monitoring for model drift.
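Part of the appeal of logistic regression is that a reason can be read directly off the fitted coefficients. The following is a minimal sketch of one common approach, not the firm's implementation: each feature's contribution is its coefficient times its distance from the population mean, and the largest positive contribution becomes the reason. The feature names, coefficients, and reason texts are all hypothetical.

```python
import math


def score_with_reason(coefs, intercept, means, features, reason_text):
    """Return (probability, reason) for one client from a fitted logistic model."""
    # Log-odds contribution of each feature relative to an average client.
    contributions = {
        name: coefs[name] * (features[name] - means[name]) for name in coefs
    }
    logit = intercept + sum(coefs[name] * features[name] for name in coefs)
    probability = 1.0 / (1.0 + math.exp(-logit))

    # The feature pushing risk up the most supplies the reason.
    top = max(contributions, key=contributions.get)
    if contributions[top] > 0:
        reason = reason_text[top]
    else:
        reason = "No factor raises risk above average"
    return probability, reason


# Hypothetical coefficients and feature values, for illustration only.
coefs = {"missed_deposits": 0.8, "months_enrolled": -0.05}
means = {"missed_deposits": 0.5, "months_enrolled": 12.0}
reason_text = {
    "missed_deposits": "More missed deposits than typical",
    "months_enrolled": "Earlier in the program than typical",
}
client = {"missed_deposits": 3, "months_enrolled": 10}

probability, reason = score_with_reason(coefs, -1.0, means, client, reason_text)
```

Measuring contributions against the mean, instead of against zero, keeps the reason tied to what is unusual about this client and not to features that are large for everyone.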
Key Techniques
- Operational definition of the target variable before any modeling began.
- Interpretable model selection, with a reason code attached to every prediction.
- Feature engineering across internal records and external demographic data.
- Train, test, and out-of-time validation, plus a prediction-window sensitivity check.
- Plain statements of the data's limits, rather than claims the data could not support.
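The train, test, and out-of-time split listed above can be sketched as follows. This is an illustration under assumed names: the `enrolled` field, the cutoff date, and the 25 percent test fraction are placeholders, not values from the engagement. Clients enrolled on or after the cutoff are held back entirely, so the out-of-time sample checks whether the model still works on a later period than any it was trained on.

```python
import random
from datetime import date


def split_train_test_oot(rows, cutoff, test_fraction=0.25, seed=0):
    """Split rows into (train, test, out_of_time) by enrollment date."""
    in_time = [r for r in rows if r["enrolled"] < cutoff]
    out_of_time = [r for r in rows if r["enrolled"] >= cutoff]

    # Shuffle a copy of the in-time rows with a fixed seed so the split is repeatable.
    rng = random.Random(seed)
    rng.shuffle(in_time)
    n_test = int(len(in_time) * test_fraction)
    return in_time[n_test:], in_time[:n_test], out_of_time


# Hypothetical rows, for illustration only.
rows = [{"client_id": i, "enrolled": date(2019 + i % 2, 1 + i, 1)} for i in range(8)]
rows += [{"client_id": 8 + i, "enrolled": date(2021, 1 + i, 1)} for i in range(4)]

train, test, out_of_time = split_train_test_oot(rows, cutoff=date(2021, 1, 1))
```

For a prediction-window sensitivity check, the same split can be reused while the target is relabeled for each candidate window, so that differences in accuracy reflect the window and not a different sample.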