Predicting Member Churn for Healthcare Insurance

January 20, 2020

Wovenware’s data science team recently began working with a major healthcare provider, so we needed to take a closer look at machine learning in healthcare. We wanted to help our customer better predict customer churn and prevent it more proactively. Customer churn is an issue that affects service providers everywhere. It represents the percentage of customers that stop using a service for one reason or another. Companies are committed to keeping customer churn as low as possible because the cost of acquiring new customers is higher than the cost of retaining existing ones. They realize that any improvement in customer churn has a big impact on revenue.

Challenges to Addressing Customer Churn

Our healthcare client has a few peculiarities that make it a challenge to keep customer churn in check. Its customers can choose to change their service provider at any time, but the client is only notified at the end of the month, when it is too late for any remedial action. This limits its ability to identify the customer’s reason for leaving. In addition, the nature of the business limits the value of the data related to customer behavior. Consider this: if a customer is using his health insurance, does it mean that he’s happy with the service or that he is just sick? On the other hand, what about a customer who hardly ever uses his health insurance? Does it mean that he is unhappy with the service or that he is just healthy?

Our strategy to leverage machine learning in healthcare to address our client’s customer churn, given the limitations mentioned above, was to build a predictive deep learning model to help it know which customers were at a higher risk of canceling their subscription in the upcoming month. The data used to build the model included existing customer demographic data and health insurance claims data, such as dollar amounts and claim types. The resulting live predictions would give the provider enough time to contact the high-risk clients and address any needs they have before they cancel their membership.

How Did We Address It?

So how did we accomplish this? First, we processed the claims data because it consisted of millions of data points, with multiple entries per day for each customer. We also consolidated the claims data of each customer by month, since this is the timeframe the client uses to measure customer churn. Then, we analyzed the consolidated data to find patterns that could help us identify valuable features to train the deep learning model. We compared the data points of customers that stopped using the service to the data points of customers that continued using it, and found that all demographic and claims data followed the same distribution. This was a roundabout way of finding out that we had no meaningful features to train a deep learning model.
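To make the monthly consolidation step concrete, here is a minimal sketch in plain Python. The record layout (member ID, claim date, dollar amount, claim type), the sample values, and the function name `consolidate_by_month` are all assumptions made for illustration; they are not the client's actual schema or our production pipeline.

```python
from collections import defaultdict
from datetime import date

# Hypothetical claim records: (member_id, claim_date, amount_usd, claim_type).
# A member can have several entries on the same day.
claims = [
    ("m1", date(2019, 1, 3), 120.0, "pharmacy"),
    ("m1", date(2019, 1, 3), 80.0, "office_visit"),
    ("m1", date(2019, 2, 10), 40.0, "pharmacy"),
    ("m2", date(2019, 1, 15), 300.0, "lab"),
]


def consolidate_by_month(records):
    """Collapse per-claim rows into one summary per (member, year, month)."""
    monthly = defaultdict(lambda: {"total_usd": 0.0, "n_claims": 0})
    for member_id, claim_date, amount_usd, _claim_type in records:
        key = (member_id, claim_date.year, claim_date.month)
        monthly[key]["total_usd"] += amount_usd
        monthly[key]["n_claims"] += 1
    return dict(monthly)


monthly = consolidate_by_month(claims)
for key in sorted(monthly):
    print(key, monthly[key])
```

Each key in `monthly` then lines up with the monthly window in which churn is measured, so a churn label can be attached to every member-month row.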

Given this setback, we decided to engineer new features by performing arithmetic operations on other claims features, which turned out to be valuable. We also used Pearson correlation coefficients to determine the strength of the relationships between features and kept the features with the strongest relationships as the indicators of customer churn.
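As a rough illustration of this step, the sketch below derives one new feature by arithmetic on two claims features and then ranks the features by their Pearson correlation coefficient. The ratio feature `usd_per_claim`, the toy values, and the choice to correlate each feature against the churn label are assumptions for the example, not the features or the exact procedure we used.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        # The coefficient is undefined for a constant input; report 0.0 here.
        return 0.0
    return cov / math.sqrt(var_x * var_y)


# Hypothetical monthly features for six members (toy values, not real data).
total_usd = [200.0, 40.0, 300.0, 0.0, 150.0, 90.0]
n_claims = [2, 1, 1, 0, 3, 3]
churned = [0, 0, 1, 0, 0, 0]

# Engineered feature: average dollars per claim (0.0 when no claims were filed).
usd_per_claim = [t / n if n else 0.0 for t, n in zip(total_usd, n_claims)]

features = {
    "total_usd": total_usd,
    "n_claims": n_claims,
    "usd_per_claim": usd_per_claim,
}

# Rank features by the absolute strength of their linear relationship with churn.
ranked = sorted(
    ((name, pearson(values, churned)) for name, values in features.items()),
    key=lambda item: abs(item[1]),
    reverse=True,
)
for name, r in ranked:
    print(f"{name}: r = {r:+.3f}")
```

Keep in mind that Pearson correlation only captures linear relationships, so a low coefficient does not rule out a feature that matters in a non-linear way.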

What we found is that a customer leaving is a rare event, which leads to an imbalanced dataset, and that is a problem when training a deep learning model. A model trained with an imbalanced dataset could learn to correctly predict the prevalent case and perform poorly when presented with a rare case, which for us is the case of interest.
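A small example shows why imbalance is dangerous. In the sketch below, a trivial "model" that always predicts that the customer stays looks accurate while never catching a single churner. The 5% churn rate is a made-up figure for illustration, not our client's actual rate.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def recall(y_true, y_pred):
    """Fraction of actual churners (label 1) that were predicted as churners."""
    positives = sum(1 for t in y_true if t == 1)
    if positives == 0:
        return 0.0
    true_positives = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    return true_positives / positives


# Hypothetical month: 5 churners (1) among 100 members.
y_true = [1] * 5 + [0] * 95

# A degenerate model that always predicts "stays" (0).
y_pred = [0] * 100

print("accuracy:", accuracy(y_true, y_pred))
print("recall:", recall(y_true, y_pred))
```

This is why a metric that focuses on the rare class, such as recall, tells us more than raw accuracy for this kind of problem.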

Our Approach

The architecture we employed used three fully connected layers, with a single neuron at the output and a sigmoid activation function. We optimized a binary cross-entropy loss on that sigmoid output. One portion of the dataset was used for training and another portion was used to test the trained model. The portion of the dataset used for testing is called the holdout set. It was especially important to handle the holdout set with care because we wanted to avoid statistical bias in our results.
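For readers who want to see the moving parts, here is a minimal forward-pass sketch of such a network in plain Python, with no deep learning framework. Several details are assumptions, since they are not stated above: the third fully connected layer is treated as the single-neuron output, the hidden layers use ReLU, and the layer widths (8 and 4), the four input features, and the random weights are placeholders. It shows how the sigmoid output and the binary cross-entropy loss fit together, not our trained model.

```python
import math
import random


def dense(inputs, weights, biases):
    """Fully connected layer: one weighted sum plus bias per output neuron."""
    return [
        sum(w * x for w, x in zip(row, inputs)) + b
        for row, b in zip(weights, biases)
    ]


def relu(values):
    return [max(0.0, v) for v in values]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def init_layer(rng, n_in, n_out):
    """Random weights and zero biases for a layer (placeholder initialization)."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def predict_churn_probability(features, layers):
    """Two hidden layers (assumed ReLU), then a single sigmoid output neuron."""
    (w1, b1), (w2, b2), (w3, b3) = layers
    h1 = relu(dense(features, w1, b1))
    h2 = relu(dense(h1, w2, b2))
    logit = dense(h2, w3, b3)[0]
    return sigmoid(logit)


def binary_cross_entropy(y_true, p, eps=1e-12):
    """Loss for one example; p is clipped to avoid log(0)."""
    p = min(max(p, eps), 1.0 - eps)
    return -(y_true * math.log(p) + (1 - y_true) * math.log(1.0 - p))


rng = random.Random(0)
layers = [init_layer(rng, 4, 8), init_layer(rng, 8, 4), init_layer(rng, 4, 1)]

# Hypothetical feature vector for one member-month, already scaled.
p = predict_churn_probability([0.2, 1.0, 0.0, 0.5], layers)
loss = binary_cross_entropy(1, p)
print("predicted churn probability:", p)
print("loss if the member actually churned:", loss)
```

Training would adjust the weights to reduce this loss averaged over the training portion of the data, while the holdout set stays untouched until evaluation.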

In the second blog post of this three-part series, read about our approach to model validation with the holdout set.
