May 23, 2022

How to Predict Customer Churn with Machine Learning

By Christina Bernard

With the plethora of choices available in today’s marketplace, customer loyalty has become extremely difficult to build. Because of this, some customers will inevitably leave your service. The key question is: which customers would have changed their minds if certain actions had been taken?

Customer churn is defined as the percentage of customers who leave your product or service within a predefined time frame. Understanding why customers churn and which ones are at a high risk of leaving can help your organization be more proactive in retention.

Better customer retention often results in higher realized revenues over time.

Your customer churn directly affects the revenue lost within a given period, but the size of the effect depends on who leaves. Having a free service that churns monthly is different from having a paid service that churns monthly. A five percent dip in free customers doesn’t necessarily equate to a loss in revenue, but a 10 percent loss in paid customers could represent a 20 percent loss in revenue if the departing customers spend more than average.
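To make the difference between customer churn and revenue churn concrete, here is a minimal sketch in Python. The customer list, IDs, and revenue figures are made up for illustration; they are chosen so that one high-spending customer out of ten leaves.

```python
# Hypothetical paid customers: (customer_id, monthly_revenue, churned_this_month)
customers = [
    ("c01", 200.0, True),   # one high-spending customer leaves
    ("c02", 100.0, False),
    ("c03", 100.0, False),
    ("c04", 100.0, False),
    ("c05", 100.0, False),
    ("c06", 100.0, False),
    ("c07", 100.0, False),
    ("c08", 100.0, False),
    ("c09", 50.0, False),
    ("c10", 50.0, False),
]


def churn_rates(customers):
    """Return (customer churn rate, revenue churn rate) for one period."""
    total_revenue = sum(revenue for _, revenue, _ in customers)
    lost_revenue = [revenue for _, revenue, churned in customers if churned]
    customer_rate = len(lost_revenue) / len(customers)
    revenue_rate = sum(lost_revenue) / total_revenue
    return customer_rate, revenue_rate


customer_rate, revenue_rate = churn_rates(customers)
print(f"{customer_rate:.0%} of customers, {revenue_rate:.0%} of revenue")
# prints "10% of customers, 20% of revenue"
```

Tracking both rates side by side shows whether the customers who leave are the ones who contribute the most revenue.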

In addition to lost revenue, certain customer acquisition costs might not be recouped, depending on how long the customer stayed.

In this blog post, we will dive into how to create a machine learning customer churn model and how it can impact your retention.

How to Build a Churn Model

The goal of a customer churn model is to evaluate the behaviors and attributes of current and former customers to determine trends that lead to churn. Several behavioral factors that are widely used in these models are customer purchase intervals, cancellations, follow-up calls, emails, and on-page engagement. Customer attributes, like demographics, location, and income, can also be indicators of churn likelihood.

Customer churn models are generally created with a classification algorithm, like logistic regression or decision trees. For these algorithms, all categorical data must be converted into a numerical equivalent using techniques like one-hot encoding.

The target column is encoded numerically as well: 1 if a customer churned and 0 if they have not. From there, the algorithm finds patterns to determine which combinations of information lead to churn.
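The encoding and training steps can be sketched in a few lines of Python. This is an illustration only: the rows, the column names (plan, support_calls), and the learning-rate settings are assumptions, and the hand-written gradient descent stands in for a library implementation such as scikit-learn, which is what you would normally use in practice.

```python
import math

# Hypothetical training data: (plan, support_calls, churned)
rows = [
    ("basic", 5, "yes"), ("basic", 4, "yes"), ("basic", 3, "yes"),
    ("basic", 2, "no"), ("basic", 1, "no"),
    ("premium", 4, "yes"), ("premium", 3, "no"), ("premium", 2, "no"),
    ("premium", 1, "no"), ("premium", 0, "no"),
]


def one_hot(value, categories):
    """Return a 0/1 list with a single 1 at the position of `value`."""
    return [1.0 if value == category else 0.0 for category in categories]


plans = sorted({plan for plan, _, _ in rows})

# Features: one-hot plan columns followed by the numeric support_calls column.
X = [one_hot(plan, plans) + [float(calls)] for plan, calls, _ in rows]
# Target: 1 if the customer churned, 0 otherwise.
y = [1.0 if churned == "yes" else 0.0 for _, _, churned in rows]


def predict_proba(weights, bias, features):
    """Logistic regression: sigmoid of a weighted sum of the features."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


def fit(X, y, learning_rate=0.1, epochs=2000):
    """Fit weights with plain batch gradient descent on the log loss."""
    weights, bias, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(weights), 0.0
        for features, label in zip(X, y):
            error = predict_proba(weights, bias, features) - label
            grad_w = [g + error * x for g, x in zip(grad_w, features)]
            grad_b += error
        weights = [w - learning_rate * g / n for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n
    return weights, bias


weights, bias = fit(X, y)
high_risk = predict_proba(weights, bias, one_hot("basic", plans) + [5.0])
low_risk = predict_proba(weights, bias, one_hot("premium", plans) + [0.0])
print(f"basic plan, 5 support calls: {high_risk:.2f}")
print(f"premium plan, 0 support calls: {low_risk:.2f}")
```

The value returned by predict_proba is a probability between 0 and 1, which is the number a churn model assigns to each customer.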

Churn factors are usually grouped as either voluntary or involuntary.

Voluntary: when a customer decides to leave a service on their own terms.

  • Switching to a competitor
  • Closing a business venture
  • Negative customer experiences

Involuntary: also called passive or accidental churn. This is when services stop abruptly because a business requirement is not met.

  • Expired credit cards
  • Reaching the limit of available credit
  • Failed mechanical payment processing
  • Lapse in contract
  • Fraud protection on recurring payments

A churn score is essentially the probability that the algorithm assigns to each customer, reflecting how likely that customer is to churn.

How to Use the Model

Once your model is created, you can integrate it with other technologies, like your customer relationship management (CRM) system (e.g., Salesforce or HubSpot). Adding a score to each customer profile can help your sales team prioritize certain relationships and proactively reach out to retain those customers.

For example, if a sales representative sees that a customer’s churn score has moved into the danger zone, they can call that customer and offer a promotional deal in an attempt to avoid churn.

You can also integrate the model results into your campaign management system. In this situation, you can proactively send high-risk customers coupons or personalized offers in hopes of retaining them. This process can be completely automated, making proactive retention extremely easy.
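As a rough sketch of how that automation might look, the Python snippet below maps churn scores to retention actions. The customer names, the two thresholds, and the action labels are all hypothetical; real cutoffs would depend on your model and on how much outreach your teams can handle.

```python
# Hypothetical churn scores, keyed by customer name.
scores = {"acme": 0.91, "globex": 0.62, "initech": 0.18}

# Assumed cutoffs, chosen only for illustration.
CALL_THRESHOLD = 0.8   # at or above this, a sales representative calls
OFFER_THRESHOLD = 0.5  # at or above this, the campaign system sends an offer


def retention_action(score):
    """Pick a retention action for a churn score between 0 and 1."""
    if score >= CALL_THRESHOLD:
        return "sales_call"
    if score >= OFFER_THRESHOLD:
        return "send_offer"
    return "no_action"


actions = {customer: retention_action(score) for customer, score in scores.items()}

# List the riskiest customers first so outreach can be prioritized.
for customer in sorted(scores, key=scores.get, reverse=True):
    print(customer, scores[customer], actions[customer])
```

Keeping the thresholds as named constants makes it easy to tune them as you learn which interventions actually retain customers.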

How to Implement Churn Prediction Using the Snowflake Data Cloud

Snowflake helps data scientists build churn models and provides them with a platform for deploying those models. The data used to train a model may come from a variety of sources of customer information, such as HubSpot and Salesforce. Snowflake can be used to centralize that data in a common location for model development.

Once models have been trained, the next step is to create a process that can generate predictions (inference) for new data. The models can be deployed using Snowpark Python user-defined functions (UDFs) to package the model and run inference workloads on Snowflake compute. Predictions generated in this way can be written into Snowflake tables to make them available downstream.
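The shape of such an inference function can be sketched in plain Python. Everything here is an assumption made for illustration: the serialized model is a small dictionary of made-up weights, the feature names are hypothetical, and the Snowpark registration step is left out because it requires a live Snowflake session. The part worth noting is the pattern of deserializing the model once and reusing it for every row.

```python
import functools
import math
import pickle

# Stand-in for a trained, serialized model (hypothetical weights).
MODEL_BYTES = pickle.dumps(
    {"bias": -3.0, "weights": {"support_calls": 0.9, "months_inactive": 0.5}}
)


@functools.lru_cache(maxsize=1)
def load_model():
    """Deserialize the model once; later calls reuse the cached object."""
    return pickle.loads(MODEL_BYTES)


def churn_score(support_calls, months_inactive):
    """Scalar scoring function of the kind one might wrap in a UDF."""
    model = load_model()
    z = (
        model["bias"]
        + model["weights"]["support_calls"] * support_calls
        + model["weights"]["months_inactive"] * months_inactive
    )
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical new customer rows to score.
new_rows = [
    {"customer_id": "c01", "support_calls": 5, "months_inactive": 2},
    {"customer_id": "c02", "support_calls": 0, "months_inactive": 0},
]

# Rows in this shape could then be written to a predictions table.
predictions = [
    {
        "customer_id": row["customer_id"],
        "churn_score": round(
            churn_score(row["support_calls"], row["months_inactive"]), 4
        ),
    }
    for row in new_rows
]
print(predictions)
```

Caching the loaded model matters because a scoring function is typically called once per row, and deserializing the model on every call would dominate the run time.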

The final step in implementing a churn solution is to serve those predictions back to sales representatives or marketing teams. To do that, predictions can be surfaced through a dashboard or application that queries Snowflake. Alternatively, the predictions can be pushed directly back into a CRM system like HubSpot or Salesforce to help teams prioritize their customer interactions.


According to a study by Bain, increasing customer retention by five percent resulted in financial services companies generating a profit increase of more than 25 percent. Most of this comes from existing customers buying more products and paying premiums because they are satisfied with the service, along with lower operating costs.

This concept is not just applicable to financial services. All industries can see an increase in profits with better customer retention because an established customer relationship costs less to maintain while still generating a steady stream of revenue. It literally pays to have customer loyalty!

Interested in reducing customer churn at your organization with machine learning? Contact our ML experts today for questions, guidance, and tips to get started!
