Reducing Employee Churn with a Data Science Solution

Data science can be applied to many use cases, but one of the most relevant problems facing companies today is employee churn. To address it, our data science team has developed a model that predicts which employees are at higher risk of leaving, so companies can take steps to retain that talent.

Businesses are struggling with employee resignation, especially during this new era of “quiet quitting.” Oftentimes, managers are surprised that individuals wanted to leave in the first place. These resignations aren’t trivial, and they’re costly. The money spent on interviewing and training is lost when an employee quits, along with that employee’s business-specific knowledge. Knowing beforehand who is likely to leave is therefore highly valuable, because action can be taken to retain talent.

Typically, HR data holds insights into who might leave and why, but extracting them from large data sources is tedious work for an analyst. In contrast to traditional business intelligence methods, machine learning algorithms have the potential to quickly find patterns that are otherwise difficult for humans to discover and encode.

Smartbridge can help organizations respond to the problem of employee churn pre-emptively, which can help reduce costs and increase employee retention. We’ve reduced development time by abstracting otherwise manual data science tasks with state-of-the-art machine learning packages, taking your business solutions from idea to proof-of-concept in as little as 4 weeks.

To demonstrate this, we will walk through a use case that uses HR data such as employee satisfaction levels, average hours worked, and tenure at the company to predict whether an employee will resign.

Industry-Agnostic Applicability

This solution and approach can be used by any business in any industry trying to develop actionable insights from their employee data. This allows companies to prioritize the employees that need attention based on the likelihood of resignation.


When creating this solution, there were a few objectives we wanted to achieve.

  • Develop a machine learning (ML) model which reliably predicts whether employees will resign or won’t resign.

  • Understand the factors that lead to employee resignation via ML model feature importance.

  • Minimize false negatives (that is, employees who were predicted to stay but then actually left). These hurt the business more than false positives. We need to do this while still keeping raw accuracy in view.

Key Challenges

A challenge we placed on ourselves was that a performant machine learning model needed to be developed within 4 weeks. In traditional machine learning model training approaches, multiple rounds of data pre-processing, hyper-parameter tuning, and optimizations are performed somewhat manually to identify the optimal model. This requires time and compute resources that would likely surpass our schedule.

Once the ML model is trained and deployed, it will be run regularly to predict changes in employee classification (will resign or won’t resign). As future data is collected, the model will be retrained regularly to accommodate changes; thus, a scalable machine learning model training and scoring system is needed.

The Smartbridge Solution

As is typical with any use case, Smartbridge starts by understanding the business and their challenges thoroughly to provide a simple, scalable solution for the challenge presented. To solve this particular challenge, we chose a unique machine learning approach to abstract the manual processes of a data science flow and reduce development time while retaining reliability, maintainability, and accuracy.

Choosing an Algorithm (ML Model)

All algorithms relevant to the use case are trained simultaneously on a training dataset, and their performance is then compared side by side using relevant metrics. Ultimately, one guiding metric or a select combination of them is used to determine the best ML model for the problem. Usually, Smartbridge works with the business to determine the guiding metric(s). Once the metric(s) are settled, a subset of the highest-performing models is selected to progress to the next step.

For our use case, the guiding metric combination was recall, chosen to minimize false negatives, alongside accuracy. Recall measures the model’s ability to detect positive samples: of the employees who actually resigned, it is the fraction the model flagged. The higher the recall, the fewer resignations are missed.
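To make the two metrics concrete, here is a minimal sketch in plain Python. The labels below are made-up illustrative values (1 = resigned, 0 = stayed), not data from this project, and the function names are our own for this example.

```python
def confusion_counts(y_true, y_pred):
    """Count true positives, false positives, true negatives, false negatives."""
    tp = fp = tn = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if actual == 1 and predicted == 1:
            tp += 1
        elif actual == 0 and predicted == 1:
            fp += 1
        elif actual == 0 and predicted == 0:
            tn += 1
        else:  # actual == 1 and predicted == 0: a missed resignation
            fn += 1
    return tp, fp, tn, fn


def recall(tp, fn):
    """Share of actual resignations that the model caught."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def accuracy(tp, fp, tn, fn):
    """Share of all predictions that were correct."""
    total = tp + fp + tn + fn
    return (tp + tn) / total if total else 0.0


# Illustrative labels only (1 = resigned, 0 = stayed).
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 0, 1, 0, 1]

tp, fp, tn, fn = confusion_counts(y_true, y_pred)
print(f"recall={recall(tp, fn):.2f} accuracy={accuracy(tp, fp, tn, fn):.2f}")
# prints "recall=0.75 accuracy=0.75"
```

In this toy example one resignation out of four is missed, so recall is 0.75. A model that predicted "stay" for everyone would still reach 0.50 accuracy here while its recall dropped to 0, which is why accuracy alone is not enough for this use case.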

The next step is hyperparameter tuning. In hyperparameter tuning, models are efficiently improved from likely sub-optimal to near-optimal by algorithmically searching for the best-performing configuration of the model against the unique dataset. Once optimized, metrics of the subset of models are again compared to one another, often with model-specific visualizations to focus on use case-emphasized factors, such as false negatives in this case. Finally, the optimized model is chosen.
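As a rough illustration of hyperparameter tuning, the sketch below runs an exhaustive grid search over a single hyperparameter (k in a hand-written k-nearest-neighbours classifier) and scores each setting by leave-one-out recall. This is an assumption-laden toy: the article does not say which search strategy or algorithms were used, and the rows are invented (satisfaction, scaled tenure) pairs rather than project data.

```python
def knn_predict(train, query, k):
    """Majority vote among the k training rows closest to the query point."""
    nearest = sorted(
        train,
        key=lambda row: sum((a - b) ** 2 for a, b in zip(row[0], query)),
    )[:k]
    votes = sum(label for _, label in nearest)
    return 1 if votes * 2 > k else 0


def recall(y_true, y_pred):
    """Share of actual positives (label 1) that were predicted as positive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn) if (tp + fn) else 0.0


def leave_one_out_recall(data, k):
    """Predict each row from all the other rows, then compute recall."""
    preds = [
        knn_predict(data[:i] + data[i + 1:], features, k)
        for i, (features, _) in enumerate(data)
    ]
    return recall([label for _, label in data], preds)


def grid_search(data, k_values):
    """Score every candidate k and return the best one with all scores."""
    scores = {k: leave_one_out_recall(data, k) for k in k_values}
    best_k = max(scores, key=scores.get)  # on ties, the first k listed wins
    return best_k, scores


# Invented rows: ((satisfaction, tenure scaled to 0-1), resigned?)
toy_data = [
    ((0.20, 0.3), 1), ((0.30, 0.4), 1), ((0.15, 0.5), 1),
    ((0.45, 0.4), 1), ((0.35, 0.3), 1),
    ((0.80, 0.2), 0), ((0.70, 0.6), 0), ((0.90, 0.3), 0),
    ((0.55, 0.2), 0), ((0.75, 0.5), 0),
]

best_k, scores = grid_search(toy_data, k_values=(1, 3, 5))
print("best k:", best_k, "scores:", scores)
```

Real projects typically search several hyperparameters at once and often use smarter strategies than an exhaustive grid, but the loop is the same: propose a configuration, score it on held-back data with the guiding metric, and keep the best.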


A desirable property of a machine learning model is that people can understand how or why it produces its outputs. Our solution can quickly explain itself by surfacing and visualizing relative feature importance, shown as a percentage of the predictive power of the model. In this use case, it was found that satisfaction level contributed the most predictive power, followed by a given employee’s tenure.
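One simple way to arrive at percentages like these is to normalize a model's raw importance scores so they sum to 100. The sketch below assumes the raw scores are already available from the trained model; the feature names and numbers are placeholders chosen for illustration and are not the results of this project.

```python
def importance_percentages(raw_importances):
    """Convert raw importance scores to percentages, largest first."""
    total = sum(raw_importances.values())
    if total <= 0:
        raise ValueError("importance scores must sum to a positive number")
    percentages = {
        name: 100.0 * score / total for name, score in raw_importances.items()
    }
    return dict(sorted(percentages.items(), key=lambda item: item[1], reverse=True))


# Placeholder scores only; real values would come from the trained model.
raw = {"avg_hours_worked": 5, "satisfaction_level": 9, "tenure_years": 6}

for feature, pct in importance_percentages(raw).items():
    print(f"{feature}: {pct:.1f}%")
```

How the raw scores are produced depends on the algorithm (for example, split-based scores for tree models or permutation-based scores for any model), so percentages from different methods are not directly comparable.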

Further, if machine learning explainability is valuable to the business, and the chosen algorithm happens to be a tree-based model, then the tree’s logic itself can be visualized. Once visualized, it can be inspected at each node/decision boundary and assessed by the business, which helps guide the final solution.


  • False negatives are minimized, with accuracy kept in view, by choosing recall as the guiding metric

  • The ML solution is proven effective on new data via both cross-validation and a holdout test dataset

  • Each case receives a relevant output along with the likelihood that the prediction is correct

  • Development time is dramatically reduced compared to traditional ML development approaches, yet outcomes are just as successful
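For readers unfamiliar with the two validation techniques mentioned above, here is a minimal sketch of how rows can be divided for them. The 20% holdout fraction, the 5 folds, and the function names are assumptions for illustration; the article does not state the values used in the project.

```python
import random


def holdout_split(n_rows, test_fraction=0.2, seed=0):
    """Shuffle row indices and set aside a fraction as the holdout test set."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    n_test = int(round(n_rows * test_fraction))
    return indices[n_test:], indices[:n_test]  # (development rows, holdout rows)


def k_fold(indices, k=5):
    """Yield (training, validation) index lists, one pair per fold."""
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        training = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield training, validation


development, holdout = holdout_split(n_rows=100)
for fold_number, (training, validation) in enumerate(k_fold(development), start=1):
    print(f"fold {fold_number}: train on {len(training)}, validate on {len(validation)}")
print(f"final check on {len(holdout)} holdout rows the model has never seen")
```

Cross-validation rotates which slice of the development rows is used for validation, while the holdout rows are touched only once at the end. This sketch does not stratify by label; with few resignations in the data, a stratified split is usually the safer choice.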

Further Enhancements and Extending the Solution

Employee Retention Triaging

If the outputs of this model were enriched with an employee criticality rank*, employees could be:

  • Filtered to those predicted to resign
  • Sorted by likelihood of resignation

HR could then take action to retain those employees in this order.

*Rankings could be derived from data or assigned manually.
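The triage steps above could be sketched as follows. The record fields, the 0.5 cut-off, and the choice to use criticality rank as a tie-breaker (1 = most critical) are all assumptions made for this example; a business might just as reasonably sort by criticality first.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    employee_id: str           # hypothetical identifier
    resign_probability: float  # model's estimated likelihood of resignation
    criticality_rank: int      # hypothetical enrichment, 1 = most critical


def triage(predictions, threshold=0.5):
    """Keep employees predicted to resign, most likely (then most critical) first."""
    flagged = [p for p in predictions if p.resign_probability >= threshold]
    return sorted(flagged, key=lambda p: (-p.resign_probability, p.criticality_rank))


# Made-up records for illustration only.
sample = [
    Prediction("E1", 0.9, 2),
    Prediction("E2", 0.3, 1),
    Prediction("E3", 0.9, 1),
    Prediction("E4", 0.6, 3),
]

for p in triage(sample):
    print(p.employee_id, p.resign_probability, p.criticality_rank)
```

Here E2 is dropped because it falls below the cut-off, and E3 is listed ahead of E1 because the two share the same likelihood but E3 is ranked as more critical.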

Integrations with External Data Sources

Integrating with various external data sources can provide new data features that could be added to enrich the classification power of the machine learning model.

This project shows the possibilities of taking the data that is already present across an organization’s enterprise systems and using it to solve problems they may be facing with their workforce or uncover new business opportunities. If you would like to explore this model in your organization, feel free to book a chat with us to see how we can tailor this solution for you.
