Each time you turn on the TV or browse the web, it’s likely you’ll see an ad from someone touting how their product or service utilizes cutting-edge analytics. Every day you hear buzzwords like “Artificial Intelligence,” “Machine Learning,” and “Algorithm.”
Organizations are quickly adopting predictive and prescriptive solutions to solve problems, but it can seem intimidating to figure out where to start.
In this blog, I’ll help you figure out if seemingly abstract concepts like “Supervised Learning” or “Clustering” can be applied to your business problems by asking a series of straightforward questions.
Can Machine Learning Work for my Organization?
The chart below depicts a high-level question flow that can help guide what algorithm might be right for your problem. For the remainder of this article, I’ll refer to this flow chart to help guide us in finding a machine learning (ML) solution for your problem.
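The same question flow can also be written down as a few lines of code. The sketch below is only an illustration assembled from the questions walked through in the rest of this article. The function name `suggest_approach` and its argument values are hypothetical and do not come from any library.

```python
def suggest_approach(goal, outcome=None, time_dependent=False,
                     n_categories=2, need=None):
    """Walk the question flow and return the name of a family of techniques.

    All argument names and accepted values are illustrative assumptions.
    """
    if goal == "predict":
        # Supervised learning: we have labeled historical outcomes.
        if outcome == "number":
            if time_dependent:
                return "time series analysis"
            return "non-time based regression"
        if outcome == "category":
            if n_categories == 2:
                return "binary classification"
            return "multi-class classification"
        raise ValueError("outcome must be 'number' or 'category'")

    if goal == "patterns":
        # Unsupervised learning: no labels, we look for structure instead.
        options = {
            "groups": "clustering",
            "themes": "topic modeling",
            "recommendations": "recommendation engine",
            "associations": "association rules mining",
        }
        if need not in options:
            raise ValueError("need must be one of: " + ", ".join(options))
        return options[need]

    raise ValueError("goal must be 'predict' or 'patterns'")


# Forecasting next month's store sales: a number that depends on time.
sales_answer = suggest_approach("predict", outcome="number", time_dependent=True)

# Segmenting stores that behave similarly: patterns, smaller groups.
stores_answer = suggest_approach("patterns", need="groups")
```

Real problems rarely fit one branch perfectly, so treat the returned label as a starting point for discussion rather than a final answer.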
Defining Your Problem
The first step to understanding if your business problem can be solved with a machine learning solution is to ask “What am I trying to get from my data?” If your answer is that you want to predict something (e.g., finding how many shipments are going to be late next month or which customers are going to churn), we’ll navigate through the top half of our flow chart and explore techniques known as supervised learning.
If the answer is that you want to uncover patterns and associations within your data without making a prediction (e.g., finding groups of stores that behave similarly, or understanding how likely customers who buy product A are to also buy product B), we’ll dive into techniques referred to as unsupervised learning. While I’ll cover basic definitions, a full explainer on supervised vs. unsupervised can be found in this blog.
Predicting Outcomes: Supervised Learning
If your ultimate goal is to predict a given outcome, your solution will come from the realm of supervised learning. Supervised learning is a methodology in data science that creates a model to predict an outcome based on labeled (historical) data.
Let’s use the example of predicting dollar sales at a store for the next month. To do this, you would use historical sales data for that particular store. Each past month’s sales amount serves as the “label” for the store and month it is associated with.
If you decide that you are trying to predict an outcome and you have historical data with previous outcomes, then the next question to ask yourself is “Is the outcome I am trying to predict a number or a category?” In our sales example, the outcome is a number.
If your problem was the customer churn example from the previous section, your outcome would be a category since each customer would be classified as either “likely to churn” or “not likely to churn” or potentially one of many categories like:
- High churn risk and high customer value
- High churn risk and low customer value
- Low churn risk and high customer value
- Low churn risk and low customer value
If your answer to the question was “number” then the section titled Regression is for you. If your answer was “category” then the Classification section is what you need for your business problem.
Predicting Numbers: Regression
The algorithms used to predict labeled, numeric outcomes are referred to as regressors. Regression algorithms determine a relationship between a dependent variable (the outcome we are trying to predict, often referred to as Y) and independent variables (the data we are using to make our prediction, often referred to as X).
The next question you need to ask yourself is “Does the outcome I’m trying to predict depend on time?”
If what you are trying to predict is affected by the time of the day, the day of the week, or the month of the year then it is said to be seasonal. An example of data that is highly seasonal would be visitors to the NFL section of a sports website. Site traffic would likely depend on the time of day, the day of the week (peak on Sundays), and the time of the year (higher traffic in months with football than in months without).
If your answer to “does the outcome depend on time?” is “no,” the next section for you is Non-Time Based Regression. If you answered “yes,” your business could benefit from Time Series Analysis.
Non-Time Based Regression
You’ve determined you are trying to predict a numeric outcome from labeled data that is not time-dependent. Example problems could be predicting the lifetime value of a new customer using known information about the customer and historical purchase patterns of other customers, estimating the value of real estate using characteristics of the property and historic comparable sales, or predicting the number of insurance claims a customer will make using historical claims data.
The following is a non-exhaustive list of algorithms or approaches that could be used to solve your problem:
- Multiple Linear Regression
- Support Vector Regression
- Random Forest Regression
- XGBoost Regression
- Artificial Neural Networks
Common business problems regression analysis might be applicable to include:
- Multi-touch attribution
- Trade promotion optimization
- Risk management and underwriting
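To make the idea of relating X to Y concrete, here is a minimal sketch of the simplest possible regression: a straight line fit by ordinary least squares with a single input. The property sizes and prices are made-up numbers for illustration, and `fit_line` is a hypothetical helper. A real project would more likely use a library and many input variables.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y, and variance of x (both unnormalized).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical labeled data: property size (sq ft) is X,
# historical sale price (thousands of dollars) is Y.
sizes = [1000, 1500, 2000, 2500]
prices = [200, 290, 410, 500]

slope, intercept = fit_line(sizes, prices)

# Use the fitted relationship to estimate the price of an 1,800 sq ft property.
predicted_price = slope * 1800 + intercept
```

The other algorithms in the list above follow the same pattern of learning from labeled examples and then predicting a number for new inputs. They differ in how flexible the learned relationship is allowed to be.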
Time Series Analysis
Your problem involves predicting a numeric outcome that is time-dependent. Example problems include predicting the number of future unscheduled absences at a production facility to make sure staffing needs are met, forecasting monthly sales of a product for the next six months, or forecasting the future prices of raw materials used to produce a product.
The techniques from the Non-Time Based Regression section can still be used here, but you may also solve your problem with the approaches listed below, which are built for time-dependent data:
- Moving Averages
- Facebook Prophet
- Recurrent Neural Networks
- Google’s Temporal Fusion Transformer
Common business problems time series analysis might be applicable to include:
- Demand planning
- Revenue forecasting
- Predictive maintenance
- Staffing optimization
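As a rough illustration of the simplest approach listed above, the sketch below forecasts the next period as the average of the most recent periods. The sales figures are invented and `moving_average_forecast` is a hypothetical helper name. Note that a plain moving average ignores seasonality, which is one reason the more elaborate methods exist.

```python
def moving_average_forecast(history, window=3):
    """Forecast the next value as the mean of the last `window` observations."""
    if window < 1 or window > len(history):
        raise ValueError("window must be between 1 and len(history)")
    recent = history[-window:]
    return sum(recent) / window


# Hypothetical monthly unit sales for one product, oldest first.
monthly_sales = [120, 130, 125, 140, 150, 145]

# Forecast next month from the three most recent months.
next_month = moving_average_forecast(monthly_sales, window=3)
```

A larger window smooths out noise but reacts more slowly to real changes in the trend.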
Predicting Categories: Classification
You’ve established that you are trying to predict an outcome with labeled data and the outcome you want to predict is a category. You have a classification problem. Like regression algorithms, the algorithms used in classification problems use input independent variables (X) to predict a dependent outcome variable (Y), but in this case, Y represents a category rather than a number.
The next thing to ask yourself is “How many potential categories are there?” Take an example where you are trying to improve the engagement rate on an email campaign. You use customer data and historical email campaigns’ click-through information to identify which customers to target. In this case, the labels for the historical data are whether or not the customer clicked on the previous campaign.
The model will sort customers for the future campaign into two categories: likely to engage and not likely to engage. In that example, you are performing binary classification. If you are instead performing facial expression recognition, you’ll likely have many categories (e.g., angry, happy, sad, fearful, or neutral). Since there are more than two categories, this falls under the umbrella of multi-class classification.
Binary Classification
You’ve determined that you are trying to predict one of two categories using labeled data. Common examples include identifying if a transaction is likely to be fraudulent or not, predicting if a customer will churn, or determining if a link in an email is safe or malicious.
The following machine learning approaches could be used to help solve your problem:
- Logistic Regression
- Support Vector Machine
- Naive Bayes
- XGBoost Classifier
- Random Forest Classifier
- Artificial Neural Networks
Common business problems binary classification might be applicable to include:
- Churn prediction (customer or employee attrition)
- Lead scoring (propensity to buy)
- Worklist prioritization
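For a feel of how a binary classifier works, here is a minimal sketch of logistic regression with one input, trained by plain gradient descent. The churn data is invented: the input is a hypothetical count of support tickets per customer and the label is 1 if the customer churned. The function names are illustrative only, and a production model would normally come from a library.

```python
import math


def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(xs, ys, lr=0.1, epochs=2000):
    """Learn weight w and bias b by gradient descent on the log loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = 0.0
        grad_b = 0.0
        for x, y in zip(xs, ys):
            error = sigmoid(w * x + b) - y  # predicted probability minus label
            grad_w += error * x
            grad_b += error
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b


def predict_proba(x, w, b):
    """Estimated probability that the label is 1 for input x."""
    return sigmoid(w * x + b)


# Hypothetical labeled history: tickets filed (X) and whether the customer churned (Y).
tickets = [0, 1, 2, 3, 4, 5, 6, 7]
churned = [0, 0, 0, 1, 0, 1, 1, 1]

w, b = train_logistic(tickets, churned)

low_risk = predict_proba(1, w, b)   # customer with 1 ticket
high_risk = predict_proba(6, w, b)  # customer with 6 tickets
```

The model outputs a probability, and the business decides the cutoff (0.5 is common) that turns it into “likely to churn” or “not likely to churn.”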
Multi-Class Classification
You’ve determined that you are trying to predict one of many potential categories using labeled data. The approach to solving your problem is through multi-class classification. Examples might be predicting which of several products a customer is most likely to purchase next, classifying a customer review as positive, negative, or neutral, or identifying whether a photo of a product on a store shelf is product A, product B, product C, etc.
The algorithms used in multi-class classification are largely the same as those used in binary classification, but you may hear the following terminology that you would not hear with binary classification:
- Multiclass Cross-Entropy
Common business problems multi-class classification might be applicable to include:
- Image recognition
- Sentiment analysis
- Business process automation (automated decisioning)
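Since multiclass cross-entropy comes up so often, a small worked example may help. The sketch below turns a model’s raw scores for three sentiment classes into probabilities with a softmax and then computes the cross-entropy loss for a given true class. The scores are invented for illustration and the function names are my own.

```python
import math


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    highest = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs, true_index):
    """Loss for one example: negative log of the probability given to the true class."""
    return -math.log(probs[true_index])


classes = ["positive", "negative", "neutral"]

# Hypothetical raw model scores for one customer review.
scores = [2.0, 0.5, 0.1]
probs = softmax(scores)

predicted_class = classes[probs.index(max(probs))]

# The loss is small when the true class received a high probability...
loss_if_positive = cross_entropy(probs, classes.index("positive"))
# ...and large when the true class received a low probability.
loss_if_neutral = cross_entropy(probs, classes.index("neutral"))
```

Training a multi-class model amounts to adjusting it so that this loss, averaged over all labeled examples, gets as small as possible.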
Finding Patterns: Unsupervised Learning
If your answer to the question “What am I trying to get from my data?” was “I want to understand more about the patterns or relationships in my data,” then approaches known as unsupervised learning may be right for your problem. Unsupervised learning is a technique that determines patterns and associations in unlabeled data.
In other words, it uncovers relationships and categories that we may not know or understand yet. For example, you may have data about different store locations like store size, area demographics, and past sales performance and want to understand if there are similar segments of stores.
In this example, the data is considered unlabeled because it is unknown whether a specific store falls into segment 1, segment 2, etc. The segment is what we are trying to learn. Compare this to a classic supervised learning example of predicting customer churn. In that case, we look at historical customer data and know which group a customer fell into (churned or didn’t churn), so that data is said to be labeled.
Once you have determined that your data is unlabeled, the next step is to ask yourself a series of questions: “Would my problem benefit from finding smaller groups with similar characteristics? Am I looking to find common themes within written documents? Do I want to recommend items for specific users? Do I want to find associations between items?”
If you answered “smaller groups”, clustering may be right for your problem. If you are looking for themes in text, topic modeling is for you. If making recommendations to users would benefit your business, a recommendation engine may be for you. If you answered “associations between items”, read the section on association rules mining.
Clustering
Clustering involves sorting data into groups whose members have similarities and are dissimilar from members of other groups. Our example of segmenting stores from the previous section would fall under the umbrella of clustering.
Another common example of clustering is to identify groups of customers with similar traits, interests, and likelihood to purchase. Marketing can then be personalized or tailored for each cluster of customers in an effort to maximize return on marketing spend.
Common approaches and algorithms used in clustering problems are:
- Centroid-based Clustering (K-means, K-medians, K-medoids)
- Density-based Clustering (DBSCAN)
- Hierarchical or Agglomerative Clustering
- Gaussian Mixture Models
Common business problems clustering might be applicable to include:
- Customer segmentation
- Outlet segmentation
- Customer cohort identification
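To show what a centroid-based method does, here is a bare-bones sketch of K-means on the store segmentation example. The store figures are invented, the starting centroids are picked by hand to keep the run deterministic, and `kmeans` is a hypothetical helper rather than a library function.

```python
def kmeans(points, centroids, iterations=10):
    """Alternate between assigning points to the nearest centroid
    and moving each centroid to the mean of its assigned points."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            # Squared Euclidean distance from this point to every centroid.
            distances = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            clusters[distances.index(min(distances))].append(p)
        # Recompute each centroid; keep the old one if its cluster is empty.
        centroids = [
            tuple(sum(dim) / len(cluster) for dim in zip(*cluster)) if cluster else c
            for cluster, c in zip(clusters, centroids)
        ]
    return centroids, clusters


# Hypothetical stores: (size in thousands of sq ft, monthly sales in thousands of dollars).
stores = [(10, 50), (12, 55), (11, 52), (40, 200), (42, 210), (38, 190)]

# Start from two of the stores as initial guesses for the segment centers.
centroids, clusters = kmeans(stores, centroids=[stores[0], stores[3]])
```

In practice you would usually rescale the inputs first so that a measure with large values, such as sales, does not dominate the distance calculation.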
Topic Modeling
If the above section on clustering did not sound like the solution for your problem, the closely related technique of topic modeling might be useful, especially if you are dealing with large amounts of text data. Some business use cases for topic modeling might be fitting ads to websites by understanding the themes of the sites or understanding the different themes of Yelp reviews about your company.
While the previously discussed clustering methods are designed to partition data into similar groups, topic modeling is meant to uncover latent themes or topics. For example, you could use clustering to split 1,000 articles into smaller groups of similar articles. Using topic modeling we would uncover co-occurring words or phrases within the articles to identify themes or topics and represent each article as a collection of themes.
The words or phrases “Super Bowl”, “Olympics”, “ball”, “game”, and “LeBron” might all be grouped together under a topic we’d call “Sports”, while “Oscars”, “run-time” and “director” might be grouped together under a topic we call “Film”, and “Congress”, “bill”, and “filibuster” might be grouped together in a topic we call “Politics.”
Each of the 1,000 articles can then be represented by how much of the article is composed of each theme or topic we’ve just uncovered, so article #1 might be 80 percent sports and 20 percent politics, while article #2 might be represented as 75 percent politics and 25 percent film.
Common terminology you might hear with topic modeling would include:
- Latent Dirichlet Allocation
- Document-Term Matrix
- Term Frequency-Inverse Document Frequency
Common business problems topic modeling might be applicable to include:
- Identifying themes in customer or employee feedback
- Tagging records for assignment to the correct teams
- Categorization of documents
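Two of the terms above are easy to demonstrate in a few lines. The sketch below builds a document-term matrix and computes term frequency-inverse document frequency (TF-IDF) for three invented one-sentence documents. This is only the input a topic model would typically start from, not a topic model such as Latent Dirichlet Allocation itself, and the formula used is one common variant among several.

```python
import math
from collections import Counter

# Three tiny, made-up documents.
docs = [
    "the game went to overtime and the ball bounced our way",
    "the director cut the film run-time before the oscars",
    "congress passed the bill after the filibuster ended",
]

tokenized = [doc.split() for doc in docs]
vocabulary = sorted({word for tokens in tokenized for word in tokens})

# Document-term matrix: one row per document, one column per vocabulary word,
# each cell holding how many times the word appears in the document.
doc_term = []
for tokens in tokenized:
    counts = Counter(tokens)
    doc_term.append([counts[word] for word in vocabulary])


def tf_idf(doc_index, word):
    """Term frequency times log inverse document frequency.

    `word` is assumed to appear in at least one document.
    """
    tokens = tokenized[doc_index]
    tf = tokens.count(word) / len(tokens)
    doc_freq = sum(1 for t in tokenized if word in t)
    idf = math.log(len(tokenized) / doc_freq)
    return tf * idf


# "the" appears in every document, so it carries no distinguishing weight.
weight_the = tf_idf(0, "the")
# "game" appears in only one document, so it is weighted as informative.
weight_game = tf_idf(0, "game")
```

Down-weighting words that appear everywhere is what lets theme-bearing words like “game” or “filibuster” stand out.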
Recommendation Engines
An additional application of unsupervised learning is a recommendation engine. If your problem could benefit from recommending a product, service, film, TV show, or song to a customer, a recommendation engine might be for you.
Recommendation engines usually fall under one of two categories: content-based or collaborative filtering. Sometimes an engine is a hybrid of both. Content-based engines operate by focusing on the similarity of items or content. For example, if our user Bob likes both Happy Gilmore and Big Daddy, we could infer that Bob will like The Waterboy due to its similarity.
Collaborative filtering is based on past user ratings. If another user, Bill, liked the same 1990s slapstick comedies as Bob but also liked Nomadland, a collaborative filtering-based recommendation may recommend Nomadland for Bob because Bob and Bill have similar rating histories.
On the other hand, a content-based system would be unlikely to recommend that film since a 2020 drama based on a nonfiction book differs greatly from the movies Adam Sandler was making in the 90s.
Additional terminology you may hear used with recommendation engines includes but is not limited to:
- User matrix
- Item matrix
- Cosine similarity
- Pearson correlation
- Matrix factorization
Common business problems recommendation engines might be applicable to include:
- Resume matching
- Product recommendations
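The Bob and Bill example can be sketched as a very small user-based collaborative filter that uses cosine similarity. The ratings below are invented, a third user named Carol is added purely for contrast, and a rating of 0 is treated as “not rated yet,” which is a simplification. The function names are illustrative.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two rating vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


films = ["Happy Gilmore", "Big Daddy", "The Waterboy", "Nomadland"]

# Hypothetical ratings on a 1-5 scale; 0 means the user has not rated the film.
ratings = {
    "Bob":   [5, 4, 0, 0],
    "Bill":  [5, 5, 0, 5],
    "Carol": [1, 2, 5, 0],
}


def recommend(user):
    """Recommend the unrated film that the most similar other user rated highest."""
    others = [u for u in ratings if u != user]
    nearest = max(others, key=lambda u: cosine_similarity(ratings[user], ratings[u]))
    candidates = [
        (ratings[nearest][i], films[i])
        for i in range(len(films))
        if ratings[user][i] == 0 and ratings[nearest][i] > 0
    ]
    return max(candidates)[1] if candidates else None


suggestion_for_bob = recommend("Bob")
```

Because the recommendation comes from Bill’s rating history rather than from the films’ attributes, the engine can surface a title that looks nothing like what Bob has watched before.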
Association Rules Mining
Association rules mining is used to find correlations (or co-occurrences) in data. The most common examples are in retail, where the approach can be used to determine the association between different item combinations. For example, how much does the presence of diapers in a shopping cart increase the likelihood of the cart also containing beer? This approach is applicable to any problem that can be formulated as an “if-then” statement.
For example, if you are looking to optimize website experience, you could explore the association between different customer paths and the page on which the customer ended their session. In this example, one of the many “if-then” statements that could be investigated is “IF customer clicked the banner offer and visited the product details page, THEN customer checked out successfully.” Association rules mining could determine how positively or negatively clicking the banner and viewing the product details page is associated with successfully checking out relative to other customer click patterns.
Some association rules mining algorithms are:
- FP-Growth
Common business problems association rules might be applicable to include:
- Market basket analysis
- Web page personalization
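The strength of an “if-then” rule is usually summarized with three numbers: support, confidence, and lift. The sketch below computes them directly for the diapers and beer example on a handful of invented shopping baskets. It is a brute-force illustration of the metrics, not an implementation of a mining algorithm, and the function names are my own.

```python
# Hypothetical shopping baskets, one set of items per transaction.
baskets = [
    {"diapers", "beer", "chips"},
    {"diapers", "beer"},
    {"diapers", "milk"},
    {"beer", "chips"},
    {"milk", "bread"},
]


def support(items):
    """Share of baskets that contain every item in `items`."""
    return sum(1 for basket in baskets if items <= basket) / len(baskets)


def confidence(antecedent, consequent):
    """Of the baskets containing the IF part, the share that also contain the THEN part."""
    return support(antecedent | consequent) / support(antecedent)


def lift(antecedent, consequent):
    """How much more likely the THEN part is when the IF part is present.

    Above 1 suggests a positive association, below 1 a negative one.
    """
    return confidence(antecedent, consequent) / support(consequent)


# Rule under test: IF diapers THEN beer.
rule_support = support({"diapers", "beer"})
rule_confidence = confidence({"diapers"}, {"beer"})
rule_lift = lift({"diapers"}, {"beer"})
```

Mining algorithms exist because checking every possible item combination this way becomes impractical once there are thousands of products.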
The sheer amount of terminology and techniques associated with machine learning can make it difficult to know whether or not there is an ML approach for your business problem. Asking a series of simple questions can help you determine whether or not your problem can be formulated in a way where machine learning can solve it.
It’s also possible that your problem could use a combination of the approaches discussed. For complex problems, it is common for multiple machine learning algorithms to build on one another. If you don’t think any of the approaches described in this article are applicable, do not be discouraged! While I covered the most common types of machine learning problems, the list is not exhaustive and the field is advancing every day.