June 7, 2022

How Do I Know if I Have a Problem That ML Can Solve?

By Kenton Steiner

In the last few years, whether you know it or not, your life has been greatly affected by machine learning (ML). From playing basketball to recommending products on Amazon, machine learning and its many capabilities are being leveraged by more companies than ever before.  

The power of ML to take in vast amounts of data and make educated decisions is opening up possibilities that were out of reach until now. But how do you know if machine learning is a suitable solution to your problem? The technology is incredibly powerful, but determining how to get started can be daunting.

In this post, we’ll explain what machine learning is, provide a series of questions to help you determine if ML is a good solution, walk through a few common use cases, and offer considerations for when to use it.

What is Machine Learning, and Why Does it Matter?

In order to determine if machine learning is the right approach to solving our problem, we must first understand what machine learning is. Machine learning is a subset of artificial intelligence composed of algorithms and mathematical techniques that help computers learn and make decisions the way humans do. It takes data as an input, makes connections, and discovers patterns within the data to make predictions or classifications of future events. 

There are three main types of machine learning: supervised, unsupervised, and reinforcement learning. I will briefly touch on the important points of supervised and unsupervised learning; for a full breakdown of all three types, please visit this blog.

Each of these types of learning has specific advantages that can assist you in reaching the desired solution for your problem. In supervised learning, a model is trained to predict a specific outcome from a dataset in which both the inputs and the outputs are accurately labeled.

In unsupervised learning, the data does not contain hard labels, and the model forges connections and associations between data points in the dataset, typically without human intervention.

It is important to understand these differences when approaching how to solve your problem so that you can utilize the correct type of ML solution to get to the desired outcome.

How to Determine if a Machine Learning Model will Fit Your Problem

To determine which type of ML could be used to solve your problem, we will use the diagram above. For a more in-depth explanation of the diagram, visit this blog.

To start, you must first determine what results you are trying to achieve by using your data. The types of ML models used for predictions and classifications greatly differ from those used for understanding the relationships between the data points, so knowing what your end objective is when you begin is key.  

To highlight some of the differences, we’ll walk through a few examples of this diagram for supervised and unsupervised learning and give some use cases for each type of algorithm. Each type of learning is covered in more depth here.
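The four paths followed in the examples in this post can be sketched as a small lookup function. This is only an illustration: the function name and the string labels are hypothetical, and the diagram itself has more branches than the ones encoded here. Any path this post does not walk through returns None.

```python
def suggest_algorithm(goal, output=None, time_matters=False):
    """Return the algorithm family this post reaches for a given path.

    The labels ("predict", "number", ...) are hypothetical names chosen
    for this sketch; paths not covered in the post return None.
    """
    if goal == "predict":
        # Top path: we want to predict an outcome.
        if output == "number" and not time_matters:
            return "regression"
        if output == "yes/no":
            return "binary classification"
    elif goal == "find patterns":
        # Bottom path: we want relationships within the data.
        if output == "themes in text":
            return "topic modeling"
        if output == "product suggestions":
            return "recommendations"
    return None


print(suggest_algorithm("predict", "number"))
print(suggest_algorithm("find patterns", "themes in text"))
```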

Supervised Learning: Regression (Real Estate Prices)

For our first example, we will look at predicting the prices of homes in a certain area. To walk through the diagram from above, we are trying to predict a number for our outcome, so we choose the top path for the first two questions.

Time is not an important factor in our data, so we choose no, which leads us to recommend using a regression algorithm. Regression algorithms are useful when predicting numerical outcomes given a set of inputs, in this case, the price of a home.  

Our dataset could contain many different data points, such as the square footage of the house, the number of bedrooms, bathrooms, year built, zip code, distance from an airport, the average cost of living, etc. This dataset is full of well-labeled variables and data, which is the main requirement for a supervised learning technique.  

After training our algorithm on the given dataset, we will be able to give the algorithm a new set of values, and it will respond with a number that represents the value of that home.
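As a minimal sketch of that train-then-predict flow, the example below fits a straight line with ordinary least squares using a single input, square footage. The four data points are invented for illustration, and a real model would use many more rows and inputs (bedrooms, zip code, and so on), usually through an ML library rather than hand-written math.

```python
# Toy data, invented for illustration: (square footage, sale price in dollars).
homes = [(1000, 200_000), (1500, 275_000), (2000, 350_000), (2500, 425_000)]


def fit_line(points):
    """Ordinary least squares for one input: returns (slope, intercept)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# "Training" here is just estimating the two parameters of the line.
slope, intercept = fit_line(homes)


def predict_price(square_feet):
    """Predict a price for a home the model has not seen."""
    return slope * square_feet + intercept


print(round(predict_price(1800)))  # 320000
```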

Supervised Learning: Classification (Marketing Ad Success)

Our second example will be looking at a marketing company’s ad success potential. In our above diagram, we want to predict an outcome, so we move to the top of the chart. 

This time, we are looking for a categorical result, so we move down following the “Categories” path. For this specific example, we want to predict whether a person will click on the advertisement: yes or no. Hence, the diagram recommends a binary classification algorithm.

Classification algorithms specialize in taking a set of variables and predicting which category a new set of inputs will fall into.  

For this example, our dataset could consist of rows related to people, containing demographic information like age, occupation, gender, location, and information about their interests based on previous clicks.  

The classification algorithm is trained on these inputs. Once trained, it can take a new person's information and return a prediction about whether that person will click on the ad.
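One way to sketch a binary classifier is logistic regression trained with gradient descent, shown below using only the standard library. Logistic regression is just one possible choice, not necessarily what a production system would use. The two inputs (age in decades and number of past clicks on similar ads) and all six rows are invented for illustration.

```python
import math

# Toy data, invented for illustration:
# (age in decades, past clicks on similar ads) and whether the person clicked.
people = [(2.5, 8), (3.0, 6), (3.5, 7), (4.5, 1), (5.0, 0), (5.5, 2)]
clicked = [1, 1, 1, 0, 0, 0]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train(rows, labels, learning_rate=0.05, epochs=2000):
    """Fit logistic regression weights with batch gradient descent."""
    weights = [0.0] * len(rows[0])
    bias = 0.0
    n = len(rows)
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for row, label in zip(rows, labels):
            score = sum(w * x for w, x in zip(weights, row)) + bias
            error = sigmoid(score) - label
            for i, x in enumerate(row):
                grad_w[i] += error * x
            grad_b += error
        # Step against the average gradient of the log loss.
        weights = [w - learning_rate * g / n for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n
    return weights, bias


weights, bias = train(people, clicked)


def will_click(person):
    """Return True if the predicted click probability is at least 0.5."""
    score = sum(w * x for w, x in zip(weights, person)) + bias
    return sigmoid(score) >= 0.5


print(will_click((2.8, 7)))
print(will_click((5.0, 1)))
```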

Unsupervised Learning: Topic Modeling (Customer Reviews)

For our first example of unsupervised learning, we will look at finding themes throughout customer reviews on a business’s website. The data for this scenario is customer reviews, which means large quantities of text. This type of data is considered ‘unlabeled’ because no categories are associated with the text.

In our diagram, we want to uncover patterns and relationships in this data, so we move to the bottom path. For this example, we are trying to find themes throughout the customer reviews to learn what customers are saying about the business.

This choice leads us to a topic modeling algorithm, which specializes in detecting patterns in text and grouping the results into topics. For our business, one topic could be “Customer Service” if many customers left a review about the level of customer service they received. We can then view all reviews containing that keyword to better understand how customer service is being perceived and whether changes need to be made.
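The sketch below is a much simpler stand-in for a real topic model (such as LDA): it counts how many reviews mention each word, treats the most widely mentioned words as candidate themes, and then pulls every review containing a chosen keyword. The reviews, the stopword list, and the function names are all invented for illustration.

```python
import re
from collections import Counter

# Toy reviews, invented for illustration.
reviews = [
    "The customer service was friendly and fast",
    "Terrible customer service, I waited an hour",
    "Great prices and fast shipping",
    "Shipping was slow but customer service fixed it",
]

# A tiny, hand-picked stopword list; real projects use a much larger one.
STOPWORDS = {"the", "was", "and", "i", "an", "but", "it", "a"}


def keywords(text):
    """Lowercase the text and return its distinct non-stopword terms."""
    return set(re.findall(r"[a-z]+", text.lower())) - STOPWORDS


def top_terms(texts, n=2):
    """Return the n terms that appear in the most reviews."""
    counts = Counter()
    for text in texts:
        counts.update(keywords(text))
    return [term for term, _ in counts.most_common(n)]


def reviews_mentioning(term, texts):
    """Return every review that contains the given keyword."""
    return [text for text in texts if term in keywords(text)]


print(top_terms(reviews))
print(reviews_mentioning("service", reviews))
```

A true topic model goes further by grouping words that tend to occur together, so that a review can belong to a theme even when it never uses the exact keyword.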

Unsupervised Learning: Recommendations (Product Recommendations)

For our last example, we will be looking at how companies can make customized product recommendations for customers. The dataset will consist of data collected from browsing history, past purchases, and reviews and ratings left on other products. 

From our diagram, we are looking for relationships between the data points, so we move to the bottom path. Because we want suggestions for products a customer would like to see, a recommendations algorithm is the best choice.

This algorithm compiles the information gathered about the user and, while they are on the website, recommends products that match their perceived interests. The goal is to convert these recommendations into sales while making the site easier for the customer to use.
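One common way to build such recommendations is user-based collaborative filtering, sketched below: find customers whose ratings resemble the target customer's, then suggest products those customers rated that the target has not seen. This approach is assumed here for illustration and is not the only option. The customers, products, and ratings are invented.

```python
import math

# Toy ratings, invented for illustration: customer -> {product: rating 1-5}.
ratings = {
    "alice": {"tent": 5, "sleeping bag": 4, "stove": 5},
    "bob": {"tent": 5, "sleeping bag": 5},
    "carol": {"novel": 5, "bookmark": 4},
}


def cosine_similarity(a, b):
    """Cosine similarity between two customers' rating dictionaries."""
    shared = set(a) & set(b)
    dot = sum(a[item] * b[item] for item in shared)
    if dot == 0:
        return 0.0
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)


def recommend(user, all_ratings):
    """Rank products the user has not rated by similarity-weighted ratings."""
    scores = {}
    for other, their_ratings in all_ratings.items():
        if other == user:
            continue
        similarity = cosine_similarity(all_ratings[user], their_ratings)
        if similarity <= 0:
            continue  # ignore customers with nothing in common
        for product, rating in their_ratings.items():
            if product not in all_ratings[user]:
                scores[product] = scores.get(product, 0.0) + similarity * rating
    return sorted(scores, key=scores.get, reverse=True)


print(recommend("bob", ratings))  # ['stove']
```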

Next Steps

Once you have selected the algorithm to apply to your data, selecting the right tools to host and implement your solution is critical. The Snowflake Data Cloud provides a scalable solution for easily storing and accessing any format of data, as well as autoscaling of computing power for data preparation and analysis with almost no manual intervention required. Snowflake also has a number of partners and native connections built-in to support your full ML project, such as Dataiku. 

Dataiku is designed to provide advanced analytics and AI at scale, as well as to operationalize ML models. Using Snowflake and Dataiku together allows you to scale data storage, access, preparation, feature engineering, and the training and deployment of ML models. You can combine the full power of the Snowflake platform with all the capabilities of Dataiku, running on Snowflake, to create a machine learning solution that can easily flex and scale to meet your business’s needs.

Summary

Machine learning is an extremely powerful tool that can solve many problems and provide high value to businesses. Due to the number of tools and techniques that can be leveraged, it can be difficult to determine which type will best suit your needs, but asking a few simple qualifying questions can help to narrow your focus to a manageable number of options.  

While the scenarios listed in this article each focus on one specific type of algorithm, it may be that your problem will require a more complex solution of multiple algorithms working together. In that case, it can still be helpful to walk through the diagram to see what outcomes you are looking for and what types of algorithms will help you get there. 

This article covers some of the most common types of ML solutions, but the field is constantly growing and changing, so new solutions are being developed all the time. Even if your solution isn’t found here, we hope these questions will help you narrow your search for your ideal ML solution. For more use cases, please check out our other blog posts.

Are you looking to unlock more value from your machine learning initiatives? Contact us today for advice, questions, and strategy
