October 17, 2023

How to Create Custom Categories in Tableau: RFM Analysis

By Venkatesh lyer

All human languages and cultures divide the world into categories. This natural instinct to categorize is instrumental in acquiring and storing knowledge and, thus, to make sense of the world! Almost every product/service out there has a tier system built into it to classify its users/customers. These categories are of paramount importance when it comes to analyzing all other metrics. 

As a Tableau developer, you must have come across a scenario wherein you are required to split your data into custom categories multiple times. To have clear-cut insights, it is often necessary that these categories are Mutually Exclusive and Collectively Exhaustive or MECE. In this blog, I will show some ways to easily split any data into MECE categories using a marketing technique called RFM (recency, frequency, monetary) analysis as an example.

Dataset

You can follow along with the Orders table of the latest (2020-2023) Superstore dataset.

Step 1: Assign the Recency Scores

To show recency, you can use the following scores.

  • 5: Customers who ordered in the most recent month.

  • 4: Customers who ordered in the most recent 3 months.

  • 3: Customers who ordered in the most recent 6 months.

  • 2: Customers who ordered in the most recent 12 months.

  • 1: Customers who ordered before the most recent 12 months.

It is important first to understand why assigning one score to one customer is challenging. For each customer, there are multiple rows corresponding to the orders that they must have placed. Each of these orders will have a specific order date. 

Customers who ordered in the most recent month have also technically ordered in the most recent 3 months, the most recent 6 months, and the most recent 12 months. This is a problem as the customer categories should be mutually exclusive and collectively exhaustive.

To do this, create the following calculation and convert it to a dimension.

Customer Recency
				
					{ FIXED [Customer Name]: MAX(
    IF 
        DATEDIFF('month',[Order Date],{MAX([Order Date])}) = 0 THEN 5
    ELSEIF
        DATEDIFF('month',[Order Date],{MAX([Order Date])}) < 3 THEN 4
    ELSEIF 
        DATEDIFF('month',[Order Date],{MAX([Order Date])}) < 6 THEN 3
    ELSEIF 
        DATEDIFF('month',[Order Date],{MAX([Order Date])}) < 12 THEN 2
    ELSE 
        1 
    END
)}

				
			

{MAX([Order Date])} returns the maximum order date in the data source. Thus, the IF statement will add a number between 1 and 5 based on the DATEDIFF condition against each of the rows.

Wrapping this IF statement in the {FIXED [Customer Name]: MAX()} will return the required result without any overlapping.

Take a moment to think about this. The most recent month is a subset of the most recent 3 months, which in turn is a subset of the most recent 6 months. But since the aggregation type for the Fixed Level of Detail expression (LOD) is MAX, once any customer has ordered in the most recent month and that row corresponding to the customer gets a 5, that customer will get assigned the value 5, even if some rows for that customer get assigned a different number.

This is a very versatile and powerful technique to reduce the number of conditions that need to be evaluated when splitting any data into categories. It is an advanced form of the technique explained in this blog.

To validate this calculation, drag [Customer Recency] to the Filters shelf and select 5. Drag [Customer Name] to the row’s shelf. Right-click drag the [Order Date] as MAX([Order Date]) next to it. Convert this to Discrete. You will get something like this.

An image showing the sheet for verifying whether the Customer Recency calculation is returning the right results.

Sort and scroll through this list. The date values will all fall within the most recent month: Dec-2023. Similarly, explore and validate the other cohorts.

Step 2: Assign the Frequency Scores

For showing frequency, you can use the following scores.

  • 5: Customers who placed 10+ orders.

  • 4: Customers who placed 8 – 10 orders.

  • 3: Customers who placed 6 – 7 orders.

  • 2: Customers who placed 4 – 5 orders.

  • 1: Customers who placed 1 – 3 orders.

To do this, first, create this calculation that returns the orders per customer.

Customer Orders
				
					{ FIXED [Customer Name]: COUNTD([Order ID])}
				
			

Using that, create the following calculation and convert it to a dimension.

Customer Frequency
				
					IF 
    [Customer Orders] > 10 THEN 5
ELSEIF 
    [Customer Orders] > 7 THEN 4
ELSEIF 
    [Customer Orders] > 5 THEN 3
ELSEIF 
    [Customer Orders] > 3 THEN 2
ELSE 
    1
END

				
			

It is interesting to understand why it is not necessary to wrap the IF statement inside another Fixed LOD expression with MAX to avoid double counting the customers. Unlike the previous example, here, [Customer Orders] is already Fixed at the [Customer Name] level. This means the score will also get assigned at the [Customer Name] level.

Also, the order of the IF statements is important to avoid making complicated boolean conditions that drain performance. Generally speaking, you should order the IF statements going from the smallest subset to the largest when dealing with overlapping definitions.

To validate this calculation, drag [Customer Frequency] to the Filters shelf and select 5. Drag [Customer Name] to the row’s shelf. Drag the [Customer Orders] to the text section of the Marks card. You will get something like this.

An image showing the sheet for verifying whether the Customer Frequency calculation is returning the right results.

Sort and scroll through this list. There will only be 10 or more orders. Similarly, explore and validate the other cohorts.

Step 3: Assign the Monetary Value Scores

To show monetary value, you can use quintiles. Again, graded on a scale of 1 to 5.

  • 5: Customers in the fifth interval of the sales per customer list.

  • 4: Customers in the fourth interval of the sales per customer list.

  • 3: Customers in the third interval of the sales per customer list.

  • 2: Customers in the second interval of the sales per customer list.

  • 1: Customers in the first interval of the sales per customer list.

Essentially, the end result should be a dimension that slices the customer base of 800 into 5 equal parts based on sales.

Your first instinct would be to consider table calculations, ranks, or box plots. As you can probably guess, those methods will not be of much use here as none of those options will return a dimension showing quintiles that can be reused.

There is a way by which this can be done. An elegant yet underrated way.

Start by creating the following calculation.

Customer Sales
				
					{ FIXED [Customer Name]: SUM([Sales]) }
				
			

Here’s an interesting tip. If you want to create Fixed LOD expressions that aggregate a measure using SUM at a dimension, there is actually a simple shortcut. Click on the measure you want and Ctrl+drag it in on top of the dimension to which it should be fixed. This will create the LOD expression shown above.

Using that, create the following calculation and convert it to a dimension.

Customer Monetary
				
					IF 
    [Customer Sales] > {PERCENTILE([Customer Sales],0.8)} THEN 5
ELSEIF 
    [Customer Sales] > {PERCENTILE([Customer Sales],0.6)} THEN 4
ELSEIF 
    [Customer Sales] > {PERCENTILE([Customer Sales],0.4)} THEN 3
ELSEIF 
    [Customer Sales] > {PERCENTILE([Customer Sales],0.2)} THEN 2
ELSE 
    1
END

				
			

{PERCENTILE([Customer Sales],0.8)} returns the 80th percentile of [Customer Sales]. Thus, the IF statement will score a 5 against the customers who are in the top 20% by sales (fifth interval) of all the customers. The first ELSEIF will score a 4 against the top 40% by sales, but since the top 20% are already considered, it will return the customers with sales between the 60th and 80th percentile (fourth interval), and so on.

Note that each IF statement condition is essentially a Top N% filter! You can use this in other scenarios to boost your analysis.

To validate this calculation, drag [Customer Name] and [Customer Monetary] to the Rows shelf. Drag the [Customer Sales] to the text section of the Marks card. You will get something like this.

An image showing the sheet for verifying whether the Customer Monetary calculation is returning the right results.

Sort and scroll through the list. You will see that the [Customer Monetary] will be a series of 5, followed by 4, then 3, then 2, and finally 1.

Next, change the [Customer Name] pill to a Measure with Count (Distinct) and convert this to Discrete.

An image showing the number of customers in 5 intervals of the Customer Monetary calculation.

This is the expected result! The customer base is split into 5 intervals of 160 customers each.

Step 4: Create Customer Segmentation

With the individual scores now assigned, the 800 unique customers can be segmented into meaningful categories that are mutually exclusive and collectively exhaustive for doing the RFM analysis.

To start with, create a calculation that gives a combined score.

Customer RFM Score
				
					[Customer Recency] + [Customer Frequency] + [Customer Monetary]
				
			

Using that, create the following calculation.

Customer Segment
				
					IF 
    [Customer RFM Score] > 12 THEN "Champions"
ELSEIF
    [Customer RFM Score] > 9 AND [Customer Monetary] > 3 THEN "Whales"
ELSEIF
    [Customer RFM Score] > 9 AND [Customer Frequency] > 3 THEN "Loyal"
ELSEIF
    [Customer RFM Score] > 9 AND [Customer Recency] > 3 THEN "Promising"
ELSEIF
    [Customer RFM Score] > 7 AND [Customer Recency] > 2 THEN "Need Attention"
ELSEIF
    [Customer Recency] > 2 THEN "New"
ELSEIF
    [Customer RFM Score] > 5 AND [Customer Recency] > 1 THEN "Cannot Lose"
ELSEIF
    [Customer RFM Score] > 3 THEN "At Risk"
ELSE
    "Lost"
END

				
			

Think about these definitions. The order is again very important here. 

On a new sheet, drag this to the Rows shelf, and right-click drag [Customer Name] as CNTD(Customer Name) to the Text section of the Marks card.

An image showing the number of customers in each of the RFM customer segments.

This is just an example of segmenting the customers. Based on your requirements, you can modify the definitions and combinations. Check out this amazing dashboard for another take on RFM analysis.

Conclusion

Hopefully, you enjoyed learning this way of splitting data into categories. The key takeaway is that you can use Fixed LOD expressions to do the heavy lifting in most cases. You can assign values at a row level and assign the maximum of these at any aggregation level. Or you can assign values at a given aggregation level using the rows that roll up to it and then use it to create custom groupings. Or you can split your continuous data into any number of intervals you want using percentiles. Use any of these methods to split your data into the MECE categories and supercharge your dashboards.

If you want more information on Custom Categories in Tableau, contact our team of experts!

Data Coach is our premium analytics training program with one-on-one coaching from renowned experts.

Accelerate and automate your data projects with the phData Toolkit