October 26, 2023

How to Create Custom Categories in Tableau: Engagement Analysis

By Venkatesh lyer

As explained in this topic’s sister blog, effectively splitting data into categories is very important for effective analysis. These custom categories are required to be MECE (Mutually Exclusive and Collectively Exhaustive) in most business scenarios. There is no straightforward way of doing this, especially when there are overlapping conditions.

In this blog, I will show another technique, or rather a process, for creating MECE custom categories in Tableau. I will use the example of a form of dynamic customer engagement analysis I did recently for one of our customers.

Dataset

You can follow along with the Orders table of the latest (2020-2023) Superstore dataset.

Step 1: Understand the Basics

In this engagement analysis, you will learn to split the customer base into these 6 categories.

  • New: Customers who placed their first order in the current month.

  • Active: Customers who ordered in the current and previous month.

  • Dormant: Customers who ordered in the previous but not in the current month.

  • Returned: Customers who ordered in the current but not in the previous month.

  • Inactive: Customers who did not order in both the current and previous months.

As you can see, the conditions in these categories have considerable overlap. A customer ordering in the current month can be in the New, Returned, or Active category.

First, create the following boolean calculation.

Current Month
				
					DATEDIFF('month',[Order Date],{MAX([Order Date])}) < 1
				
			

{MAX([Order Date])} returns the maximum order date in the data source. The DATEDIFF will return the difference in months between the row-level [Order Date] and the maximum order date in the data source. Thus, the difference would be 0 for the current month, and since 0 < 1, you get the correct result.

You can use = 0 instead of < 1 to get the same result. The only reason to use < 1 is because it will be better to understand the tweaks that will be made in Step 2.

To validate, right-click drag the [Order Date] field to the Rows shelf, and double-click on the green Month(Order Date) from the Drop Field options.

A screenshot showing the Drop Field option needed for validating.

Right-click and convert this pill to Discrete. Then drag the [Current Month] field next to it. Scroll down to the bottom of this list, and you will get the following.

A screenshot showing the sheet for validating the current date period calculation.

The calculation works as expected and returns True against the rows for which [Order Date] is in the current month.

Similarly, calculate the previous month using the following calculation.

Previous Month
				
					DATEDIFF('month',[Order Date],{MAX([Order Date])}) < 2
AND
NOT [Current Month]
				
			

To refer to the prior month, you must double the variance. This is why < 2 is used. The AND NOT [Current Month] ensures this calculation does not return True against the current month.

Like with the previous calculation, note that if you want just to get the previous month, the ideal way is to replace < 2 AND NOT [Current Month] with a = 1. Try it out, and you will get the same result. The reason for this circumlocution will be clear in the next step.

Drag this next to the [Current Month] pill that you added before, and scroll to the bottom of the list. You will get the following result.

A screenshot showing the sheet for validating the current and previous date periods.

This is exactly what was needed!

Step 2: Make it Dynamic

In their current form, these two calculations are limited. To power them up, create the following parameter.

A screenshot showing the parameter needed for selecting the date period.

Notice that the Value column entries are in lowercase, whereas the Display As entries are in the title case. This is an application of the technique described in this blog.

Next, rename the [Current Month] field to [Current Date Period] and update it as follows.

Current Date Period
				
					DATEDIFF([Date Period],[Order Date],{MAX([Order Date])}) < 1
				
			

Similarly, update the [Previous Month] as shown below.

Previous Date Period
				
					DATEDIFF([Date Period],[Order Date],{MAX([Order Date])}) < 2
AND
NOT [Current Date Period]
				
			

Change the parameter’s value and observe how the values change for the [Current Date Period] and [Previous Date Period] fields.

The [Current Date Period] and [Previous Date Period] fields always refer to 1 month, 1 quarter, or 1 year. You can make this dynamic by creating another parameter like the one shown below.

A screenshot showing the parameter for selecting the different date ranges

Then update the [Current Date Period] and [Previous Date Period] as shown below.

Current Date Period
				
					DATEDIFF([Date Period],[Order Date],{MAX([Order Date])}) < [Date Range]
				
			
Previous Date Period
				
					DATEDIFF([Date Period],[Order Date],{MAX([Order Date])}) < 2*[Date Range]
AND
NOT [Current Date Period]
				
			

This was the whole reason why < 1 and < 2 were used in the first iteration of these calculations instead of the obvious choices of = 0 and = 1. This makes it very flexible to use in a wide range of scenarios.

Play around with the two parameters and watch how the boolean values change for the two fields.

It is interesting to note that since the maximum date in the superstore dataset is from Dec-2023, the current 1 year, 4 quarters, and 12 months will give the same result. Technically, the calculations are calculated on a rolling basis. 

This means that if the maximum date were Nov-2023, current 1 year, current 4 quarters, and current 12 months, the calculations will return different results for each as expected. If you wish to try this out, you can do it by excluding Dec-2023 Order Dates using a data source or context filter.

Step 3: Create the Customer Segmentation

You will first need a few calculations to segment the customers into the categories mentioned in Step 1.

Current Date Period Customer
				
					{ FIXED [Customer Name]: MAX([Current Date Period])}
				
			

The [Current Date Period] is a boolean field that returns True against the order dates corresponding to the current date period. Getting the maximum of a boolean field returns True if there is at least one row with True. 

Thus, by adding this inside a {FIXED [Customer Name]: MAX()}, you will get True for all records of the customers who ordered in the current date period.

Similarly, create the [Previous Date Period Customer] field using the following calculation.

Previous Date Period Customer
				
					{ FIXED [Customer Name]: MAX([Previous Date Period])}
				
			

In addition, create this calculation to identify the new customers in the current date period.

Current Date Period New Customer
				
					DATEDIFF([Date Period],{ FIXED [Customer Name]: MIN([Order Date])},{MAX([Order Date])}) < [Date Range]
				
			

This is similar to the [Current Date Period] calculation. Note that since the start_date argument is { FIXED [Customer Name]: MIN([Order Date])}, only the first order date for a customer is getting considered.

Also, since it is fixed at the [Customer Name] level, the entire calculation is automatically fixed at the [Customer Name] level. This is why it was unnecessary to fix it at the [Customer Name] level with a MAX aggregation like the two preceding calculations.

To verify if this calculation works as expected, on a new sheet, drag the [Customer Name] field and the [Current Date Period New Customer] field to the Rows shelf. Also, right-click and drag the [Order Date] as MIN(Order Date) next to it.

Convert this to Discrete and Sort Descending. You should see the following when you match the parameter values below.

It works as expected, and with that, you now have all the building blocks for creating the dynamic engagement analysis categories.

Customer Engagement Type
				
					IF [Current Date Period New Customer] THEN "New"
ELSEIF [Current Date Period Customer] AND [Previous Date Period Customer] THEN "Active"
ELSEIF [Current Date Period Customer] THEN "Returned"
ELSEIF [Previous Date Period Customer] THEN "Dormant"
ELSE "Inactive"
END
				
			

Since all three boolean calculations are fixed at the [Customer Name] level, wrapping this calculation inside another LOD expression is unnecessary.  

To validate this, drag it to the Rows shelf of a new sheet, and right-click drag the [Customer Name] field as CNTD(Customer Name) to the Text section of the Marks card. You will get the following.

A screenshot showing the final result of splitting the customer base in engagement cohorts.

Change the [Date Period] to Year and [Date Range] to 4.

A screenshot showing the segmentation cohorts when date period is set to year and date range to 4.

All 800 Customers are now in the New category. This is because the current date period is 4 years, and the superstore dataset also has 4 years of data.

Now, change the [Date Range] to 2.

A screenshot showing the segmentation cohorts when date period is set to year and date range to 2.

Notice how the Returned and Inactive categories have no entries. This is because when the current date period is 2 years, the previous will comprise the remaining two years of the dataset. Thus, there will only be 3 categories.

With this, you now have a dynamic way of slicing your customer base to see how your product/service is performing!

Conclusion

Hopefully, you enjoyed learning this way of splitting data into MECE categories to boost your engagement analysis needs. The relevant performance metrics for each cohort of customers are different, and having the option to view each of them separately is valuable — it is also a surefire way to safeguard your analysis against the Simpson’s paradox — in most business scenarios.

When faced with a complex overlapping condition during categorization, use Fixed Level of Detail expressions (LODs) to resolve the condition into simple conditions at the aggregation level in which you want your final result, and then use AND statements to get the desired combinations. 

However, if your overlapping condition is simple, you can use the technique used for splitting the data from step 1 of this topic’s sister blog centered around RFM Analysis.

If you want more information on Custom Categories in Tableau, contact our team of experts!

Data Coach is our premium analytics training program with one-on-one coaching from renowned experts.

Accelerate and automate your data projects with the phData Toolkit