How to Create Custom Categories in Tableau: Engagement Analysis

As explained in this topic’s sister blog, effectively splitting data into categories is very important for effective analysis. These custom categories are required to be MECE (Mutually Exclusive and Collectively Exhaustive) in most business scenarios. There is no straightforward way of doing this, especially when there are overlapping conditions.

In this blog, I will show another technique, or rather a process, for creating MECE custom categories in Tableau. I will use the example of a form of dynamic customer engagement analysis I did recently for one of our customers.

Dataset

You can follow along with the Orders table of the latest (2020-2023) Superstore dataset.

Step 1: Understand the Basics

In this engagement analysis, you will learn to split the customer base into these 6 categories.

New: Customers who placed their first order in the current month.
Active: Customers who ordered in the current and previous month.
Dormant: Customers who ordered in the previous but not in the current month.
Returned: Customers who ordered in the current but not in the previous month.
Inactive: Customers who did not order in both the current and previous months.

As you can see, the conditions in these categories have considerable overlap. A customer ordering in the current month can be in the New, Returned, or Active category.

First, create the following boolean calculation.

Current Month

				
					DATEDIFF('month',[Order Date],{MAX([Order Date])}) < 1

{MAX([Order Date])} returns the maximum order date in the data source. The DATEDIFF will return the difference in months between the row-level [Order Date] and the maximum order date in the data source. Thus, the difference would be 0 for the current month, and since 0 < 1, you get the correct result.

You can use = 0 instead of < 1 to get the same result. The only reason to use < 1 is because it will be better to understand the tweaks that will be made in Step 2.

To validate, right-click drag the [Order Date] field to the Rows shelf, and double-click on the green Month(Order Date) from the Drop Field options.

Right-click and convert this pill to Discrete. Then drag the [Current Month] field next to it. Scroll down to the bottom of this list, and you will get the following.

The calculation works as expected and returns True against the rows for which [Order Date] is in the current month.

Similarly, calculate the previous month using the following calculation.

Previous Month

				
					DATEDIFF('month',[Order Date],{MAX([Order Date])}) < 2
AND
NOT [Current Month]

To refer to the prior month, you must double the variance. This is why < 2 is used. The AND NOT [Current Month] ensures this calculation does not return True against the current month.

Like with the previous calculation, note that if you want just to get the previous month, the ideal way is to replace < 2 AND NOT [Current Month] with a = 1. Try it out, and you will get the same result. The reason for this circumlocution will be clear in the next step.

Drag this next to the [Current Month] pill that you added before, and scroll to the bottom of the list. You will get the following result.

This is exactly what was needed!

Step 2: Make it Dynamic

In their current form, these two calculations are limited. To power them up, create the following parameter.

Notice that the Value column entries are in lowercase, whereas the Display As entries are in the title case. This is an application of the technique described in this blog.

Next, rename the [Current Month] field to [Current Date Period] and update it as follows.

Current Date Period

				
					DATEDIFF([Date Period],[Order Date],{MAX([Order Date])}) < 1

Similarly, update the [Previous Month] as shown below.

Previous Date Period

				
					DATEDIFF([Date Period],[Order Date],{MAX([Order Date])}) < 2
AND
NOT [Current Date Period]

Change the parameter’s value and observe how the values change for the [Current Date Period] and [Previous Date Period] fields.

The [Current Date Period] and [Previous Date Period] fields always refer to 1 month, 1 quarter, or 1 year. You can make this dynamic by creating another parameter like the one shown below.

Then update the [Current Date Period] and [Previous Date Period] as shown below.

Current Date Period

				
					DATEDIFF([Date Period],[Order Date],{MAX([Order Date])}) < [Date Range]

Previous Date Period

				
					DATEDIFF([Date Period],[Order Date],{MAX([Order Date])}) < 2*[Date Range]
AND
NOT [Current Date Period]

This was the whole reason why < 1 and < 2 were used in the first iteration of these calculations instead of the obvious choices of = 0 and = 1. This makes it very flexible to use in a wide range of scenarios.

Play around with the two parameters and watch how the boolean values change for the two fields.

It is interesting to note that since the maximum date in the superstore dataset is from Dec-2023, the current 1 year, 4 quarters, and 12 months will give the same result. Technically, the calculations are calculated on a rolling basis.

This means that if the maximum date were Nov-2023, current 1 year, current 4 quarters, and current 12 months, the calculations will return different results for each as expected. If you wish to try this out, you can do it by excluding Dec-2023 Order Dates using a data source or context filter.

Step 3: Create the Customer Segmentation

You will first need a few calculations to segment the customers into the categories mentioned in Step 1.

Current Date Period Customer

				
					{ FIXED [Customer Name]: MAX([Current Date Period])}

The [Current Date Period] is a boolean field that returns True against the order dates corresponding to the current date period. Getting the maximum of a boolean field returns True if there is at least one row with True.

Thus, by adding this inside a {FIXED [Customer Name]: MAX()}, you will get True for all records of the customers who ordered in the current date period.

Similarly, create the [Previous Date Period Customer] field using the following calculation.

Previous Date Period Customer

				
					{ FIXED [Customer Name]: MAX([Previous Date Period])}

In addition, create this calculation to identify the new customers in the current date period.

Current Date Period New Customer

				
					DATEDIFF([Date Period],{ FIXED [Customer Name]: MIN([Order Date])},{MAX([Order Date])}) < [Date Range]

This is similar to the [Current Date Period] calculation. Note that since the start_date argument is { FIXED [Customer Name]: MIN([Order Date])}, only the first order date for a customer is getting considered.

Also, since it is fixed at the [Customer Name] level, the entire calculation is automatically fixed at the [Customer Name] level. This is why it was unnecessary to fix it at the [Customer Name] level with a MAX aggregation like the two preceding calculations.

To verify if this calculation works as expected, on a new sheet, drag the [Customer Name] field and the [Current Date Period New Customer] field to the Rows shelf. Also, right-click and drag the [Order Date] as MIN(Order Date) next to it.

Convert this to Discrete and Sort Descending. You should see the following when you match the parameter values below.

It works as expected, and with that, you now have all the building blocks for creating the dynamic engagement analysis categories.

Customer Engagement Type

				
					IF [Current Date Period New Customer] THEN "New"
ELSEIF [Current Date Period Customer] AND [Previous Date Period Customer] THEN "Active"
ELSEIF [Current Date Period Customer] THEN "Returned"
ELSEIF [Previous Date Period Customer] THEN "Dormant"
ELSE "Inactive"
END

Since all three boolean calculations are fixed at the [Customer Name] level, wrapping this calculation inside another LOD expression is unnecessary.

To validate this, drag it to the Rows shelf of a new sheet, and right-click drag the [Customer Name] field as CNTD(Customer Name) to the Text section of the Marks card. You will get the following.

Change the [Date Period] to Year and [Date Range] to 4.

All 800 Customers are now in the New category. This is because the current date period is 4 years, and the superstore dataset also has 4 years of data.

Now, change the [Date Range] to 2.

Notice how the Returned and Inactive categories have no entries. This is because when the current date period is 2 years, the previous will comprise the remaining two years of the dataset. Thus, there will only be 3 categories.

With this, you now have a dynamic way of slicing your customer base to see how your product/service is performing!

Conclusion

Hopefully, you enjoyed learning this way of splitting data into MECE categories to boost your engagement analysis needs. The relevant performance metrics for each cohort of customers are different, and having the option to view each of them separately is valuable — it is also a surefire way to safeguard your analysis against the Simpson’s paradox — in most business scenarios.

When faced with a complex overlapping condition during categorization, use Fixed Level of Detail expressions (LODs) to resolve the condition into simple conditions at the aggregation level in which you want your final result, and then use AND statements to get the desired combinations.

However, if your overlapping condition is simple, you can use the technique used for splitting the data from step 1 of this topic’s sister blog centered around RFM Analysis.

If you want more information on Custom Categories in Tableau, contact our team of experts!

How to Create Custom Categories in Tableau: Engagement Analysis

Dataset

Step 1: Understand the Basics

Step 2: Make it Dynamic

Step 3: Create the Customer Segmentation

Conclusion

More to explore

How to Migrate Hive Tables From Hadoop Environment to Snowflake Using Spark Job

Beyond the Data: Franco Borgiani, Data Engineer

How to Trigger a Slack Notification When a Pipeline Fails in Fivetran

Join our team

Partners

Resources

Software

Accelerate and automate your data projects with the phData Toolkit

Industries

Solutions

Company

Technology Partners

Other Technology Partners

Check out our latest insights

How to Migrate Hive Tables From Hadoop Environment to Snowflake Using Spark Job

Beyond the Data: Franco Borgiani, Data Engineer

Data Engineering

Consulting, Migrations, Data Pipelines, DataOps

Change Management, Enablement & Learning

COE, Coaching, PMO

Data Science and Machine Learning Services

MLOps Enablement, Prototyping, Model Development and Deployment

Strategy Services

Data, Analytics, and AI Strategy, Architecture and Assessments

Reporting, Analytics, and Visualization Services

Self-Service, Integrated Analytics, Dashboards, Automation

Elastic Operations

Data Platforms, Data Pipelines, and Machine Learning