July 14, 2023

How to Speed up the Performance of Count Distinct Calculations in Tableau

By Venkatesh lyer

You have probably come across the tip that if you want to make your calculations fast, avoid using count-distinct calculations. But since it has a specific use case that is hard to replicate, it is not easy to save your dashboards from the performance-sucking fangs of Count Distinct. 

That said, if you are the type of person who likes to have the cake and eat it too, you have come to the right place.

In this blog, you will learn an alternative to the count distinct calculations that are not only more performant but also has some added advantages that will take your Tableau calculations game to the next level.

Dataset

As usual, you can use the Orders table of the latest (2020-2023) Superstore dataset to follow along. No data source tinkering is needed for this.

Scenario 1: Simple Count Distinct Calculation

You might have come across the idea of getting the total row count in the data by creating a calculated field with just 1 written in it. This technique is loosely based on that but leverages the power of Level of Detail Expressions (specifically Fixed LODs) to replicate what count distinct does. For example, if you want to get the distinct count of orders, you can create the following calculation.

				
					//All Orders
{ FIXED [Order ID]: MAX(1)}
				
			

Even though this will technically add 1 against every Order ID and thus all the rows, when it is added in a view, it will only consider each Order ID once because of the Fixed LOD.

To quickly test this out, create a table by dragging the Sub-Category in Rows, Right Click, dragging the Order ID, and dropping it as CNTD(Order ID) in Text.

Now drag the All Orders calculation on top of the Order values, as shown below.

It returns the same exact values as the count distinct! Notice the aggregation type of “All Orders” is SUM. When you are in a situation where you want to switch your measures using parameters, having SUM instead of count distinct simplifies a lot of the calculations.

Scenario 2: Conditional Count Distinct Calculations

Traditionally, if you wanted to get the distinct count of Order IDs when the Order ID ends with a 9, you would do something like this.

				
					//Traditional Conditional Count Distinct
COUNTD(IF RIGHT([Order ID],1)="9" THEN [Order ID] END)
				
			

But with the technique that you just learned, you can make this much more efficient by creating a calculation like shown below.

				
					//Orders ending with 9
{ FIXED [Order ID]: MAX(INT(RIGHT([Order ID],1)="9"))}
				
			

Wrapping a boolean inside an INT returns 1 when True and 0 when False. This calculation will essentially return a 1 for all the Order ID that ends with 9.

This is what you will get when you drag the two measures in your view.

It matches perfectly with the “Traditional Conditional Count Distinct” calculation values. 

The magic of the Fixed LOD is that even though there are multiple rows against each Order ID, each 1 will get counted only once for each Order ID. To understand this more clearly, create a new calculation, as shown below.

				
					//Orders ending with 9 X
INT(RIGHT([Order ID],1)="9")
				
			

Adding that in the view will result in the following.

Notice how the values are different. This is because the “Orders ending with 9 X” calculation is summing up all the records which match the criteria, i.e., it is similar to the CNT function and returns the sum of all the rows with orders whose Order ID ends with a 9.

Scenario 3: Filtering

Another really cool advantage of this technique is that you can directly use the calculation as a filter in a sheet if you want. Add the “Orders ending with 9” to the filters as shown below.

This is what you will get.

The first two columns show the same values as the next two, which is expected because of the filter. But the more interesting thing is that the “Orders ending with 9 X” measure has the same values as before! In the previous scenario, Tableau did not sum all the rows that met the condition, but in this scenario, when the same thing was added in filters, Tableau showed all the rows! This is another one of the have-the-cake-and-eat-it-too moments.

In addition to this, you may be wondering whether you now have to add all your filters in context since it is clearly mentioned in the Tableau Order of Operations that Fixed LOD expressions operate before the Dimension filters.

Try doing that by adding a Category filter with Category set to Office Supplies, as shown below.

It works without even adding it in context! What if you add a filter for State = New York instead?

That works too!! Since the LOD expression is fixed at the Order ID level, which is the lowest level of aggregation needed in the view. Any dimension above that level of aggregation doesn’t need to be added in context to work. 

To understand this intuitively, imagine if you created a calculation to get the distinct count of States. Filtering out cities will not affect the count as long as there is at least one city of that state not filtered out. But filtering out a country will remove all states from the country from view.

Scenario 4: Nested Conditions

Imagine a case in which you want to get the distinct count of customers who have at least 3 orders whose Order ID ends with 9. You can easily do it with the technique you just learned by creating a nested calculation, as shown below. Note that the Customer ID field is hidden by default.

				
					//Customers with at least 3 Orders ending with 9
{ FIXED [Customer ID]: MAX(INT({ FIXED [Customer ID]: SUM([Orders ending with 9])}>=3))}
				
			

The inner Fixed LOD will give the sum of the “Orders ending with 9” for each Customer. If this calculation is >= 3, the INT function will return a 1. This then gets fixed at the Customer ID level, like in Scenario 2. 

To check if the calculation will actually return the right values, add Customer ID, “Orders ending with 9,” and “Customers with at least 3 Orders ending with 9” in the view as shown below. Sort Descending on the “Orders ending with 9” to make it easier to validate.

It works exactly as expected and returns a 1 for the customers for whom the sum of the “Orders ending with a 9” is greater than or equal to 3. 

The advantage of this method is that you are flagging the entire Customer ID based on a condition that is true for only a few rows with that Customer ID. This is essentially similar to some results you get by performing a self-join/using subqueries in SQL.

Scenario 5: Funnel Analysis

If you want to build an Orders funnel that goes something like this All > Consumer Segment > Consumer Segment + Standard Class Shipment > Consumer Segment + Standard Class Shipment + Ordered on Sunday, you can use the technique as shown in the steps below.

				
					//Consumer Orders
{ FIXED [Order ID]: MAX(INT([Segment]="Consumer"))}
				
			
				
					//Consumer | Standard Class Orders
{ FIXED [Order ID]: MAX(INT([Consumer Orders] = 1 AND [Ship Mode] = "Standard Class"))}
				
			

Note how you can utilize the technique used in the filtering scenario to simplify the boolean condition.

				
					//Consumer | Standard Class | Sunday Orders
{ FIXED [Order ID]: MAX(INT([Consumer | Standard Class Orders] = 1 AND ISOWEEKDAY([Order Date]) = 7))}

				
			

Add the 3 new measures and the All Orders measure in the view along with the Segment, Ship Mode, and Weekday of Order Date in Rows as shown.

It works! Now, remove the Dimensions from Rows and add Measure Values in Rows. You will get this.

There you have it! A funnel analysis showing distinct orders without using count distinct!

Conclusion

Hopefully, you enjoyed learning this versatile, scalable, and performant alternative for the dreaded count of distinct calculations. Use the scenarios from the blog as a foundation to come up with use cases specifically tailored to your needs!

If you want more information on calculations in Tableau, contact our team of experts!

Data Coach is our premium analytics training program with one-on-one coaching from renowned experts.

Accelerate and automate your data projects with the phData Toolkit