August 13, 2020

Understanding Tableau’s IN Function

By Spencer Baucke

On 8/12/2020, Tableau released Tableau Desktop version 2020.3 that included some fun new features, including Write to Database in Tableau Prep, Export to Crosstab Button, and the IN function. There are lots of great new features in this release, but the IN function caught my eye specifically. Let’s dive deeper into understanding the IN functionality and how you can leverage this in your Tableau development.

What does IN do?

Those who are familiar with SQL are already familiar with the IN concept as it’s a common method of defining criteria for a WHERE statement. The IN function in Tableau functions similarly to SQL. See an example SQL query below:

				
					SELECT column_A
FROM table_B
WHERE column_A IN (value1, value2, ...);
				
			

The IN function in Tableau is used to create groupings of different values within a dimension or measure that you specify in the function criteria. The values that you specify in this IN group are essentially creating a permanent Set based upon those criteria.

IN Syntax

The explanation of the syntax to use this operation is shown in bold below.

Returns TRUE if < matches any value in <expr1>.

<expr1> can be a Set, or a lister of literal values or a combined field

Practically, the syntax is going to look like this when assigning values to groups:
				
					CASE [Food Dimension]
WHEN IN('Pizza','Kale','Milk',Zucchini') THEN "Favorite Foods"
WHEN IN('Ice Cream','Cake','Pudding') THEN "Dessert"
END
				
			

You can also define what values you want to show a measure for as well using mostly the same syntax, just with a different ending:

				
					CASE [Finance Dimension]
WHEN IN('Mortage','Car,'Utilities','Insurance') THEN [Expenses Measure]
WHEN IN('Paycheck','Dividend','Rental Income') THEN [Revenue Measure]
END
				
			

In addition to text functions, you can define criteria if they are numeric as well. Using this method you will not have quotations around the values as they are not text values. 

				
					CASE [Year]
WHEN IN(2016) THEN [Total Revenues]
END
				
			

Lastly, you can use parameter values within the IN function as well. Although this may not be a great way to define your criteria, I am simply stating that this is possible. 

				
					CASE [School]
WHEN IN([School Parameter]) THEN [Revenues CY]
END
				
			

Practical Use Case

Whether or not it’s best practice, the most commonly used instance of this new feature will assuredly be creating some sort of criteria based upon a list of text (like below). If your list of text is extremely long, then you could place your items in an Excel sheet and create a formula to put quotes around each word and commas between each word. You could then paste this into your IN syntax. 

				
					\\State Regions = 
CASE [State]
WHEN IN('New York','Massachusetts','Maine') THEN "East"
WHEN IN('California','Oregon') THEN "West"
END
				
			

If your calculations are already structured in this way, this new function might help make some of your old formulas more understandable, both for you and anyone else who might look through your calculations (including other developers making changes). See how much easier it is to create and comprehend the formula with IN rather than the long list of OR statements.

				
					\\Before

IF
[State]='Ohio' OR 
[State]='Iowa' OR
[State]='Indiana' THEN "Midwest"

ELSEIF 

[State]='Georgia' OR
[State]='Alabama' THEN "South"

END
				
			

The calculation using IN is below. Notice how much easier it is to create as well as consume. 

				
					\\After

CASE [State]
WHEN IN('Ohio','Iowa','Indiana') THEN "Midwest"
WHEN IN('Georgia','Alabama') THEN "South"
END
				
			

Admittedly, when I saw the announcement of this function, I wasn’t very excited about it. I thought that everything you could do with IN you could already accomplish using Sets. I still think this is mostly true, but an advantage of instead IN instead of Sets to group data values is that in order for a dimension value to be in a Set, it has to already have occurred in your data. Let me explain what I mean.

As an example, I want to create a Set containing only states within a certain Region. I then want to use these Sets to create calculations. Let’s say I create that set, the calculation to get the region value would look like below.

				
					\\Set Calc = 
IF [Midwest Set] THEN "Midwest"
ELSEIF [East Set] THEN "East"
ELSE "Other Region"
END
				
			

The only issue using this methodology is that only states within your data set would be able to be placed in a set. Let’s say you’ve got sales orders from every state in the Midwest except for Ohio. If you created this Set, you would not be able to have this calculation automatically update once you get a sale from Ohio, BUT, if you used your WHEN IN(‘Ohio’,’Iowa’,’Indiana’,’Michigan’, etc) function it would automatically assign Midwest to your Ohio sales.

I want to caveat these examples by saying that if you’re mapping states to region, stores to group, or whatever your situation may be, the optimal solution would be to get those values in your original data set in the back end. If that’s not possible, a mapping table would also be a more efficient solution than using IN (especially given Tableau’s new Relationships). I want to reiterate that my use cases using IN are just examples and may not offer the most performant results.

Data Coach is our premium analytics training program with one-on-one coaching from renowned experts.

Accelerate and automate your data projects with the phData Toolkit