February 29, 2024

Optimizing Custom SQL for Tableau

By Sanam Zahedi

Have you ever encountered a project that requires you to join and query several tables to feed a dashboard, but due to various limitations (e.g., database permissions, ETL capability, processing constraints), it has to be done using custom SQL in Tableau? 

Hopefully, you don’t run into this scenario because joining and querying multiple tables in Tableau using custom SQL is not recommended due to its impact on performance. Still, if you ever do, I’ve compiled some tips and tricks to help you effectively build and implement your custom SQL in Tableau with minimal impact on performance and processing.

In this blog, we will cover effective strategies for optimizing Custom SQL queries in Tableau to enhance your performance and streamline data visualization. These strategies will range from ways to reduce the size of your data, to your query structure. In each section, you will find examples that demonstrate how to implement each strategy and drive the desired outcome. Hopefully these tips and examples will give you a kick start in your SQL optimization journey.

Tip 1: Only Bring in the Data you Need

An excellent first step to optimizing your query is removing the data you don’t need. 

Refine what you’re bringing in, starting with the FROM clause. In the standard logical order of SQL execution, the FROM clause, including any JOINs it contains, is evaluated first, before WHERE, GROUP BY, and SELECT. This means that if JOINs are present in your query, the FROM clause will be the first part read.

So let’s use an example: say your goal is to join the tables order and detail from a database called db, using the field sku as the join key. While the detail table below shows 9 columns, the table actually has 100 columns in total; the visual only shows the first 9. 

Here are some example tables:

order
order_number | sku  | date
3287493      | 1234 | 1/13/2023
2846287      | 4444 | 3/10/2023
1047263      | 5454 | 4/18/2023

detail (+91 columns not shown):
sku  | class      | sub_class | vendor  | state   | region   | type      | model_year | rate
1234 | Power tool | pt1       | Dotten  | Maine   | Northern | power     | 2023       | $544
4444 | Hand tool  | ht2       | Stanley | Utah    | Western  | tool      | 2022       | $130
5454 | Power tool | pt2       | Press   | Florida | Southern | power     | 2022       | $470
6711 | Ice maker  | ap1       | Scottie | Nevada  | Western  | appliance | 2023       | $255

The first query below will bring in both full tables and join them on the sku field. This means all 100 columns from the detail table will be included in this join, making it a heavy lift. The second query will bring in only the fields specified after the SELECT clause, thus significantly reducing the number of columns brought in by the JOIN. 

Optimizing your query by only bringing in the necessary columns, particularly when your subquery allows you to considerably reduce the column count, will result in a significant performance improvement.

Query 1

SELECT *
FROM db.order
JOIN db.detail
    ON order.sku = detail.sku
Query 2

SELECT *
FROM db.order
JOIN (
    SELECT
        sku,
        class,
        vendor,
        rate
    FROM db.detail
) AS detail
    ON order.sku = detail.sku
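To see the column savings concretely, here is a small sketch that replays both query shapes against an in-memory SQLite database. The table names and sample values mirror the example above, but this is a hypothetical stand-in: the detail table here has only the 9 columns shown, not the full 100.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# "order" is a reserved word in SQL, so it must be quoted.
con.executescript("""
CREATE TABLE "order" (order_number INTEGER, sku INTEGER, date TEXT);
CREATE TABLE detail (sku INTEGER, class TEXT, sub_class TEXT, vendor TEXT,
                     state TEXT, region TEXT, type TEXT, model_year INTEGER,
                     rate INTEGER);
INSERT INTO "order" VALUES (3287493, 1234, '2023-01-13');
INSERT INTO detail VALUES (1234, 'Power tool', 'pt1', 'Dotten', 'Maine',
                           'Northern', 'power', 2023, 544);
""")

# Query 1 shape: join the full detail table -- every column comes along.
wide = con.execute(
    'SELECT * FROM "order" JOIN detail ON "order".sku = detail.sku')
print(len(wide.description))  # 12 columns (3 from order + 9 from detail)

# Query 2 shape: join only the columns the dashboard actually needs.
narrow = con.execute(
    '''SELECT * FROM "order"
       JOIN (SELECT sku, class, vendor, rate FROM detail) AS d
         ON "order".sku = d.sku''')
print(len(narrow.description))  # 7 columns (3 from order + 4 from detail)
```

With the real 100-column detail table, the gap would be 103 columns versus 7, which is where the performance difference comes from.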

Tip 2: Apply Filters or Joins to Your Tables

Adding simple filters to your query is an easy way to reduce table size. Depending on the desired output, many filters can do the trick. You may only want your table to include data within a specific date range or associated with a particular category; using filters allows you to render only the data you need.

Where applicable, you can also write the filters in your query to be dynamic. This gives your SQL query more flexibility and general application. For example, if you want to isolate all of the rows from the order table where the date was in the last week, you could use the filter below:

				
SELECT *
FROM db.order
WHERE date >= DATEADD(day, -7, GETDATE())
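Note that DATEADD and GETDATE are SQL Server flavored; other engines spell the same dynamic filter differently. As a hedged sketch, here is the equivalent rolling seven-day filter in SQLite, run from Python with hypothetical sample rows:

```python
import sqlite3
from datetime import date, timedelta

con = sqlite3.connect(":memory:")
con.execute(
    'CREATE TABLE "order" (order_number INTEGER, sku INTEGER, date TEXT)')

today = date.today()
con.executemany('INSERT INTO "order" VALUES (?, ?, ?)', [
    (1, 1234, (today - timedelta(days=2)).isoformat()),   # within last week
    (2, 4444, (today - timedelta(days=30)).isoformat()),  # too old
])

# SQLite spells the dynamic cutoff with date('now', ...) instead of
# DATEADD/GETDATE; the idea is identical -- only recent rows are rendered.
rows = con.execute(
    "SELECT * FROM \"order\" WHERE date >= date('now', '-7 day')").fetchall()
print([r[0] for r in rows])  # [1]
```

Because the cutoff is computed at query time rather than hard-coded, the same query keeps returning the trailing week no matter when it runs.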

Tip 3: Use Key Values

An effective way to organize your data is by using primary keys to build a foundational table to which you can join values. To do this, you first need to determine the level of detail (granularity) you want your table to have. 

Once you’ve identified your key column(s), you can create a table of unique rows based on the values of one or more primary columns; these keys can be configured as a single column of unique values or as unique value groupings of multiple columns. Creating keys allows for faster query execution and prevents duplication.

The example below creates a keys index using the unique sku and vendor combinations from the tables detail and vendor. The keys are compiled using SELECT DISTINCT and UNION clauses, and the additional fields are joined back to the keys through the JOIN clauses at the bottom of the query. See example:

vendor
sku  | vendor  | channel | code
1234 | Dotten  | store   | 14
4444 | Stanley | online  | 5
5454 | Press   | online  | 14
6711 | Scottie | store   | 18

detail (+91 columns not shown):
sku  | class      | sub_class | vendor  | state   | region   | type      | model_year | rate
1234 | Power tool | pt1       | Dotten  | Maine   | Northern | power     | 2023       | $544
4444 | Hand tool  | ht2       | Stanley | Utah    | Western  | tool      | 2022       | $130
5454 | Power tool | pt2       | Press   | Florida | Southern | power     | 2022       | $470
6711 | Ice maker  | ap1       | Scottie | Nevada  | Western  | appliance | 2023       | $255
Query 3

SELECT
    keys.sku,
    keys.vendor,
    detail.class,
    detail.rate,
    vendor.channel,
    vendor.code
FROM (
    SELECT DISTINCT
        sku,
        vendor
    FROM db.detail
    UNION
    SELECT DISTINCT
        sku,
        vendor
    FROM db.vendor
) keys
LEFT JOIN db.detail
    ON keys.sku = detail.sku AND keys.vendor = detail.vendor
LEFT JOIN db.vendor
    ON keys.sku = vendor.sku AND keys.vendor = vendor.vendor;

This example creates a foundational table of keys within the query using the sku and vendor columns. The additional columns from the db.detail and db.vendor tables are then joined against it to create this refined output table:

sku  | vendor  | class      | rate | channel | code
1234 | Dotten  | Power tool | $544 | store   | 14
4444 | Stanley | Hand tool  | $130 | online  | 5
5454 | Press   | Power tool | $470 | online  | 14
6711 | Scottie | Ice maker  | $255 | store   | 18
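As a sanity check, Query 3 can be reproduced on a small in-memory SQLite database. This sketch keeps only the four detail columns the query actually touches, so it is an illustration of the pattern rather than the full 100-column table:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE detail (sku INTEGER, class TEXT, vendor TEXT, rate INTEGER);
CREATE TABLE vendor (sku INTEGER, vendor TEXT, channel TEXT, code INTEGER);
INSERT INTO detail VALUES
  (1234, 'Power tool', 'Dotten', 544), (4444, 'Hand tool', 'Stanley', 130),
  (5454, 'Power tool', 'Press', 470),  (6711, 'Ice maker', 'Scottie', 255);
INSERT INTO vendor VALUES
  (1234, 'Dotten', 'store', 14), (4444, 'Stanley', 'online', 5),
  (5454, 'Press', 'online', 14), (6711, 'Scottie', 'store', 18);
""")

# Build the key table with SELECT DISTINCT + UNION (UNION also deduplicates),
# then LEFT JOIN the remaining columns back onto the keys.
rows = con.execute("""
SELECT keys.sku, keys.vendor, detail.class, detail.rate,
       vendor.channel, vendor.code
FROM (SELECT DISTINCT sku, vendor FROM detail
      UNION
      SELECT DISTINCT sku, vendor FROM vendor) AS keys
LEFT JOIN detail ON keys.sku = detail.sku AND keys.vendor = detail.vendor
LEFT JOIN vendor ON keys.sku = vendor.sku AND keys.vendor = vendor.vendor
ORDER BY keys.sku
""").fetchall()

print(len(rows))  # 4 unique sku/vendor keys, one output row each
print(rows[0])    # (1234, 'Dotten', 'Power tool', 544, 'store', 14)
```

Because every sku/vendor pair appears exactly once in the key table, the joins cannot fan out and duplicate rows, which is the point of building the keys first.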

Tip 4: Reduce Subqueries

Subqueries are simply queries that are nested in the main query. They are an effective way to refine or build on your data, but they can sometimes be challenging to understand and may not be the most efficient option. 

CTEs (common table expressions) are essentially subqueries defined at the front of the query by a WITH clause. A CTE is a named, temporary result set that exists only for the duration of the query; compared to nested subqueries, CTEs are cleaner, can be referenced multiple times, and, depending on how your database engine materializes them, can improve query performance. 

You can work with CTEs like a table in your query: select from them, join them, etc. If your query is heavy on subqueries and you’re using the same data multiple times, consider using CTEs to clean your query and reduce workload.

Below is an example of Query 2 above, written to use a CTE:

Query 2 using a CTE

WITH Sales_Detail AS (
    SELECT
        sku,
        class,
        vendor,
        rate
    FROM db.detail
)

SELECT *
FROM db.order
JOIN Sales_Detail
    ON order.sku = Sales_Detail.sku

Using CTEs in Tableau can be a little tricky, but there are guides available that walk you through how to use them.
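As a quick sanity check that the WITH form behaves like the subquery form, here is the same CTE pattern run against an in-memory SQLite database, with hypothetical sample rows:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE "order" (order_number INTEGER, sku INTEGER, date TEXT);
CREATE TABLE detail (sku INTEGER, class TEXT, vendor TEXT, rate INTEGER);
INSERT INTO "order" VALUES (3287493, 1234, '2023-01-13');
INSERT INTO detail VALUES (1234, 'Power tool', 'Dotten', 544);
""")

# The CTE defined by WITH is used exactly like a table in the main query.
rows = con.execute("""
WITH sales_detail AS (
    SELECT sku, class, vendor, rate FROM detail
)
SELECT o.order_number, s.class, s.rate
FROM "order" o
JOIN sales_detail s ON o.sku = s.sku
""").fetchall()

print(rows)  # [(3287493, 'Power tool', 544)]
```

If the query referenced sales_detail more than once, the CTE would be defined a single time at the top instead of repeating the subquery, which is where the cleanup and potential workload reduction come from.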

Tip 5: Use an IDE for Build/Test

While you’re building your SQL query, you want to be able to quickly test and iterate to ensure you’re getting the desired results. Re-running your query repeatedly in Tableau can take a long time, especially if your query is doing a lot of work. 

Download an IDE and connect to your database so you can build and test your query seamlessly and efficiently. This way, you can see the results and make adjustments much faster before transferring your query to Tableau. Two great IDE options are DataGrip and Visual Studio Code.

Conclusion

Optimizing your SQL query in Tableau is a great way to ensure that, despite limitations, performance remains high. Many of these tips are applicable outside of Tableau and are standard guidelines for SQL optimization. As you can see, small changes can significantly impact performance; the key is knowing the best strategy to use for each situation and what to avoid.

If you want more information on Optimizing Custom SQL for Tableau, contact our team of experts!
