The release of Tableau 2020.2 has a lot of cool new features, but perhaps none are more game-changing than the implementation of relationships into the data model. There have been many conversations throughout the BI community about why Tableau is moving in this direction and why they removed the beloved Dimension and Measure panes. As someone who has seen how powerful this approach can be from seeing it work wonders in Power BI, I hope this blog will give you a sense of why this is a great addition to Tableau and how it can help its users.
What is a Relationship?
In this sense of the word, it is not the terrible thing that you have with your in-laws. Here, it is when one data table has a field (often called the foreign key) that relates to a field (called the primary key) in another data table in a relational database. When tables have these relationships, they can be stored separately in smaller, more concentrated tables with the ability to be linked together by their relational fields.
Data doesn’t always come in a nice, curated, relational database. The good news is that you will still be able to ingest flat files into Tableau and create all the joins your heart desires, but now you’ll have the ability to create a more efficient data model using the STAR schema concept. What is a STAR schema? I’m glad you asked!
Understanding STAR Schema
This method of storing related tables separately is called a STAR schema. In a STAR schema data is stored in multiple tables that are related through primary and foreign keys in DIM and FACT tables. For all intents and purposes, a DIM table is a table in your relational database that contains a primary key and then attributes about that primary key. A FACT table contains foreign keys relating to the different DIM tables as well as measurements and aggregations. Please see the example below.
In this example we have two DIM tables (region_dim and store_dim) and one FACT table (sales_fact). Each of the first two columns in the FACT table have relationships with the DIM tables on the left. These fields are how we can link the tables together. The store_id in the first column of sales_fact are the same stores as the store_id field in store_dim.
As you can see here, storing these three tables separately allows better visibility into each of the tables. If I wanted to see details specifically for each Region, then I could view the region_dim table and find out what I needed. Storing data this way is not only easier to understand, but as you’ll see in our next section, it is much easier for Tableau to maximize performance of their BI product.
How Queries Work in a STAR Schema
Utilizing a STAR schema approach allows for greater data efficiency and performance. Let me show you exactly how the SQL would work using the STAR schema and flat file approaches. When I say flat file approach I am referring to when all of your data tables are joined into one data set instead of breaking out the tables and storing them separately.
Let’s start with our data from the previous example. Here, we have three tables: two DIM and one FACT table. Our use case is that we want to find the total sales by Region.
Let’s start by examining the query that will be necessary to complete the request if we did not have the ability to use data relationships and a STAR schema. Using the flat file approach, we need to get these tables joined together to create one data set in order to perform the aggregations needed. Here is the query needed to complete this.
Flat File Query:
Select a.store_id, a.region_id, a.total_sales, b.store_id, b.store_name, b.store_type, c.region_id, c.region_name, c.region_leader FROM a.sales_fact LEFT JOIN b.store_dim ON a.store_id=b.store_id LEFT JOIN c.region_dim ON a.region_id=c.region_id
Once this query has been created, every aggregation of this data will derive from this query. For example, if you want to create a bar chart of the sales by region, then you would need to add a GROUP BY region_id clause after this SQL statement. This means we’re querying 3 tables, 9 fields, 2 joins, and 1 group by statement. Let’s compare this to the query using a relationship instead of a flat field.
When using a data relationship, the query will be in two steps instead of one. See the relational data query below:
Query 1 Select c.region_id, c.region_name, c.region_leader FROM c.region_dim
Query 2 Select SUM(a.total_sales), c.region_id FROM a.sales_fact LEFT JOIN QUERY 1 on c.region_id = a.region_id
Part 1 of this query pulls the fields from the DIM table that are at the granularity we want to aggregate by. Part 2 of this query creates an aggregation using the fact table field total_sales. These two queries are joined together on region_id so that the sum of regional sales is reflected in query 1 where the region details reside. In this query we’re querying 2 tables, 5 fields, 1 join, and 0 group by statements because we’re utilizing a relationship with the dim table.
As you can see, our query for this relational aggregation required far less query bandwidth than it did for the flat file approach, and this was just for 3 tiny tables. Now take this concept and scale that by the amount of data that you might have in an enterprise data set. You will find that this approach decreases query time, increases overall performance, and allows for better interpretation of your data.
Every time you create a bar chart in Tableau you should think about how they are created. Your dimension is simply a Group By clause in the SQL in visual form. By eliminating a ton of the SQL needed to create your bar charts you end up with a more efficient data model for your dashboard.
Although a flat file can be more comfortable to work with, especially for Tableau users used to working with those formats, I would encourage you to give the data relationships a chance because I think you’ll be pleased with the results!
Do you have more questions about Tableau? Talk to our expert consultants today and have all your questions answered!