What is a Data Join in Sigma Computing?

Sigma Computing is a data analytics platform that brings a spreadsheet-like interface to cloud data warehouses, such as the Snowflake Data Cloud. Sigma takes advantage of Snowflake’s advanced, optimized query engine to ensure your queries run as fast as possible on millions, hundreds of millions, or even billions of rows of data. 

If you are familiar with SQL, you may wince at the thought of joining data sources; however, Sigma makes it incredibly easy with a user-friendly interface instead of writing SQL statements. In this blog, I will go over what a join is, how to do it in Sigma, and a few examples of how you can join data in Sigma.

What is a Join, and Why Does it Matter?

Joining data is a powerful concept when modeling your data sources for a production-ready environment. Say you have two tables: one has information about your customer, and the other has order-specific information down to the order number. If you want to analyze both of these data from two tables, you’ll need to perform a join. 

You’ll need a join key, which will allow your two tables to be connected. If each of the two tables has a customer ID, you can use that as a join key. Your join will match the join key on the customer table to the join key on the order table. That way, you can pull in columns from both tables into one.

There are plenty of resources available to go over different join types in detail, but the following joins are available in Sigma:

  • Left outer join
  • Inner join
  • Right outer join
  • Full outer join

The join type that you choose will have implications on how many rows your final data set will be. In this example, say you only want to keep the rows where there is a customer ID match versus keeping all rows even if there is no customer ID match. You will use different join types as well as have a different number of rows. We will go over this example later in the blog.

How to Join Data in Sigma

Navigating to Joins

Once you are logged into your Sigma environment, it is very easy to join data. 

First, you will need to be a Creator or Admin to join data sets. You can take a look at an overview of what different account types can do here. Next, click the “Add Element” plus sign in the top left corner and select “Table”; ultimately, this join will be presented as a table. From here, select “Join” (at the time of publishing this blog, the join feature is in beta).

navigating to join under select source in Sigma Computing

Once you have selected the join option, you will be presented with a screen to select your data sources.

Joining Data

When actually joining data, you will see you have three options to select your data source: new, in use, and page elements. New data sources are new ones that you will bring in, in use are data sources that you are already using in the workbook. Page elements are tables that you have brought into a workbook that include calculations or manipulations.

In this example, we will select a new data source, and use the Plugs Electronics sample data set. I select the Sigma Sample Database in Snowflake, navigate to PLUGS_ELECTRONICS, and select PLUGS_ELECTRONICS_HANDS_ON_LAB_DATA. 

Once selected, you will see the column information, the data in the table, as well as table information, including the number of rows and columns.

data source in the form of a join table in Sigma Computing

You can bring in as many or as few columns as you like, but in this example, I will select all of the columns and click “Select” to select this as my first table.

The next screen is your data canvas, and you will be able to see what tables are present in your data model. I will repeat the above steps but click the small plus sign next to “Sources” to add the Plugs Electronics Customer table, PLUGS_ELECTRONICS.D_CUSTOMER. Now that I have brought in my second table, my screen shows the overview of the join.

data canvas in Sigma Computing

You can see the following on this screen:

  • Join with: This shows which tables I am joining. When you have numerous tables in your data model, you can keep track of them here.
  • Join type: This is where you can select what type of join you want to perform. I will go over two examples below.
  • Join keys: This is where you select your join key. You can have multiple keys, denoted by the “Add keys” plus sign. In this example, I have one key. Cust Key.

In the screenshot above, you can see that Cust Key is my join key, so all of my rows, if they share a join key, will join on this value. An important distinction between a look up and a join is that a join can return multiple values per key. This makes sense because if I am a customer with customer ID #0731, I can have multiple transactions, thus multiple orders. This is in contrast to a lookup, where you only return one value. Check out our comparison of look ups vs. joins here.

Finally, you can see the key count and keys with or without matches. We have 4,872 keys in the hands-on lab data and 4,972 in the customer table. This means there are 100 more customers in D_CUSTOMER than in our table on the left, PLUGS_ELECTRONICS_HANDS_ON_LAB_DATA. This is evident in the “Keys with no matches” section, with D_CUSTOMER having 100 keys with no matches.

Join Types

As outlined above, we can use several different join types in Sigma. I will compare two to show the implications of your join type. With a left join, I am selecting all rows that my left table has matches. In the first example above, we are only pulling in customers with Cust Keys that exist in PLUGS_ELECTRONICS_HANDS_ON_LAB_DATA. Since there are 100 keys that do not exist in PLUGS_ELECTRONICS_HANDS_ON_LAB_DATA, these customers will not be included in our data set. This is a left join.

On the other hand, if we choose a right join, we will bring in the Cust Keys that do not exist in the left table. Since D_CUSTOMER is on the right, we are bringing in all Cust Keys on the right. This will include the 100 keys with no matches since we specified a right join. We will still bring in all rows on the left table, PLUGS_ELECTRONICS_HANDS_ON_LAB_DATA, but we will also bring in the Cust Keys that were omitted in the left join. In this example, this is a right join.

Closing

Now that you have learned about joins and performed a join, simply click “Preview Output” in the top right corner to preview your new joined data.

newly joined data in Sigma

You can then click done, bringing your newly joined data into Sigma, where you can perform analysis on the table. You can create child elements from this newly joined table, like pivot tables or data visualizations, and bring your Snowflake data to life.

Joins are fairly straightforward in Sigma, as long as you know where to look. With this blog, you will be equipped to understand what joins are, how to do them in Sigma, and how to build child elements off of your newly joined data.

Data modeling can be difficult for those newer in their analytics journey. If you have questions on how Sigma can help your organization, we are here for you with our team of expert consultants. Reach out to us today!

FAQs

Great question. Lookups are typically for ad hoc analysis and will return one answer per row; joins are for more productionalized data sets and can return multiple results per row. Be sure to check out our lookups vs. joins blog that covers this exact question.

 

Not all account types in Sigma can do joins and model data as outlined above. You must be a Creator or have Admin rights to be able to perform joins and create a production-ready data set.

Accelerate and automate your data projects with the phData Toolkit

Data Coach is our premium analytics training program with one-on-one coaching from renowned experts.