September 13, 2023

5 Ways to Optimize Your Sigma Computing Calculations

By Katrina Menne

Sigma Computing is a powerful data modeling and analysis platform built to leverage modern cloud technology. Its native integration with Snowflake gives users the fastest and most direct connection possible. Because the underlying warehouse has the largest impact on workbook performance, we always recommend selecting Snowflake as your warehouse to get the best possible performance.

Want to take the guesswork out of optimizing your Snowflake? Reach out to phData to learn more about our Toolkit and free Snowflake optimizer.

Once connected to Snowflake, Sigma uses machine-generated SQL to produce optimal queries. However, there are a few ways users can further optimize their calculations, which will increase performance and decrease Snowflake consumption costs.

In this blog, we will cover several ways users can optimize their calculations to increase performance, save on consumption costs, and decrease waiting times. This is not intended to be an exhaustive list of SQL best practices, but rather to provide specific examples of common use cases and SQL best practices within Sigma workbooks.

1. Use table summaries for high-level date calculations

Table summaries are calculations or aggregations applied to all of the data in an element. Think of these as “Grand Totals” or data source-level results. Almost every analysis involves some sort of date comparison, and most scenarios compare dates to today or to the most recent date available.

By creating Today() or Max([Date Field]) as a table summary, you ask Sigma to compute the answer only once; the result can then be reused in other calculations.

Maximize Sigma’s parent-child relationships by calculating date fields as early as possible and then using the results instead of recalculating.
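To make the idea concrete, here is a rough Snowflake SQL sketch of the pattern: the anchor date is computed once and reused, rather than recomputed inside every downstream expression. The table and column names (orders, order_date, amount) are illustrative and this is not Sigma's actual generated SQL.

```sql
-- Illustrative sketch: compute the anchor date once, then reuse it everywhere.
WITH anchor AS (
    SELECT MAX(order_date) AS latest_date
    FROM orders
)
SELECT
    o.order_id,
    o.amount,
    -- both expressions reuse the single pre-computed value
    DATEDIFF('day', o.order_date, a.latest_date)        AS days_before_latest,
    o.order_date >= DATEADD('day', -30, a.latest_date)  AS in_trailing_30_days
FROM orders o
CROSS JOIN anchor a;
```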

2. Model tables to minimize using CountDistinct

CountDistinct is a resource-intensive calculation because the warehouse has to compare every row of your ever-growing dataset to identify unique items. Sigma’s grouping functionality makes it significantly easier to count distinct values: group the table by the item you want to count and add a calculated field whose value is 1. Then, instead of CountDistinct, you can use Sum(), which is far less intensive.

This is a more optimized way to calculate the count of unique orders for each Brand.
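As a rough SQL analogy (table and column names are illustrative, and the SQL Sigma actually generates may differ), the grouped approach collapses the data to one row per order before aggregating, instead of asking the warehouse to deduplicate order IDs inside the aggregate itself:

```sql
-- Resource-intensive: deduplicate order IDs inside the aggregate.
SELECT brand, COUNT(DISTINCT order_id) AS unique_orders
FROM order_lines
GROUP BY brand;

-- Grouped-table approach: collapse to one row per brand/order,
-- emit a constant 1, then simply SUM the constants.
SELECT brand, SUM(one_per_order) AS unique_orders
FROM (
    SELECT brand, order_id, 1 AS one_per_order
    FROM order_lines
    GROUP BY brand, order_id
) per_order
GROUP BY brand;
```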

Want to learn more about maximizing table aggregations in Sigma? 

Check out this blog to master the fundamentals.

3. Use Booleans whenever possible

A Boolean is a data type that represents binary logic: it can only express True or False. Because a Boolean field can only take one of two values, it is typically more efficient than other result types. For example, it is faster to return True/False as a Boolean field than to return the word “True” or “False,” since those are strings.

Boolean logic is everywhere in analytics and data modeling: Is this date within the current year? True or False. Is this category the one I want to filter to? True or False. Should this customer’s orders be included in the average order value calculation? True or False. Outputting a Boolean field and then aggregating or filtering on it is a small change that probably won’t make much of a difference in a single element, but those incremental gains compound as data models become more complex.

Output calculations as Booleans whenever possible in Sigma for optimized warehouse queries.
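The SQL sketch below (with illustrative table and column names, not Sigma's actual output) shows the difference: the first query returns a true Boolean the warehouse can filter and aggregate on cheaply, while the second materializes the same logic as text.

```sql
-- Boolean output: cheap to filter and aggregate on downstream.
SELECT
    order_id,
    order_date >= DATE_TRUNC('year', CURRENT_DATE()) AS is_current_year
FROM orders;

-- String output of the same logic: every downstream comparison is a text comparison.
SELECT
    order_id,
    CASE WHEN order_date >= DATE_TRUNC('year', CURRENT_DATE())
         THEN 'True' ELSE 'False' END AS is_current_year_text
FROM orders;
```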

4. Use First or Last instead of Min or Max

In a similar vein to the Boolean recommendation, use First/Last instead of Min/Max when possible. By asking Sigma to pull the first or last value in a column, Sigma does not need to compare every row to every other row to determine which value is the largest or smallest. Note that this recommendation is only applicable in certain scenarios and depends heavily on the order of your data.

For example, if I’m calculating the last date available in my dataset, I will almost always use Max([Date]) because I don’t want to be restricted by how my date column is sorted. The most common scenario where I use First/Last is when a field needs to be treated as an aggregate but has the same value in every row; for example, a quarterly goal compared against monthly sales.

The second scenario is when I would like a value I’ve already calculated to be displayed in a visual element but not aggregated; for example, ranking the sales team next to their sum of sales (I want to prevent the rank from being summed across the rows used to calculate the sum of sales).

Ask Sigma to check fewer rows of data by utilizing First/Last instead of Min/Max.
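In warehouse terms, the quarterly-goal example looks roughly like the sketch below (illustrative names; ANY_VALUE stands in for the "grab one value" idea behind First/Last): because the goal is identical on every row of a quarter, the warehouse can return any one of them instead of comparing all of them the way Max would.

```sql
-- Illustrative: quarterly_goal is the same on every row of a quarter,
-- so any one value will do; no row-by-row comparison is required.
SELECT
    sales_quarter,
    ANY_VALUE(quarterly_goal) AS quarterly_goal,
    SUM(monthly_sales)        AS total_sales
FROM monthly_sales_with_goals
GROUP BY sales_quarter;
```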

5. Keep dynamic logic simple

When creating advanced, nested calculations, break the logic into as many independent steps as possible. Dividing the logic into smaller chunks lets Sigma’s SQL optimizer run as efficiently as possible, enables parallel execution of the logic (the True and False branches can be evaluated together rather than one at a time), and makes your workbooks easier to debug.

This is not an optimal calculation because it references a parameter selector, a time frame, and an aggregate calculation.
Instead, use filters and groups by calculations to isolate and calculate what you need.
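One way to picture the difference (a hedged sketch with made-up names, not Sigma's actual generated SQL): instead of a single deeply nested expression that mixes the parameter, the time frame, and the aggregate, each condition becomes its own independent step.

```sql
-- Step-by-step logic: each piece is isolated, easy to optimize, and easy to debug.
WITH filtered AS (                -- step 1: rows matching the parameter selection
    SELECT *
    FROM orders
    WHERE region = 'EMEA'         -- stands in for the parameter value
),
windowed AS (                     -- step 2: restrict the time frame
    SELECT *
    FROM filtered
    WHERE order_date >= DATEADD('month', -12, CURRENT_DATE())
)
SELECT                            -- step 3: aggregate the isolated rows
    customer_id,
    SUM(amount) AS sales_last_12_months
FROM windowed
GROUP BY customer_id;
```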

Conclusion

Sigma is incredibly powerful and can handle virtually any analysis scenario you can think of. However, there are still some simple best-practice principles developers can follow to provide the best experience for their end users. The strategies listed here will not only improve efficiency and save compute costs but also help developers build more robust analytics solutions.

For further assistance in maximizing your Snowflake and Sigma Computing experience, reach out to phData and explore our advisory services and Toolkit with a free Snowflake optimizer.

FAQs

How can I figure out which calculations are slowing down my workbook?

You can look at the query history view in your workbook to see which elements are taking longer than others and whether that runtime comes from Sigma or your warehouse.

A screenshot of the query history view, used to identify which calculations are slowing down a workbook

How can I optimize my data before it gets to Sigma?

There are several ways data can be optimized even before connecting to Sigma. The more optimized your data is in your warehouse, the more optimized it will be in Sigma. Snowflake offers several features to help organizations optimize ingestion, storage, and querying.

To better understand how to best model your data in Sigma, check out these additional resources: A Guide to Lookups, Joins, and Unions and 5 Data Modeling Do’s and Don’ts.

