Throughout my analytics journey, I’ve encountered all sorts of data models, from simple to incredibly complex. I’ve also helped everyone, from data newbies and data experts, implement a wide range of solutions in Sigma Computing.
In this blog post, we will delve into some essential do’s and don’ts of data modeling in Sigma Computing. This list has been accumulated from my extensive experience, mistakes made, and lessons learned working with thousands of workbooks and collaborating with numerous organizations.
It’s important to note that while these concepts provide a solid foundation for data modeling, they may not be universally applicable to your data.
This list is not meant to be exhaustive but aims to equip you with valuable insights to help you make informed decisions when building efficient and flexible data models in Sigma. By embracing these practices, you’ll be well on your way to optimizing your data modeling process in Sigma.
1) Add “Pit Stops” or Extra Child Tables
Adding intermediary tables in your data modeling process can significantly enhance flexibility and data manipulation capabilities. “Pit Stop” tables are child tables after data sources (Database, CSVs, or Input Tables) or data modeling steps (Joins, Unions, Groupings).
These tables act as intermediate steps or elements, allowing you to modify and manipulate data as needed. Because Sigma is a live connection to your data source, especially when connected to Snowflake Data Cloud, you don’t need to worry about having too many data connections. Incorporating pit stop tables will allow for greater flexibility in modeling and avoid potential limitations in future analysis.
- Enhanced flexibility for modeling and data changes.
- Avoid potential limitations in future analysis.
The blue box demonstrates including a “Pit Stop” table between the source and data modeling (join). As opposed to the red circle, which jumps straight to modeling and would limit interactivity with the date prior to modeling.
2) Start at the Bottom
Sigma wants you to think of your data and analysis as a row-level analysis that is grouped or aggregated up. This contrasts with other analytics platforms that want you to think of your data as high-level or grand totals and use their data modeling options to drill down into more granular analysis.
Adopting a row-level analysis approach and gradually building up through groupings will help streamline your data modeling process by allowing for greater flexibility and interactivity with the data. It is better to let Sigma handle aggregations from the row level than trying to model each level of aggregation every time.
- Easier analysis and calculations when starting with the most detailed level.
- Increased flexibility in analyzing data across different groupings.
When aggregating, start with row-level aggregations to provide the most flexibility and insight into the analysis.
3) Avoid Modeling in Pivot Tables
While pivot tables can help display data, it’s generally recommended to avoid using them for data modeling purposes. In Pivot Tables’ underlying data, Sigma performs group-by operations for each combination of row and column.
Since these table aggregations are not editable, you’re unable to change the group-by logic. Instead, it is advisable to model data in parent elements and utilize pivot tables for presentation and visualization.
The main exception to this rule is Percent of Total calculations when you want the calculation to be dynamic and reflect what levels are collapsed or expanded in the Pivot table.
- Improved performance by avoiding complex group-by operations.
- Greater flexibility in modifying and updating analysis.
To learn more about how Sigma uses Table Aggregations, check out this video: Episode 23: Sigma Fundamentals: Mastering Aggregations.
This is the underlying view of a Pivot Table. Note the order of the groupings is rows, then columns. This order can not be changed to column then row.
4) Plan Ahead for Materialization
Leveraging Sigma in workbook materialization can significantly enhance performance by writing results to your database. These materializations act like “copy and paste results,” so Sigma doesn’t have to rerun calculations every time something changes.
However, it’s essential to strategically plan the placement of filters and parameter calculations to maximize materialization’s benefits. It’s best to model your data for all results and filter on a child materialized element.
If you target a filter or parameter before a materialization, Sigma will rerun the materialization each time a filter or parameter option changes resulting in increased loading time.
- Improved performance through workbook materialization.
- Faster load time for end users.
5) Minimize Lookups
Lookups in Sigma are a great option that helps users create quick connections between elements.
However, excessive use of lookups can impact performance, especially in complex data models. Lookups require repeated calculations, which can be computationally intensive. Consider minimizing lookups and opting for joins, which perform faster and have the added benefit of enabling bidirectional data flow.
- Better performance by reducing redundant calculations.
- Simplified data flow and understanding.
- Increased control and customization with joins.
This blog post has explored some essential do’s and don’ts of data modeling in Sigma. While these concepts do not cover every possible scenario, these guidelines serve as the foundation for constructing highly efficient data models.
Sigma offers an incredibly flexible toolkit to investigate and model your data. These practices empower you to make the most of Sigma’s capabilities and achieve better performance and flexibility in your data analysis.
If you have any additional questions, please reach out to our team of Sigma experts.
If you realize you forgot to include a “pit stop” table somewhere, you don’t need to rebuild everything. Add a new table element that is sourced from your original or top-level data source (i.e., database table, CSV, Input Table), and then change the source of your modeling step to this new table element.
In general, it’s best to default to modeling outside of a pivot table. A scenario where you’d want to model or calculate something inside a pivot table is when the results are different based on what options are expanded or collapsed. For example, percent of total or percent to target.