A dataset is a reusable data model that lets users create reports, dashboards, and analyses without directly accessing the underlying database. It sits between raw data and visualizations, which makes data exploration and analysis easier.
It represents a centralized, shared data definition that supports aggregations and other transformations. Administrators define datasets, which are then available as sources for visualizations, tables, and pivot tables in workbook analyses. Because metric logic lives at the dataset level, changes to a dataset are automatically inherited by every workbook connected to it, which keeps metric calculations consistent.
In this blog, we’ll dive into Datasets specific to Sigma—what they are, how they function, and why they’re essential for building efficient, scalable, and governed analytics workflows. You’ll learn how Datasets empower users to create dynamic reports and dashboards without writing SQL, while also ensuring consistency and reusability across your organization. From step-by-step creation to key features, use cases, and best practices, this blog will equip you with everything you need to harness the full power of Datasets in Sigma.
Key Features of a Dataset in Sigma Analytics
Reusable Data Model – Datasets can be used across multiple workbooks and analyses, thus preventing redundancy.
Live Connection – Sigma has a live connection to cloud data warehouses like Snowflake AI Data Cloud, BigQuery, and Redshift. This means Datasets do not hold data but provide a structured way to query it.
Data Enrichment – Users can add calculations, aggregations, and custom columns in a Dataset.
Joins and Relationships – A Dataset can join data from multiple tables.
Governance & Access Control – Admins can set permissions and access rights for different users.
Performance Optimization – Structured datasets improve efficiency and query performance under the live query model supported by Sigma.
Steps to Create a Dataset in Sigma
Open Sigma and Navigate to Datasets
Log in to Sigma.
Click on the Create button in the top-left corner.
Select Dataset from the dropdown menu.
Select Your Data Source
Choose a data warehouse (e.g., Snowflake, BigQuery, Redshift).
Browse or search for the table(s) you want to use.
Add Tables and Join Data (If Needed)
Select the primary table.
If you need data from more than one table, select the Join button and select another table.
Identify the join keys (columns that link the tables).
Select the join type (Inner, Left, Right, Full Outer) according to your requirements.
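To make the join-type choice concrete, here is a minimal sketch that uses Python's built-in sqlite3 module rather than Sigma itself. The tables and columns (customers, orders, customer_id) are hypothetical, and the SQL is only an illustration of how an inner and a left join differ, not the query Sigma generates.

```python
import sqlite3

# In-memory database with two hypothetical tables (not Sigma objects).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER, name TEXT);
    CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, revenue REAL);
    INSERT INTO customers VALUES (1, 'Acme'), (2, 'Globex'), (3, 'Initech');
    INSERT INTO orders VALUES (10, 1, 250.0), (11, 1, 100.0), (12, 2, 75.0);
""")

def join_rows(join_type):
    """Join customers (the primary table) to orders on the customer_id key."""
    sql = f"""
        SELECT c.name, o.order_id
        FROM customers AS c
        {join_type} JOIN orders AS o ON o.customer_id = c.customer_id
        ORDER BY c.customer_id, o.order_id
    """
    return conn.execute(sql).fetchall()

inner_rows = join_rows("INNER")  # only customers that have at least one order
left_rows = join_rows("LEFT")    # every customer; order columns are NULL if none

print(len(inner_rows), len(left_rows))
```

In this toy data the customer with no orders disappears under the inner join and survives under the left join with an empty order column, which is usually the deciding factor when picking a join type.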
Add Calculated Columns (Optional)
Select + > Add Column to create a calculated field.
Use Sigma’s formula editor to define it with expressions such as SUM(Revenue) and conditional functions such as CASE WHEN or IF.
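The difference between a row-level calculated column and an aggregate can be sketched in plain Python. This is not Sigma's formula syntax; the rows, the "Deal Size" column, and the 1000 threshold are all assumptions made up for illustration.

```python
# Hypothetical rows standing in for a warehouse table.
rows = [
    {"Region": "East", "Revenue": 1200.0},
    {"Region": "West", "Revenue": 300.0},
    {"Region": "East", "Revenue": 800.0},
]

def deal_size(revenue, threshold=1000.0):
    """Row-level conditional, the kind of logic an IF or CASE WHEN expresses."""
    return "Large" if revenue >= threshold else "Small"

# A calculated column is evaluated once per row and sits next to the source columns.
enriched = [{**row, "Deal Size": deal_size(row["Revenue"])} for row in rows]

# An aggregate such as SUM(Revenue) collapses many rows into a single value.
total_revenue = sum(row["Revenue"] for row in rows)

print([row["Deal Size"] for row in enriched], total_revenue)
```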
Apply Filters (Optional)
Add filters to restrict data, such as date ranges, categories, or specific conditions.
Set Aggregations and Formatting
Group data according to the dimensions (Region, Product, Date).
Use aggregation functions like SUM, AVG, COUNT, etc.
Set formats for number and date fields.
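As a rough illustration of filtering first and then grouping and aggregating, the sketch below runs the equivalent SQL against an in-memory SQLite database from Python. The sales table, its columns, and the date cutoff are hypothetical, and Sigma would issue its own SQL to your warehouse rather than this exact statement.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (region TEXT, product TEXT, order_date TEXT, revenue REAL);
    INSERT INTO sales VALUES
        ('East', 'Widget', '2023-12-30', 50.0),
        ('East', 'Widget', '2024-01-05', 100.0),
        ('East', 'Gadget', '2024-02-10', 300.0),
        ('West', 'Widget', '2024-01-20', 80.0);
""")

summary = conn.execute(
    """
    SELECT region,
           SUM(revenue) AS total_revenue,
           AVG(revenue) AS avg_revenue,
           COUNT(*)     AS order_count
    FROM sales
    WHERE order_date >= ?   -- filter first: the 2023 row is excluded
    GROUP BY region         -- then group by the chosen dimension
    ORDER BY region
    """,
    ("2024-01-01",),
).fetchall()

for region, total, average, count in summary:
    print(region, total, average, count)
```

ISO-formatted date strings compare correctly as text, which is why the filter works without a dedicated date type in this sketch.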
Save and Publish the Dataset
Click Save, then enter the Dataset name and description when prompted.
Set permissions (who can view/edit the Dataset).
Click on Publish to make it available for reports and dashboards.
Badging Datasets
Badges can be applied to datasets to mark their content as Endorsed, Warning, or Deprecated. Badge notes are optional, but they give all organization members useful context.
Setting Permissions & Sharing Datasets
Click the Permissions tab at the top of the screen.
Click Add Permission.
Search for a member’s email address or a team name.
Select the permission you want to give the selected member or team.
From this point, the dataset may be used in workbooks, charts, and dashboards without direct database querying.
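The idea of granting a permission to a member or a team can be modeled with a small Python sketch. Everything here is hypothetical: the level names "view" and "edit", the class, and the rule that the highest grant wins are assumptions for illustration and do not describe Sigma's internal permission model.

```python
from dataclasses import dataclass, field

# Hypothetical permission levels; the labels in Sigma's UI may differ.
LEVELS = {"view": 1, "edit": 2}

@dataclass
class DatasetPermissions:
    member_grants: dict = field(default_factory=dict)  # email -> level
    team_grants: dict = field(default_factory=dict)    # team name -> level

    def grant_member(self, email, level):
        self.member_grants[email] = level

    def grant_team(self, team, level):
        self.team_grants[team] = level

    def effective_level(self, email, teams=()):
        """Return the highest level granted directly or via any team, or None."""
        granted = [self.member_grants.get(email)]
        granted += [self.team_grants.get(team) for team in teams]
        granted = [level for level in granted if level is not None]
        return max(granted, key=LEVELS.get) if granted else None

    def can(self, action, email, teams=()):
        """True if the member's effective level covers the requested action."""
        level = self.effective_level(email, teams)
        return level is not None and LEVELS[level] >= LEVELS[action]

perms = DatasetPermissions()
perms.grant_team("Finance", "view")
perms.grant_member("ana@example.com", "edit")

print(perms.can("edit", "ana@example.com", teams=("Finance",)))
print(perms.can("edit", "bo@example.com", teams=("Finance",)))
```

Granting to teams rather than to individuals keeps the list of grants short and makes access easier to audit, which is the main reason to search for a team name before an email address.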
Key Advantages of Datasets in Sigma Analytics
Reusability & Standardization
Datasets are meant to be used across Workbooks, Reports, and Dashboards.
They enforce data consistency by standardizing metrics and calculations.
Live Connection to Data
In contrast to static extracts, Sigma performs live querying of the database, giving real-time updates.
No manual refreshes or scheduled data extracts are needed.
Performance Optimization
Predefined joins, filters, and calculations reduce query complexity for end users.
Optimized queries promote quick and efficient data retrieval.
No SQL Required for End Users
Business users can explore and analyze data without knowing how to write SQL queries.
Datasets provide a structured data model that is immediately usable through a drag-and-drop interface.
Data Governance & Access Control
Admins can set permissions for who can view, edit, or use a dataset.
This helps maintain security and compliance in line with organizational policy.
Reduces Redundant Work
Users reference a dataset instead of repeatedly recreating the same calculations and joins.
This saves time and increases efficiency across teams.
Easy to Extend & Modify
Adding new columns, calculations, and filters can be done without touching the source tables.
Changes are reflected immediately wherever that Dataset is used.
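Why a change made once shows up everywhere can be sketched with a toy Python model in which workbooks hold a reference to a shared definition instead of a copy of it. The Dataset and Workbook classes, the net_revenue metric, and the sample rows are all hypothetical and say nothing about how Sigma is implemented.

```python
class Dataset:
    """A shared definition: one place that holds each metric's logic."""

    def __init__(self, name):
        self.name = name
        self.metrics = {}  # metric name -> function over a list of row dicts

    def define_metric(self, metric_name, fn):
        self.metrics[metric_name] = fn


class Workbook:
    """Holds a reference to the dataset, not a copy of its logic."""

    def __init__(self, dataset):
        self.dataset = dataset

    def evaluate(self, metric_name, rows):
        # The metric is looked up at evaluation time, so revisions are picked up.
        return self.dataset.metrics[metric_name](rows)


rows = [{"revenue": 100.0, "refund": 10.0}, {"revenue": 50.0, "refund": 0.0}]

sales = Dataset("sales")
sales.define_metric("net_revenue", lambda rs: sum(r["revenue"] for r in rs))
wb_a, wb_b = Workbook(sales), Workbook(sales)
before = wb_a.evaluate("net_revenue", rows)

# Revise the logic once, at the dataset level.
sales.define_metric(
    "net_revenue", lambda rs: sum(r["revenue"] - r["refund"] for r in rs)
)
after_a = wb_a.evaluate("net_revenue", rows)
after_b = wb_b.evaluate("net_revenue", rows)

print(before, after_a, after_b)
```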
Best Practices For Creating Datasets in Sigma
Minimize Joins for Better Performance
Too many joins can slow down query performance. Wherever possible, minimize joins by preparing data beforehand, for example by aggregating or flattening it at the database level.
Use Filters & Aggregations at the Dataset Level
Applying filters and aggregations at the dataset level, rather than at the workbook level, improves performance because it reduces the data volume before workbook queries run.
Follow Clear Naming Conventions
Give datasets, fields, and calculations consistent, descriptive names to make them easier to read and collaborate on. Clear names help users understand what a dataset contains and use it correctly without delay.
Assign Permissions Properly for Security
Ensure that access is restricted based on roles to prevent sensitive data from falling into the wrong hands. Employ permissioning at a dataset level to specify who can view, edit, or share datasets, thereby maintaining the integrity and security of the data.
Following these best practices will help ensure efficient, well-structured, and secured datasets in Sigma!
Common Issues & Troubleshooting in Sigma Datasets
Slow Queries – Optimization Tips
Use aggregations & filters to limit data size.
Shift complex calculations to the database.
Make sure indexing, partitioning, and caching are properly configured.
Join Errors – Fixing Mismatched Keys
Keep data types and formats consistent across join keys.
Choose the appropriate join type (inner, left, or full).
Check many-to-many joins for duplicate rows.
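The key checks above can be sketched as a small Python diagnostic. It assumes hypothetical rows with a sku join key and a simple normalization rule (trim whitespace, upper-case); in practice the fix for mismatched keys would normally be applied in the warehouse or in the dataset's join definition.

```python
from collections import Counter

def normalize_key(value):
    """Coerce a join key to a comparable form: trimmed, upper-cased text."""
    return str(value).strip().upper()

def duplicate_keys(rows, key):
    """Return the normalized key values that appear more than once."""
    counts = Counter(normalize_key(row[key]) for row in rows)
    return {k for k, n in counts.items() if n > 1}

def many_to_many_keys(left, right, key):
    """Keys duplicated on BOTH sides multiply rows when the tables are joined."""
    return duplicate_keys(left, key) & duplicate_keys(right, key)

# Hypothetical tables whose keys differ only in case and stray whitespace.
left = [{"sku": "a1 "}, {"sku": "A1"}, {"sku": "b2"}]
right = [{"sku": "A1"}, {"sku": "a1"}, {"sku": "B2"}]

raw_matches = {r["sku"] for r in left} & {r["sku"] for r in right}
normalized_matches = (
    {normalize_key(r["sku"]) for r in left}
    & {normalize_key(r["sku"]) for r in right}
)
risky = many_to_many_keys(left, right, "sku")

print(raw_matches, normalized_matches, risky)
```

Comparing the raw and normalized match sets shows how many keys fail to join purely because of formatting, and the risky set flags keys that would produce duplicated rows after the join.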
Data Refresh Delays – Ensuring Updates
Verify the differences between live refresh and scheduled refresh settings.
Check database connections for stability.
Use incremental refresh to reduce the load.
Addressing these issues proactively helps keep datasets in Sigma optimized and up to date!
Real-World Use Case Examples
Sales Analysis – Combining customer & transaction data.
Finance Reporting – Using Datasets for budget tracking.
Marketing Analytics – Creating a Dataset for campaign performance.
Conclusion
Utilizing Datasets in Sigma streamlines live connections, enables reusable data models, and provides role-based access controls, bringing insights closer to business users without SQL skills. Sigma provides a unified platform for creating calculated fields with filters and capabilities to join tables for reporting and decision-making.
If best practices are applied, such as join optimization, proper filters, and governance, organizations can achieve better performance with assured data accuracy and consistency. For finance, sales, marketing, or operational analytics, datasets streamline data workflows and enhance Sigma’s usability as a BI tool.
As Sigma matures, datasets will help scale analytics, strengthen governance, and make team collaboration easier. Used well, they can boost productivity and support faster, more confident decision-making.
Need help optimizing your Sigma implementation?
Connect with phData’s experts to maximize performance and drive smarter insights!