In data analysis, a union is an operation that combines the results of two or more queries or datasets into a single dataset. It is used to merge datasets with the same structure (i.e., the same columns) but different data.
The importance of unions can be summarized as follows:
Data Integration: Unions allow data integration from different tables, enabling a more comprehensive analysis.
Flexibility: They provide flexibility in data manipulation, allowing users to create custom datasets that fit their specific analytical needs.
Efficiency: Unions can reduce the time and effort required to collate data from different sources manually.
In this blog, we will provide step-by-step guidance on creating unions in Sigma Computing, enabling users to integrate and analyze data from multiple tables effectively.
What Are Unions in Sigma Computing?
Unions in Sigma Computing are powerful tools for data analysis. They allow you to combine table data with at least one common attribute, enabling a more comprehensive and integrated analysis, particularly when dealing with large datasets distributed across multiple tables.Â
By using unions, you can create a unified view of your data, making it easier to extract insights and make data-driven decisions. Therefore, the importance of unions lies in their ability to enhance data manipulation and analysis capabilities.
How to Create Unions In Sigma Computing
Step 1
Open Sigma in your browser and go to the menu pane. Click on Create New and select Workbook from the available options. This action will open a new workbook where you can begin adding data elements.
To create a new union, we need to add a Table element and select the dataset you want to union. In our case, we will create three separate datasets and then combine them again.
Step 2
To create three separate datasets, we are using the D_MLB_TEAMS
sample Sigma dataset (30 rows). We’re going to split up the MLB DIVISION
column as Central, East, and West using a filter. Lastly, we’ll recombine them using a union. Each dataset has 10 records.
Step 3
Now, with our three datasets, we can add a table element and select the union option. In the screenshot below, we will add:
D_MLB_TEAMS_EAST
D_MLB_TEAMS_CENTRAL
D_MLB_TEAMS_WEST
After selecting the union option, you must select any datasets you want to union. In our case, it is D_MLB_TEAMS_EAST
. After choosing, click Select.
After selecting, we must select the + icon at the top of the section to add the other two remaining datasets to create a union.
D_MLB_TEAMS_WEST
and D_MLB_TEAMS_CENTRAL
are the remaining datasets that must be added to the sources in the Create Union window.
The fields from the three datasets are automatically aligned to the closest possible match, but we can change their position by using the dropdown arrow on each field. Remember that the fields don’t need the same name in all sources but must be of the same data type.
We can see the output preview by clicking to check for data inconsistencies.
After selecting Done, the final table will be labeled “Union of # Sources” and can be utilized throughout your workbook or data model. To make changes to the union, highlight the union table, then navigate to the bottom left corner of the Elements Properties tab. You will find a drop-down menu next to the union table name where you select Edit Union to make any further changes.
Best Practices For Creating Unions in Sigma Computing
Here are some key best practices to consider:
Ensure Consistent Field Types: Fields to be unioned must have the same data type (e.g., strings to strings, dates to dates) to avoid errors and maintain data integrity.
Standardize Field Names: Although not mandatory, consistent field names across datasets can simplify the union process and reduce confusion.
Clean and Preprocess Data: Before performing a union, ensure the data is clean and preprocessed. Remove duplicates, handle missing values, and standardize formats.
Closing
Creating unions in Sigma Computing is straightforward, significantly enhancing data analysis capabilities. By following the steps outlined in this blog, you can easily combine datasets to gain deeper insights and make more informed decisions.
If you have any questions or need further assistance, please contact phData for expert support.
FAQs
What is the difference between join and union in Sigma?
Joins combine data horizontally based on a common key, while unions stack data vertically from multiple datasets with the same structure. Check out Will Strouse’s blog on the difference between Joins, Lookups, and Unions.
What will happen to the unmatched column while doing union in Sigma?
Unmatched columns will be included in the final union table unless you manually deselect them in the Union configuration settings.