June 13, 2023

Data Modeling Fundamentals in Power BI

By Hayley Hadges

When thinking about Power BI, the platform’s visuals and report side immediately come to mind. While the front-end report visuals are important and the most visible to end users, a lot goes on behind the scenes that contribute heavily to the end product, including data modeling. 

Data modeling in Power BI has a major impact on the performance of reports and should be considered a substantial learning milestone for new Power BI developers.

In this blog, we’ll describe data modeling and its significance in Power BI. We’ll also understand some fundamental concepts such as star schemas, primary and alternative keys, cardinality, cross-filter direction, and active and inactive relationships. 

By the end of this blog, you should be able to look at a fleshed-out data model, such as the example below, and understand why it was built the way it was. 

Throughout this blog, we will return to the example below.

What is Data Modeling?

In essence, data modeling is the process of defining the data structure, properties, and relationships within a data model. A data model in Power BI is a logical representation of how data is structured and related within the tool. It is a collection of tables and relationships between them that are used to create reports and visualizations. 

A data model typically consists of one or more data sources, which can be anything from Excel spreadsheets to cloud-based databases and one or more tables that represent the data in those sources. The relationships that connect these tables are the cornerstone of data modeling and the main topic of this blog.

Why Does Data Modeling Matter in Power BI?

Supports Data Exploration

Data modeling enables users to create hierarchies and drill-down paths, which support efficient data exploration. Users can quickly navigate through data to discover insights and identify trends.

Affects Performance

How a data model is designed directly impacts the speed and efficiency of data retrieval. Poorly designed data models, such as those with complex relationships and data redundancy, can lead to slow query performance, which can cause delays in report rendering.

Promotes More Accurate Reports

When a data model is designed correctly, it can guarantee the accuracy, consistency, and dependability of the data used in reports. This, in turn, can result in more precise insights and improved decision-making.

Easier Future Maintainability

A well-designed data model ensures that reports are scalable, which can reduce the effort required to maintain reports as the business grows and evolves. Creating reusable components, improving documentation, and standardizing data can also ensure that reports are easy to maintain and update over time.

Star Schemas

What is a Star Schema?

The star schema is a well-established data modeling method commonly used in relational data warehouses. This approach requires modelers to categorize their tables as either dimensions or facts. An alternative to this schema is a single flat table, otherwise known as a denormalized table.

This alternative keeps the data in a single large table and is not considered a best practice for Power BI (you may be familiar with this method if you have used Tableau). Keep reading to learn more about star schemas and why they’re a best practice.

Fact Tables

A fact table stores quantitative, numerical data that can be aggregated and analyzed in different ways. It typically contains information about specific events, transactions, or measurements and serves as the centerpiece of a data model. 

In a fact table, each row represents a single transaction or event, and each column represents a measurable attribute or fact about that event. For example, a sales fact table might have columns for the date of the sale, the product sold, the quantity sold, and the revenue generated.

Dimension Tables

A dimension table provides context and additional details to the numerical data stored in a fact table. For example, a product dimension table might contain columns for the product name, product category, product description, and supplier name.

Generally, dimension tables contain a relatively small number of rows. Fact tables, on the other hand, can hold a very large number of rows and continue to grow over time.

In a data model, a dimension table is linked to a fact table through a relationship. By linking the two tables together, Power BI users can analyze the data in the fact table by filtering, grouping, and aggregating the data based on the attributes in the dimension table.

In the data model example, the Orders and Orders Details tables are fact tables that contain numerical data regarding sales transactions, and the Customers, Returns, Products, Salesperson, and Date tables are dimension tables that have additional information about the transactions in the fact tables.

Why are Star Schemas a Data Modeling Best Practice?

Enhanced Performance

A star schema is optimized for analytical querying, which can result in faster query performance when compared to more complex data models.

Flexibility

The star schema is a flexible data model that can be easily expanded and modified as new data sources are added to a Power BI report. It also allows for easy integration with other BI tools and applications.

Easy Maintenance

Because a star schema involves fewer tables and simpler relationships, it is easier to maintain and update over time.

Improved Usability

A star schema can provide a more intuitive and user-friendly experience for end-users of a Power BI report. The data is organized in a logical and easy-to-understand manner, making it easier for end-users to identify trends and patterns in the data.

Relationship Fundamental Concepts

Primary and Alternative Keys

To create a relationship between two tables that need to share a column in common, that’s where primary and alternative keys come in. Primary and alternative keys are columns in a table that create a relationship between two tables.

Primary Key

Also known as a surrogate key, a primary key is a column or group of columns in a table whose values uniquely identify a row in the table. The primary key’s distinct count equals the number of rows in the table.

Alternative Key

Also known as a foreign key, an alternative key is a column in a table whose values correspond to the values of a primary key in another table.

In the data model example, in the Products dimension table, the primary key is the Product ID column, which uniquely identifies a product or row in the Products table. The Products dimension table is connected to the Orders fact table based on Product ID, so the Product ID column within the Orders table would act as an alternative key.

Cardinality

Our fact and dimension tables connect via primary and alternative keys, but we must define their relationship further. Each relationship in a model is defined by a cardinality type. It refers to the number of unique values in one table related to the number of unique values in another.

There are three types of cardinality relationships in Power BI data modeling.

One-to-One (1:1)

This occurs when one record in the first table is related to one and only one record in the second table. This type of relationship is relatively rare in Power BI data modeling.

One-to-Many (1:N)

This occurs when one record in the first table can be related to many records in the second table, and it is the most common type of relationship in Power BI data modeling. This cardinality type is used to link a fact table with one or more dimension tables, where the dimension table is typically on the “one” side of the relationship, and the fact table generally is on the “many” side.

Many-to-Many (N:N)

This occurs when many records in the first table can be related to many records in the second table. This relationship type is not directly supported in Power BI data modeling and is infrequently used.

Understanding the cardinality between tables is essential in Power BI data modeling, as it impacts how data is aggregated, filtered, and visualized in the report. By correctly defining the cardinality in relationships, users can create accurate and meaningful reports that provide insights into their data.

In the data model example, we can see the relationship between the Customers dimension table and the Orders fact table is 1:Many, where one Customer ID value in the Customers table matches up to one or more Customer ID values in the Orders table.

All of the relationships in this model are defined by this 1:Many cardinality except for the relationship between the Orders fact table and the Orders Details fact table, which has a 1:1 cardinality. In this relationship, one Order ID value in the Orders table matches up to one and only one Order ID value in the Orders Details table.

Cross Filter Direction

Each relationship in a model is also defined with a cross-filter direction, which determines the direction(s) that filters will propagate. The cross filter direction can be either single, which we can think of as “single direction,” or both, which we can think of as “both directions” or bi-directional.

Single

When a single-directional cross filter is applied, only one of the tables in the relationship can filter data across to the other table. This means filtering data in one table affects the data shown in the other table, but filtering in the other table does not affect the data shown in the first table.

Bi-Directional

When a bi-directional cross filter is applied, data can be filtered in either direction across the relationship between the tables. This means filtering data in one table affects the data shown in the other table and vice versa. This type is less common and can negatively impact performance and create ambiguous filter paths.

The possible cross-filter options are dependent on the cardinality type. As shown below in the table, relationships with a 1:1 cardinality can only have a bi-directional cross-filter direction, whereas 1:Many and Many:Many relationships can have either single or bi-directional cross-filter directions.

In the data model example, the Products dimension and Orders fact tables have a single direction cross filter from Products to Orders, meaning you can filter the Orders table based on the Products table. Still, you can’t filter the Products table based on the Orders table.

All of the relationships in this data model are defined by the single cross-filter direction except for the relationship between the Orders fact table and the Orders Details fact table, which has a bi-directional cross-filter direction. This bi-directional cross-filter direction means you can filter the Orders table by the Orders Details table, and you can filter the Orders Details table by the Orders table. 

Note that the only 1:1 relationship in this example is also the only relationship with a bi-directional cross-filter direction.

Active and Inactive Relationships

By default, when a relationship is created between two tables, it is considered an active relationship. This means the tables are joined based on that relationship, and the related tables are filtered by the active relationship. However, it is possible to have multiple relationships between two tables, and in some cases, having more than one relationship between tables may be necessary. This is where active and inactive relationships come into play. 

Between two tables, you can only have one active relationship at a time to avoid conflicting or incorrect results. So when multiple relationships exist between two tables, some must be inactive. 

To make use of those inactive relationships, you have a couple of options:

USERELATIONSHIP Measures

You can define a DAX expression that uses the USERELATIONSHIP function. In that DAX expression, you’re essentially telling PBI that you don’t want to use the default active relationship between the two tables but instead switch to using another relationship for the calculation.

Duplicated Dimension Tables

You can create a copy of your dimension table so that each dimension table can connect to the fact table based on those different columns, and each relationship can remain active. With this solution, you don’t have to define multiple measures for the different relationships and can have simultaneous filtering. However, it does increase the model size since the dimension table is duplicated.

In the data model example, we have a Date dimension table connected to the Orders fact table via two relationships–one that connects Date to Order Date and another that connects Date to Ship Date. One of these relationships is active while the other remains inactive, which can be seen by the dotted line of the relationship connection. To make use of the inactive relationship in this scenario, USERELATIONSHIP measures must be created, or a separate Date dimension table must be completed.

Closing

We hope you found this blog helpful in understanding the significance of data modeling and the basic principles crucial to building an effective data model. By understanding these fundamental concepts, such as star schemas, primary and alternative keys, cardinality, cross-filter direction, and active and inactive relationships, you should be well on your way to becoming a more proficient Power BI developer.

Remember to always prioritize data modeling as an essential part of your Power BI development process, and reach out to phData today if you have any questions or need support! Keep exploring and learning; we’ll see you in the next blog!

Data Coach is our premium analytics training program with one-on-one coaching from renowned experts.

Accelerate and automate your data projects with the phData Toolkit