Power BI Desktop enables the connection and retrieval of data from various sources, followed by data transformation using Power Query. However, a limitation arises when replicating the same transformations across different reports.
Although it is possible to replicate transformations by copying and pasting the M query in the Advanced Editor, this approach has drawbacks: any additional transformation step must then be applied manually to every affected report.
To address this challenge, Microsoft introduced Dataflows within the Power BI service. Dataflows allow users to establish source connections and retrieve data, and subsequent data transformations can be conducted using the online Power Query Editor. Dataflows are well-suited for tables or queries that can be reused across multiple Power BI files.
In this blog, we will provide insights into the process of creating Dataflows and offer guidance on when to choose them to address real-world use cases effectively. Furthermore, we discuss best practices for implementing and maintaining Dataflows.
What are Dataflows in Power BI?
Dataflows represent a cloud-based technology designed for data preparation and transformation purposes. Dataflows have different connectors to retrieve data, including databases, Excel files, APIs, and other similar sources, along with data manipulations that are performed using Online Power Query Editor.
The primary way to create a Dataflow is within the Power BI service, which can be accomplished within a workspace. It is not feasible to create Dataflows using Power BI Desktop. Dataflows created within the service can be seamlessly utilized in the Power BI desktop application to establish connections and retrieve data.
Power BI Dataflows provide vital functionalities that effectively empower users to cleanse and reshape data from various sources. These Dataflows are crucial in fostering consistency and reducing the duplication of repetitive ETL (Extract, Transform, Load) steps, achieved by reusing transformations. Dataflows include the beneficial capability of a scheduled refresh, ensuring the utilization of the most up-to-date data.
Dataflows are collections of tables, but without relationships between them they do not form a relational model. A dataset, on the other hand, is a collection of tables with relationships and calculated measures, all prepared and ready to be used for reporting.
Benefits of Using Dataflows
Dataflows provide several benefits for organizations looking to centralize their data.
Here are some key advantages of using Dataflows:
1. Scalability and Performance Improvements
Dataflows allow developers to clean and transform data separately before adding it to Power BI datasets. This separation improves report performance by keeping data preparation away from the report-building process.
In Power BI, Dataflows can use query folding, which pushes the query back to the original data source. This helps to lower the amount of data that needs to be brought into Power BI and improves query performance.
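To illustrate query folding, here is a minimal M sketch; the server, database, and table names are placeholders, not from the original article:

```m
// Hypothetical example: connection details and table names are placeholders.
let
    // Connect to a SQL Server source (a folding-capable connector)
    Source = Sql.Database("myserver.database.windows.net", "SalesDb"),
    Orders = Source{[Schema = "dbo", Item = "Orders"]}[Data],
    // This filter can fold back to the source as a WHERE clause,
    // so only matching rows are transferred into the Dataflow
    Recent = Table.SelectRows(Orders, each [OrderDate] >= #date(2023, 1, 1))
in
    Recent
```

Because the filter is translated into the source query, far fewer rows travel over the network than if the filtering happened after import.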
Dataflows are integrated with the wider Microsoft Power Platform, allowing the prepared data to be reused in other services such as Power Apps and Power Automate.
2. Data Preparation and Transformation Efficiencies
Dataflows can process multiple tables in parallel during a refresh, which is particularly useful when working with large datasets.
Developing reusable data transformations, such as custom functions, helps consistency and reduces the need to recreate complicated changes.
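As a hedged sketch of such a reusable transformation, a custom function in M might look like this (the function name and logic are illustrative, not from the original article):

```m
// Hypothetical reusable function: trims whitespace and capitalizes
// the words in a given text column of any table.
let
    CleanTextColumn = (inputTable as table, columnName as text) as table =>
        Table.TransformColumns(
            inputTable,
            {{columnName, each Text.Proper(Text.Trim(_)), type text}}
        )
in
    CleanTextColumn
```

Once defined in the Dataflow, the function can be invoked from any query, for example `CleanTextColumn(Customers, "City")`, so the cleansing logic exists in exactly one place.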
3. Centralized Data Management and Reuse
Different parts of the business often query the same data source repeatedly. This can be expensive, especially in large organizations with extensive data systems. To avoid overloading the source in this way, you can create a centralized Dataflow that is reused across multiple reports.
4. Enhanced Data Refresh and Update Options
Dataflows provide two options for updating data: scheduled refresh and incremental refresh. Scheduled refresh ensures that data stays up to date, while incremental refresh loads only the newest data rather than all the historical data. This speeds up the refresh process and lets you see the latest data in your visuals sooner.
Creating Dataflows in Power BI Service
To create a Dataflow, open the Power BI service in a web browser. Next, choose the workspace where you intend to create the Dataflow and click the New option in the top-left corner, as shown in the image below.
Note: Dataflows are not present within the My Workspace section of the Power BI service.
Upon selecting the Dataflow option, you will encounter four distinct methods for creating a Dataflow. Each option is customized for specific solutions and purposes, as shown and explained below.
1. Define New Tables
Defining new tables in a Power BI Dataflow involves selecting a data source and then shaping the data into new tables by defining the transformations and calculations that will be used for further analysis in your reports and dashboards.
The Define new tables option enables you to create a new table and establish a connection to a data source, as shown in the image below.
Upon selecting a data source, you need to enter connection settings, specifying the server details for connecting to the data source. This is illustrated in the image below.
After providing the necessary credentials, you will need to choose the specific tables you want to work with and apply transformations to them, as shown below.
2. Link Tables From Other Dataflows
When you link tables from other Dataflows, you are creating a reference or pointer to the original table instead of duplicating the data. This means you are not creating multiple copies of the same data, which reduces storage costs.
Linking tables promotes data consistency across your organization. Any changes made to the linked table in its source Dataflow are automatically reflected wherever it is linked. This eliminates the need to manually update or synchronize multiple copies of the same data, reducing the risk of data inconsistencies.
The Link tables from other Dataflows option allows you to reuse an existing Dataflow, as illustrated in the image below.
3. Import Model
This feature comes in handy when you want to keep a copy of a Dataflow for offline use or transfer it between different workspaces. With the import option, users can create a new version of the Dataflow while the original Dataflow remains unchanged.
4. Attach a Common Data Model Folder (preview)
When you create a Dataflow from a CDM folder, you can establish a connection to a table authored in the Common Data Model (CDM) format by another application. As part of this process, you will be asked to specify the full file path to the CDM format file stored within Azure Data Lake Storage Gen 2 (ADLS Gen 2), as shown in the image below.
This path is essential for accessing and manipulating the CDM data within your Dataflow.
Utilizing Dataflows in Specific Scenarios
Here are a few scenarios where Dataflows can be effectively utilized:
1. Data Integration
Using Dataflows, you can gather data from various sources such as databases, files, online services, the Power Platform, and more.
Then, in the Dataflow, you can transform and organize this data into one complete set. This is handy when your data comes from many different sources and you want to combine it for analysis.
2. Data Transformation
Data transformation means preparing raw data for analysis by making it cleaner and more organized. Dataflows help with this by giving you a simple way to do things like removing data duplicates, dealing with merging queries, pivoting, and unpivoting the data. This is important so the data is correct and makes sense when we use it later.
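A minimal M sketch of such steps, where the `Sales` query and its column names are assumed for illustration:

```m
// Hypothetical example; "Sales" and its column names are placeholders.
let
    Source = Sales,
    // Remove exact duplicate rows
    Deduped = Table.Distinct(Source),
    // Unpivot the monthly amount columns into attribute/value pairs,
    // keeping "Product" as the identifying column
    Unpivoted = Table.UnpivotOtherColumns(
        Deduped, {"Product"}, "Month", "Amount"
    )
in
    Unpivoted
```

Each of these steps is also available as a button in the online Power Query Editor; the M code above is simply what the editor generates behind the scenes.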
3. Dataflow for Data Enrichment
Data enrichment means improving the data by adding extra information to it. In Dataflows, we can create new columns using Power Query options such as custom and conditional columns based on the data we already have.
We can also make calculations that add up numbers, find averages, and so on. These new things give more details and help us understand the data better.
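For example, a conditional column and a grouped aggregation can be sketched in M as follows (the `Orders` query and its column names are assumptions for illustration):

```m
// Hypothetical example; "Orders" and its column names are placeholders.
let
    Source = Orders,
    // Conditional column classifying each order by size
    WithSegment = Table.AddColumn(
        Source, "OrderSize",
        each if [Amount] >= 1000 then "Large" else "Standard",
        type text
    ),
    // Aggregations: total and average amount per customer
    Summary = Table.Group(
        WithSegment, {"CustomerID"},
        {{"TotalAmount", each List.Sum([Amount]), type number},
         {"AvgAmount", each List.Average([Amount]), type number}}
    )
in
    Summary
```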
4. Dataflow for Data Reuse
Once you have built a Dataflow with clean, enriched data, you can reuse it across different reports and workspaces. This ensures that everyone interprets the data the same way and avoids duplicating work: because the cleansing and transformation logic lives in one Dataflow, you do not need to repeat it for every report.
Best Practices for Dataflow Implementation
Here are some best practices for Dataflows implementation:
If you experience delays when adding new steps to your query within the Power Query Editor, think about initially implementing a “Keep First Rows” operation to reduce the number of rows you are processing. Later, after you have included all the necessary steps, you can remove the “Keep First Rows” operation.
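In M, the "Keep First Rows" operation corresponds to `Table.FirstN`; a minimal sketch, where `Source` stands for your existing query and 1000 is an arbitrary sample size:

```m
// "Source" stands for your existing query; 1000 is an arbitrary sample size.
// Insert this step early so later steps operate on a small preview,
// then delete it once development is complete.
let
    Sample = Table.FirstN(Source, 1000)
in
    Sample
```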
We suggest you maintain proper documentation for your queries by either renaming or providing descriptions for your steps, queries, or groups as needed. While Power Query generates a default step name for you in the applied steps pane, you have the flexibility to rename your steps or include descriptions for any of them.
Using query groups within the queries pane is an excellent approach to ensure your work stays organized. Query groups are specifically designed to help you keep your work structured by serving as folders for your queries.
Consolidating transformations that several queries share into a single staging table, rather than repeating them per query, can enhance overall performance.
We recommend splitting your data processes into two parts: a first Dataflow that retrieves data from the source and applies basic transformations, and a second that performs the more intricate transformations. This structure simplifies development and reduces the overall complexity of your Dataflows.
Utilizing a custom function offers the advantage of having just one source code version, eliminating the need for duplicating code. This simplifies the maintenance of the Power Query transformation logic and the entire Dataflow process.
We suggest establishing distinct Dataflows for various source types like on-premises, cloud, SQL Server, and Databricks. Organizing Dataflows based on source type simplifies problem-solving and prevents potential constraints during Dataflow refreshes.
When you establish a distinct refresh schedule for a linked Dataflow, it can result in unnecessary refreshes and prevent you from editing the Dataflow. To resolve this issue, consider the following two recommendations:
Avoid setting a refresh schedule for a linked Dataflow within the same workspace as the source Dataflow.
If you wish to configure a separate refresh schedule and prevent the locking behavior, consider moving the Dataflow to a separate workspace.
We suggest prioritizing efficiency in your model designs by ensuring query folding whenever it is feasible.
We recommend using data profiling options within Power Query to assess the quality of columns, examining their validity and errors.
Dataflows vs. Power Query in Power BI Desktop
Dataflows and Power Query in Power BI Desktop are both used to cleanse and transform data, but each has its own purpose, with the differences listed below.
Dataflows are structured for centralized data preparation and transformation, enabling the creation of reusable data transformations that can be used across various reports.
The Dataflow's web-based Power Query editor generally performs more slowly than Power Query within the Power BI Desktop application.
The Dataflow's web-based Power Query editor can generate different M query code than Power Query in Power BI Desktop; for instance, the auto-generated steps for extracting the year from an order date can differ between the two editors.
The web-based Power Query editor automatically converts errors to null values during certain operations; for example, if you change the data type of a column, any resulting errors are silently transformed into null values.
Power Query in Power BI Desktop
Power Query is an essential component of Power BI Desktop and is primarily used for data transformation within a specific report or dataset.
In terms of speed, Power Query within Power BI Desktop excels over Dataflow’s web-based Power Query.
Power Query in Power BI Desktop can generate different M query code than the Dataflow's web-based editor; deriving the year from an order date, for example, may produce different auto-generated steps.
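For reference, the underlying M function for extracting a year in either editor is `Date.Year`; a minimal sketch, assuming a hypothetical `Orders` query with an `OrderDate` column:

```m
// Illustrative only; "Orders" and [OrderDate] are placeholder names.
// Add a "Year" column derived from the order date.
let
    AddedYear = Table.AddColumn(
        Orders, "Year", each Date.Year([OrderDate]), Int64.Type
    )
in
    AddedYear
```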
When you conduct specific operations, such as altering the data type of a column, any corresponding errors are displayed as errors rather than being silently converted to null values.
Real-World Use Cases
Here are some real-world use cases where Dataflow implementation plays a vital role.
1. Simplifying Data Preparation for Multiple Reports
Suppose a company wants to efficiently organize monthly billing data for individual clients, tracking metrics such as gross and net amounts and profit. In this scenario, a Dataflow that connects to the Excel spreadsheets and applies transformations such as transpose and unpivot is a natural fit.
This Dataflow will serve as a valuable resource for multiple teams, including HR, Finance, and Project Management, facilitating the visualization of key insights within their reports.
2. Leveraging Dataflows and Machine Learning for Predictive Analysis
To enhance its sales forecasting capabilities, a company can leverage Dataflows and machine learning (ML) features. This approach empowers the organization to conduct predictive analysis, gaining insights into expected sales performance in the upcoming days or periods.
The process begins with data preparation within Power BI Dataflows, where historical sales data and relevant factors are curated and cleaned. With the historical data as input, we can create a machine learning model within the Dataflow environment by utilizing the Apply ML Model option in the action section.
These predictive sales insights can then be visualized within Power BI reports. This empowers stakeholders, including sales teams and executives, to make informed decisions, allocate resources effectively, and optimize marketing strategies, improving overall business performance.
3. Centralized Data Management
Suppose a company maintains a list of parts used in its manufacturing process. When employees look up which components a device uses, they consult Excel files that also contain obsolete parts, test parts, and more.
To improve this, the Excel list can be cleaned up with a Dataflow and stored as centralized data. This way, everyone in the company can easily access and use the active parts list in their reports.
Dataflows are incredibly useful when your data source is slow and multiple datasets need the same data, or a significant portion of it. In these cases, a Dataflow acts as a centralized storage location for the data.
Power BI Desktop can then access it with significantly faster retrieval times. We apply this method particularly to our on-premises data sources, where going through a gateway to access the data can be slow.
In conclusion, Dataflows allow you to perform reusable data transformations and custom functions to promote consistency and reduce the need to recreate complex changes. Dataflows provide centralized data and reduce the load on data sources.
Furthermore, Dataflows support predictive analysis through the use of machine learning models.
If you need any kind of assistance to explore more effective ways of leveraging Power BI Dataflows, do not hesitate to reach out to our team of experts for support, guidance, and best-practice recommendations. Contact us today!
Dataflows simplify the transformation of data, making it accessible to multiple reports. They also support predictive analysis by integrating machine learning for forecasting insights. Additionally, Dataflows centralize data management and reduce the load on data sources.
Dataflows mainly handle data transformation and make it reusable across multiple reports, while datasets are customized to meet specific reporting needs.