How to Implement Incremental Refresh in Power BI

Power BI is a user-friendly and powerful data visualization tool that enables individuals to easily create interactive reports and dashboards as well as centralized datasets. With seamless data integration, broad connectivity options, and robust analysis capabilities, it empowers users to uncover insights and make data-driven decisions. 

Its collaboration features, mobile accessibility, and integration with the Microsoft ecosystem further enhance its appeal, making it a popular choice for businesses of all sizes.

Additionally, Power BI’s efficient refreshing capabilities, including incremental refresh and scheduled refresh options, ensure that data stays up to date and reports remain accurate, even with large datasets.

Incremental refresh is a feature in Power BI that allows you to update and refresh data in your reports and dashboards by importing only new or changed data since the last refresh.

Rather than refreshing the entire dataset, incremental refresh focuses on selectively updating the relevant portions of the data, resulting in significant performance improvements and resource efficiency. 

In this blog, we will explain the difference between Full Refresh & Incremental Refresh and how to implement Incremental refresh in Power BI.

Full Refresh vs. Incremental Refresh

When you load data from the source into the destination (Power BI), there are two methods: Full refresh or Incremental Refresh.

Full refresh means fetching the entire dataset each time and wiping out the previous data, whereas Incremental Refresh is the process of loading only part of the data that might change and adding it to the previous dataset, which is not changing anymore. 

It is always a best practice to use a full refresh if your data could change at any point and current values could be subject to change or when the tables in the database are being truncated and loaded with changing data on a regular basis.

You can run a full refresh once a day or week to ensure that any changed data or deleted rows are captured within the refresh. 

However,  It is recommended to use an incremental refresh if your current, existing data is not subject to change and you’re only interested in bringing in new rows of data.

An incremental refresh is also faster, so it could be run once a day to ensure that your daily reports are showing the most up-to-date numbers.

Understanding Incremental Refresh

With incremental refresh, the service dynamically partitions and separates data that needs to be refreshed frequently from data that can be refreshed less frequently. Table data is filtered by using Power Query date/time parameters with the reserved, case-sensitive names RangeStart and RangeEnd.

When initially configuring incremental refresh in Power BI Desktop, the parameters are used to filter only a small period of data to be loaded into the model.

When published to the service, with the first refresh operation, the service creates incremental refresh and historical partitions and optionally a real-time DirectQuery partition based on incremental refresh policy settings and then overrides the parameter values to filter and query data for each partition based on date/time values for each row.

With each subsequent refresh, the query filters return only those rows within the refresh period dynamically defined by the parameters.

Those rows with a date/time within the refresh period are refreshed. Rows with a date/time no longer within the refresh period then become part of the historical period, which is not refreshed.

Query Folding and Incremental Refresh

Query folding in Power BI refers to the process of pushing data transformation steps back to the data source during query execution. It is important in incremental refresh because it minimizes the amount of data transferred and processed, improving performance and reducing resource consumption. 

By folding transformations, Power BI can leverage data source optimizations and push down operations to efficiently refresh smaller portions of the dataset. This enables faster refresh times and efficient utilization of resources during incremental refreshes in Power BI.

Data Sources That Support Query Folding and Incremental Refresh

Relational databases like Microsoft SQL Server, Oracle, MySQL, PostgreSQL, and Snowflake, Analysis Services like Azure Analysis Services, SQL Server Analysis Services, SAP HANA, etc., provide the necessary capabilities for Power BI to fold data transformation steps and perform incremental refresh efficiently.

Setting Up Incremental Refresh

Configuring incremental refresh includes creating RangeStart and RangeEnd parameters, applying filters, and defining an incremental refresh policy. After publishing to the Power BI Service, you perform an initial refresh operation on the dataset. The initial refresh operation and subsequent refresh operations apply the incremental refresh policy you defined.   

     A. Create Parameters

In this task, use Power Query Editor to create RangeStart and RangeEnd parameters with default values. The default values apply only when filtering the data to be loaded into the model in Power BI Desktop. The values you enter should include only a small amount of the most recent data from your data source. When published to the service, these values are overridden by the incremental refresh policy. 

  1. In Power BI Desktop, click Transform Data to open Power Query Editor. 

  1. Click Manage Parameters > New Parameter

In Manage Parameters > Name, type RangeStart (case sensitive), then in Type, select Date/Time, and then in Current Value, enter a start date/time value.

4. Create a second parameter named RangeEnd. In Type, select Date/Time, and then in Current Value, enter an end date/time value.

Now that you have RangeStart and RangeEnd parameters, you then filter the data to be loaded into the model based on those parameters. 

     B. Filter data

With RangeStart and RangeEnd parameters defined, apply a filter based on conditions in the RangeStart and RangeEnd parameters. 

  1. In Power Query Editor, select the date column you want to filter on, and then click the filter icon > Date/Time Filters > Custom Filter

In Filter Rows, to specify the first condition, select ‘is after or is after or equal to,’ then select Parameter, and then select RangeStart.  To specify the second condition, select ‘is before or equal to’, or ‘is before’, then select Parameter, and then select RangeEnd.

Note:

  • If you are using ‘is after or is equal to’ as the first condition and ‘is before or is equal to’ as the second condition, Power BI will load all the records between RangeStart and RangeEnd and also the records, which include RangeStart & RangeEnd dates.

  • If you are using ‘is after’ as the first condition and ‘is before’ as the second condition, Power BI will only load the records between RangeStart and RangeEnd and not the records which include RangeStart & RangeEnd dates.

  • If you are using ‘is after or equal to’ as the first condition and ‘is before’ as the second condition, Power BI will load all the records between RangeStart and RangeEnd and also the records which include RangeStart date, but the records containing RangeEnd date will not be loaded.

Click OK to close.

3. In Power Query Editor, click Close & Apply. Power Query will then load data based on the filters defined in the RangeStart and RangeEnd parameters and any other filters you defined. 

     C. Define Incremental Refresh policy.

After you’ve defined RangeStart and RangeEnd parameters and filtered data based on those parameters, you define an incremental refresh policy. The policy is applied only after the model is published to the service and a manual or scheduled refresh operation is performed. 

1. In Data view > Fields > open the context menu for the table, and then click Incremental refresh.

2. In Incremental refresh and real-time data > Select table, verify, or select the table. By default, the Select table list box defaults to the table you select in the Data view. 

3. Specify required settings: 

In Set import and refresh ranges > Incrementally refresh this table click the slider to On. If the slider is disabled, it means the Power Query expression for the table does not include a filter based on the RangeStart and RangeEnd parameters. 

In the Archive data starting, specify the historical store period you want to include in the dataset. All rows with dates in this period will be loaded into the dataset in the service unless other filters apply. 

In Incrementally refresh data start, specify the refresh period. All rows with dates in this period will be refreshed in the dataset each time a manual or scheduled refresh operation is performed. 

4. Specify optional settings: 

In Choose optional settings, select Get the latest data in real-time with DirectQuery (Premium only) to include the latest data changes that occurred at the data source after the last refresh period. This setting causes the incremental refresh policy to add a DirectQuery partition to the table. 

Select Only refresh complete days to refresh only whole days. If the refresh operation detects a day is not complete, rows for that whole day are not refreshed. This option is automatically enabled if you select Get the latest data in real-time with DirectQuery (Premium only)

Select Detect data changes to specify a date/time column used to identify and refresh only the days where the data has changed. A date/time column must exist, usually for auditing purposes, at the data source. This should not be the same column used to partition the data with the RangeStart and RangeEnd parameters. 

The maximum value of this column is evaluated for each of the periods in the incremental range. If it has not changed since the last refresh, the current period is not refreshed. 

Depending on your settings, your policy should look something like this:

5. Review your settings and then click Apply to complete the refresh policy. Source data is not loaded with this step. 

     D. Save and publish to the service.

When your RangeStart and RangeEnd parameters, filtering, and refresh policy settings are complete, be sure to save your model and then publish it to the service. If your dataset will become large, be sure to enable the Large dataset storage format prior to invoking the first refresh in the service. 

     E. Refresh dataset

In the service, refresh the dataset. The first refresh will load both new and updated data in the refresh period as well as historical data for the entire store period. Depending on the amount of data, this can take quite a long time. Subsequent refreshes, either manual or scheduled, are typically much faster because the incremental refresh policy is applied, and only data for the period specified in the refresh policy setting is refreshed.

Closing

In conclusion, incremental refresh is a valuable feature in Power BI that allows for efficient and optimized refreshing of large datasets. By dividing data into smaller portions based on a partitioning column and leveraging query folding, it minimizes data transfer and processing, leading to improved performance and reduced resource consumption. 

Implementing best practices such as selecting an appropriate partitioning column, optimizing data loading, and establishing a suitable refresh schedule ensures an effective incremental refresh strategy. 

Regular monitoring, testing, and utilization of Power BI Premium features further enhance the benefits of incremental refresh. With incremental refresh, Power BI users can achieve faster refresh times, better resource management, and timely data updates for their reports and visualizations.

No, incremental refresh in Power BI does not automatically delete old data. It is primarily focused on refreshing and updating the dataset with new or modified data based on a defined partitioning column. The purpose of incremental refresh is to efficiently process and load the incremental changes into the dataset while keeping the existing data intact.

Incremental refresh uses two case-sensitive parameters, named RangeStart and RangeEnd. Range Start defines the starting point or the minimum value of the column that indicates the incremental changes. The data after this value will be considered for refresh. Range End defines the ending point or the maximum value of the column that indicates the incremental changes. The data before or equal to this value will be included in the refresh.

Data Coach is our premium analytics training program with one-on-one coaching from renowned experts.

Accelerate and automate your data projects with the phData Toolkit