How to configure Incremental Refreshing in Power BI with DateKey or Date (instead of the default DateTime)
I was helping on the Power BI community forum (where I certainly learn a lot) and I came across this blog post where a Microsoft Employee had suggested that a Power Query function could be used as part of the Incremental refresh.
What I mean by this is when implementing Incremental refreshing in Power BI the default is that you must use a DateTime data type on your column. Whilst this is awesome a lot of Fact tables typically will have a DateKey (in the format of “YYYYmmdd” “20200309”) or Date (in the format of “YYYY-mm-dd” “2020-03-09”)
This got me thinking and I was certain that I could figure out a way to use a variation of the function above so that I could pass through the DateKey to my Fact table. This would mean a massive query performance increase because my SQL Server table is partitioned by DateKey and I have an index on the DateKey.
After some testing and playing around with the function, the underlying view and making sure it works I successfully got it working where I am using Incremental refreshing using my DateKey column.
This got my refresh times even quicker as shown below from an actual implementation
Below are the steps I completed to get this working.
Figuring out how to use the DateKey within Power Query
The first thing I had to figure out was how to use the DateKey instead of the DateTime column (which is a requirement for Incremental Refreshing)
-
The first thing I did was to make sure that I had configured my required parameters are per the requirements to incrementally refresh the data.
- As you can see below, I have my RangeStart and RangeEnd configured as DateTime
-
I then connected to my existing view where I have got my dataset and the column called DateKey
- Now I wanted to make sure that when I used the RangeStart that I could change the output from the DateTime to DateKey in the format of YYYYmmdd
-
I did this by creating a Custom Column and put in the following below.
-
Date.Year(RangeEnd)*10000 + Date.Month(RangeEnd)*100 + Date.Day(RangeEnd)
-
-
I then viewed the result and I could see it was correctly showing the RangeEnd in the format of YYYYmmdd
- I then copied the above code, knowing that it works and will put it in the format I require.
Modifying the Incremental Refresh from using the RangeStart/RangeEnd DateTime to using the DateKey
My next steps were to change the incremental refresh from the defaults when using the RangeStart and RangeEnd to use the DateKey, which I explain below.
- I first created the Incremental refreshing as per the requirement.
-
I went to my Date column which in my example is [Invoice Date Key] and applied the filters as shown below.
- When I looked in the Power Query Editor Step, I could see code created.
- In the steps below is where I figured out how to get it working with the DateKey
-
Next is where I changed it to use the code I had tested in the previous steps. This would then use the [InvoiceDateKey] column and would modify the RangeStart and RangeEnd to be in the format required for the DateKey
- And below is the code
-
Table.SelectRows(#"Added Custom1", each [InvoiceDateKey] >= Date.Year(RangeStart)*10000 + Date.Month(RangeStart)*100 + Date.Day(RangeStart) and [InvoiceDateKey] < Date.Year(RangeEnd)*10000 + Date.Month(RangeEnd)*100 + Date.Day(RangeEnd))
I could then see my table filtered with my parameter values from the RangeStart and RangeEnd
Verifying that my Power Query would still fold back to the SQL Server Source
One very important step was to make sure that my Power Query was still folding back to the SQL Server Source.
-
The first thing that I did to validate this is to right click on the final step in my table and select View Native Query
-
This then showed me that it was indeed folding back
-
- NOTE: There are some instances when a you might not see the “View Native Query” option enabled, and it can still fold back.
-
I wanted to make 100% sure that it did indeed fold back to the SQL Server Source.
- What I did was to change the parameters to be over a longer time range, this would allow me to run a very handy stored proc sp_WhoIsActive which would allow me to see the query running
- As shown below I could then see it was running the Native Query against my SQL Server Source (below is the critical piece of the generated TSQL Code)
-
The final step was to then configure the Incremental Refresh, which I did as shown below.
-
Because this was not the entire dataset, I created a table which just had the one date as shown below.
- This would allow me later to validate that the incremental refresh was indeed working.
- I then saved and uploaded my PBIX to the Power BI Service.
Testing the Incremental refresh
The last step was to test and make sure that the incremental refresh was working in the Power BI Service.
- Once uploaded I configured the dataset to use my On-Premise Data Gateway.
-
I then went to the dataset and clicked on Refresh Now
-
At the same time the refresh was running I then went to my SQL Server, where I could see the query that was generated which proved to me that it was folding the query back to the SQL Server Source (below is the critical piece of the generated TSQL Code)
-
When I looked at the report, I could now see data not only for 1 day but for multiple days
-
To test the Incremental refreshing, I went back to the dataset and clicked on Refresh Now again
-
I could now see the query had changed to only include the data as per my incremental refresh policy (below is the critical piece of the generated TSQL Code)
-
-
I could then see that it completed the refresh in record time (Yes, I know it is marginally quicker but on larger datasets this is a LOT quicker)
Incremental refresh using a Date column
I initially thought that it would be as simple as using a Date column with the data type set to Date and it would work with some code which took the RangeStart/RangeEnd and modified it to be a date.
When testing this out it kept on defaulting back to a DateTime.
My current proposed solution below is to ensure that you can create a DateKey column from your SQL Server Source. Once you have created the DateKey column the above steps will then work.
Summary
In summary I have shown how I changed the incremental refreshing from using the defaults of DateTime to use the DateKey. In doing so I was still able to configure and get the incremental refresh working.
If anyone has got any suggestions on how to get this working, please let me know and I will update my blog and obviously give you the credit!
Thanks for reading and if you got any questions or comments please let me know.
Thanks this looks great. I will try to implement in my scenario. The DateTime requirement is one reason I haven’t used incremental refresh so far (the other reason is not supporting in dataflows).
Awesome, thanks for the comment.
Dataflows does support Incremental refresh but only when running on Power BI Premium as far as I am aware?
Hi Gilbert,
just a question on the last SQL-query (that has the filter on >= 20200307 AND < 20200308.
Because you set the Incremental Refresh to the last 3 days and the query only shows 1 day. Or did you implement any of the other settings like Detect data changes or Only refresh complete days?
Hi Nicky,
The policy will be what is executed when the incremental refresh policy runs. It does not matter what you set the parameters to when uploading the PBIX.
When you check your gateway, does it show the parameters? I have it all set but can’t see the parameters and I have no way to check if it is working or no.
It is not exposed for incremental refresh because it is used on the backend servers?
I was able to transform the Range parameters to the MM/DD/YYYY Format using the following statement. This was very useful for using Incremental Refresh for on the Google Analytics connector that only uses the MM/DD/YYY date format and allowed me to prevent GA from sampling my website data.
= Table.SelectRows(Source, each [Date] >= DateTime.Date(RangeStart) and [Date] <= DateTime.Date(RangeEnd))
Hi Chris
That is awesome thanks for letting me know.
Could I ask how you transformed the Range parameters?
I would love to update my blog post to include this for Google Analytics.
Thanks for this, I was able to replicate this method when publishing a PBIX.
Is it possible to do the same thing when configuring incremental refresh for a Dataflow?
On the Dataset side one sets up the parameters and can convert them to the needed data type when applying the custom filter as you shown here. However on the Dataflow side one does not create the parameters and the UI will only accept a Datetime. How do we get around this?
Thanks,
Paul
Hi there
You should be able to put in the details within your Power Query steps and leave the parameters as they are?
Hi, Gilbert – Enjoyed your talk on Power BI Virtual Conference. I used this blog to set up incremental refresh from AS400 database that does not have a date field. In the data query, I added a column that combined century, year, month, day to create the date column to utilize with the RangeStart and RangeEnd parameters. How can I determine if the query folding is not disabled by this?
And – how can I verify that incremental refresh is actually happening in the service? Just by the refresh times?
Hi Jeff, thanks for watching and I hope you found it useful.
The easiest way would be is to configure it all and then see if the refresh times are a lot quicker in the second refresh.
If not there are certainly other ways to get the data to incrementally refresh. Let me know how it goes.
Hi Gilbert! Thank you for sharing. It’s exactly what I need for a project. I tried to follow your steps but got stuck at the part where you went to your Date column [Invoice Date Key] to apply the RangeStart and RangeEnd to the filter. The parameter filters (RangeStart and RangeEnd) did not show up in the Filter Rows pop up for me. I’m guessing this is because my column is of a numeric datatype (e.g. 20200101) and the RangeStart and RangeEnd are DateTime. Am I missing something? Thank you!
Hi Huong,
You are 100% correct because the column data type is set to numeric it is not giving you the option.
What you can do is to click on the fx button to create a new step in your table.
Then copy the code above, change it from [Invoice Date Key] to your column with the DateKey.
That will then allow it to work.
To transform the Range parameters in the filter query – in the Power Query editor, go to Advanced Editor and you will be able to cast your parameters.
Thanks for the suggestion, that works too!
Gilbert, thanks for your great blog. Do You know if it’s possible to do something similar with dataflows? Configure incremental refresh on dataflows with integer or date only field? Thanks!
Hi Albert
It should be able to be done in the advanced editor of dataflows?
Tried to do it in the advanced editor but for some reason PBI adds the “Date.From” in the filter step for the datekey column:
Table.SelectRows(#”Delete (Columns)”, each DateTime.From([DateKey]) >= RangeStart and DateTime.From(DateKey]) < RangeEnd). Downloading the Dataflow json shows that incremental refresh it`s enabled, and the interface in the service for the incremental refresh config shows the datetime column blank and allows to change the periods. The only problem as far i can tell it`s the "Date.From" that PBI adds automatically after saving the Dataflow. Maybe a new post for your blog with incremental refresh for Dataflows and DateKey? 😉
Is it possible to Incrementally Refresh the table in the dataset which has integer value in the column like 5,6,7. We check for the maximum number in the dataset and only take the data after that maximum number from the data source.
Hi Suryaprakash
As far as I know currently it has to use the DateTime or DateKey to configure the incremental refreshing!
Hi Albert
I am creating a report for sales. My database is SAP HANA and in my dataset I have multiple entries for the same date/time (resulting in duplicate values) and due to it I am not able to use incremental refresh as it gives out error stating unique date/time values are required for incremental refresh.
What do you suggest? Is there any workaround to handle this issue?
Thanks..!!
Hi Mayank,
What you could do is get the data into Power Query, then change the data type to DateTime?
Hello @Gilbert Quevauvilliers
I tried follow the same steps but some how I am not getting parameters option to filter the datekey column in power query.
I have Transaction table with transaction date and post date of type date and I want to set the incremental refresh on it.
I have created two parameters StartRange and EndRange of date/time type.
I have created transactiondatekey column on top of Transaction date using Date.Year(TransactionDate)*10000 + Date.Month(Transactiondate)*100 + Date.Day(Transactiondate) of type whole number.
Now, When I am trying to filter this datekey column in power query I am not getting the option to filter it using parameter and its only showing the values of datekey.
Could you please guide me, how can I achieve this, or if possible can you please share sample .pbix file where you had implemented incremental refresh.
Hi Ajay,
This is correct because you should be using the RangeStart and RangeEnd for the incremental refresh to be used. And because it is now using a different way that default option no longer applies.