How you can incrementally refresh any Power BI data source (This example is a CSV File)
How you can incrementally refresh any Power BI data source (This example is a CSV File)
I was reading the awesome blog post by Dustin Ryan (POWER BI INCREMENTAL REFRESH WITH WEB API DATA SOURCE) and what I saw him do gave me an idea, is it possible to configure a CSV file to be incrementally refreshed using a similar pattern.
The pattern that I am talking about is the following below, which will be used as my example below.
-
Connect to a data source which can query fold.
- In my example I have installed and configured an Azure SQL Serverless DB
- In this database I have a date table.
- Configure date table to use Incremental refreshing as per the blog post
-
Create a function which will then use the Date value as part of the parameter
- In my example I have got CSV files which have Exchange rate information from Azure Blob Storage.
- The file names of the CSV files is the date.
- Invoke the Function within the date table to extract the required information.
I know what you might be thinking, that as soon as I add in the column with the function it breaks the Query Folding. That is what I thought too.
The great news is that Incremental refreshing DOES STILL WORK!
Well it still uses the Query Folding and does indeed only refresh the data as defined in the Incremental Policy. If you don’t believe me, read on and I will show you it actually works.
I have tested this with CSV data sources from my local folder and from Azure Blob Storage.
Connecting to the Azure SQL Serverless Database and configuration the Incremental Refreshing
The first step I did was to connect to my Azure SQL Serverless database and configure the incremental refreshing.
I am not going to go into all the details on how I configured the incremental refreshing, the steps below I am going to show how I connected to the Azure SQL Serverless database and made sure that it was still using the Query Folding and the columns I needed.
-
Because my CSV contained the date, I created a view in my SQL Server database, which also contained a column with the date in the format I needed.
- The column I created was called StandardDate
-
I then imported this table, removed all the other rows so I only had the following 2 columns as shown below.
- Did you notice that I purposely kept my StandardDate column as Text?
- This is because I want to use this Text value as part of my CSV File Name
-
The final step was to configure the RangeStart and RangeEnd parameters and put them into the query. Once completed I did verify that Query folding was still working.
Creating my Function to the CSV File
Next, in my pattern is now to create my function.
-
I created a function which is getting data from Azure Blob Storage
- I then created a Parameter called MyDate which contains the date in the format of yyyy-mm-dd (2020-05-26)
-
I then filtered my CSV files with this parameter
-
And then I transformed the data into the format I required.
- I gave my function the name of fx_ER_Blob
Using the Function in my incremental table
This is the step where I now use my function in my incremental table.
-
Now in my data table I clicked on Add Column and then Invoke Custom Function
-
I then selected the StandardDate column as shown below.
-
Next, I expanded the columns to get the CSV data
-
When I had a look to see if query folding was still happening, the “View Native Query” was disabled.
- Even though it is disabled at the end, this does not mean it is not working!
- I then clicked Close and Apply to my data
-
The data was loaded into the data model with the RangeStart and RangeEnd parameters
-
The final step was where I configured the Incremental refresh policy for my Data table.
- I then uploaded my PBIX to the Power BI Service into a Premium App Workspace
-
I configured the required data sources
Testing the Incremental Refreshing
Now this was the big test for me to see if it would actually work.
The reason I published the PBIX to a Premium App Workspace, is because it allowed me to be able to not only see the partitions using the XMLA point, but to also see which partitions would get processed.
-
I then did a Refesh Now on my dataset.
- Once the refreshing had completed, I could then see all my data had updated.
- As shown below I could see that my latest date was 03 Jun 2020
-
The big test had arrivied and it was now time to refresh the dataset again and see if it would ONLY refresh the last 2 days worth of data.
- I clicked on “Refresh Now”, and waited (Rather impatiently) for the dataset refresh to complete.
-
The first place I went to see if the incremental refreshing had worked was the Refresh history, where if it did incrementally refresh it should take substantially quicker to complete.
- As you can see below it was significantly quicker.
- The first refresh took roughly 7 minutes.
- The second refresh took roughly 16 seconds!
-
I sure was happy, but I wanted to make 100% sure.
- I then went into SSMS, went to my table, right clicked on the table and selected Partitions.
- As you can see below ONLY the last 2 partitions were updated.
- The above picture also corresponds to the refreshing times in the Refresh history.
-
Finally, I wanted to also make 100% sure that it was only getting the CSV files it needed from the Azure Blob Storage and NOT querying ALL the CSV files.
- I went into the Azure Portal and had a look at the Transaction in my Storage account.
- As you can see below the first refresh used a lot of transactions.
- Whilst the second refresh used a tiny amount of transactions, proving to me that it only used the CSV files required for the incremental refresh policy!
Summary
In this blog post I have shown that using the pattern of using a Query Folding source (SQL Server) and then continue with other files I am able to still have Incremental refreshing working as I would expect it to.
From my understanding the reason that it still works, is because when the steps are applied in the order shown query folding still happens (which honours incremental refreshing and only queries the required rows) and then continues invoking the function, which only runs on the filtered rows.
By using the pattern, it will now allow me to be able to use incremental refreshing on almost any data source, be it a CSV file, SharePoint List or anything I can imagine.
This will not only save resources where the files are stored, it will also mean quicker refreshes of data. And who doesn’t like things running quicker? ( I know I sure do like things running as fast as possible)
Thanks for reading and if you got any comments or suggestions please let me know.
Great post – I used this technique a few weeks ago but I didn’t have a chance to blog about it. I hope to be able to write a longer description of my discoveries, now I feel less pressure to do that because there is already some coverage of this approach. Thanks!
Hi Marco, thanks for the kind words.
It would be great to see your insights into your discoveries. I hope you don’t mind the pressure!
Looking forward to your blog post.
Nice, GIlbert! finally more people are starting to realize the potential of this! 🙂
You have no clue how excited I am to see more folks blogging about it. I’ve been preaching this for months now.
As long as you know how to take advantage of query folding and lazy evaluation, you’re golden and you can create incredible incremental refresh patterns.
Thanks Miguel for the kind words. And I agree having a good incremental refresh pattern makes things quicker and easier. And who does’t LOVE that!
[…] Gilbert Quevauvilliers wants incrementally to refresh all the sources: […]
It is a major evolution for most people around the world.
Is this tip, is it possible with excel files on my machine, for example?
Hi there
Yes will certainly work, you will connect to the SQL Instance and then use the function to connect to your Excel Files.
[…] on from my successful blog post How you can incrementally refresh any Power BI data source (This example is a CSV File), I found a way where I can just use dates created in Power Query to get data refreshing […]
Hello Gilbert,
thanks to your article, I finally managed to incrementally refresh csv/excel files.
On a separate note, I also successfully tested incremental refresh with a source that does not query fold, e.g. CSV file only containing dates.
Thank you for sharing!
Hi Bostjan
Thanks for the comment and it is great to hear that it is working for you too!
Hello. Would this approach work when used in a dataflow?
Hi Dmitri,
Yes it should work exactly the same because it is applied in Power Query.