Copying SQL Server On-Premises data in Microsoft Fabric: which workload consumes the least amount of Capacity Units?
In this blog post I am going to determine which item workload uses the least amount of Capacity Units (CUs) when copying the same data from an On-Premises SQL Server.
The item workloads that I can use to copy data are Dataflow Gen1, Dataflow Gen2, and Pipelines.
Setup for Dataflow Gen1
Below is how I configured my Dataflow Gen1 to get the data from my “Fact Sale” table in my On-premises SQL Server.
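At its core, a Dataflow Gen1 query is Power Query M, so a minimal sketch of the kind of query used is shown below. The server name ("MyOnPremServer"), the database name ("WideWorldImportersDW"), and the assumption that the table is stored as "Sale" in the "Fact" schema are placeholders rather than my exact settings; the connection itself goes through my on-premises data gateway.

```
let
    // Connect to the on-premises SQL Server through the on-premises data gateway.
    // "MyOnPremServer" and "WideWorldImportersDW" are placeholder names.
    Source = Sql.Database("MyOnPremServer", "WideWorldImportersDW"),

    // Navigate to the "Fact Sale" table, assuming it is stored as "Sale" in the "Fact" schema.
    FactSale = Source{[Schema = "Fact", Item = "Sale"]}[Data]
in
    FactSale
```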
Setup for Dataflow Gen2
Below is how I configured my Dataflow Gen2 to get the data from my “Fact Sale” table in my On-premises SQL Server.
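The query side of Dataflow Gen2 is essentially the same Power Query M; what Gen2 adds is a data destination for the output. A rough sketch, using the same placeholder names as above:

```
let
    // Same placeholder on-premises SQL Server source as the Gen1 dataflow.
    Source = Sql.Database("MyOnPremServer", "WideWorldImportersDW"),

    // Navigate to the "Fact Sale" table.
    FactSale = Source{[Schema = "Fact", Item = "Sale"]}[Data]
in
    FactSale
```

The data destination (for example a Lakehouse or Warehouse table) is configured in the Dataflow Gen2 editor rather than in the M query itself.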
Setup for Pipeline
Below is how I configured my Pipeline to get the data from my “Fact Sale” table in my On-premises SQL Server.
Comparing Capacity Units
To get an accurate comparison, I ran each workload multiple times. I did not run the Pipeline additional times, because both of its runs consumed the same number of Capacity Units.
Dataflow Gen1 consumed an average of 78 CUs
Dataflow Gen2 consumed an average of 550 CUs
The Pipeline consumed an average of 360 CUs
Summary
While I did test Dataflow Gen1, I do not think it is entirely fair to say that it consumed considerably fewer Capacity Units than the others, because the data from the Dataflow Gen1 still needs to be loaded into a Semantic Model, which would then consume additional Capacity Units.
As shown above, of the remaining workloads the Pipeline uses the fewest Capacity Units.
Please always make sure that you do your own testing, as this can vary based on a multitude of factors. This should be a good indication of which workload could be used to save on capacity utilization.
Would it make sense to also include a Semantic Model refresh in the comparison of time and Capacity Units?