How I saved 40%+ on my Power BI Dataset size

by Gilbert Quevauvilliers November 27, 2019

Uncategorized 9

I have been working with a customer who has a fairly large dataset. Part of my role was to see if there was any way to see how I could reduce the amount of memory being consumed, whilst still allowing for the same metrics to be represented.

Challenge accepted!

I had already applied all the best practices in terms of reducing the cardinality, removing unwanted columns and making sure that only the data required is being brought into the dataset. Even at this point the dataset size was consuming 90GB of memory in Azure Analysis Services. With the steps below I got my dataset size down to a whopping 37GB of memory!

I used the awesome tools from SQLBI.COM and DAX Studio to see which columns were consuming the most space, and because my dataset had currency converted values, this meant that the cardinality was very high. (The reason that I decided to store the currency conversion values, is when trying to do it on the fly in a large dataset it is very slow)

Original dataset size

As you can see below the original size of the PBIX that I had was 5.01MB

When this dataset was up and running it was consuming about 20MB of memory (As previously blogged about on How much memory a PBIX will consume)

Reducing the dataset size by using Fixed Decimal

When I was reading through the documentation on the data types it appeared to me that if I could change the data type from Decimal Number to Fixed Decimal Number I could save on the dataset size.

I had a look and for the data type to become Fixed Decimal it can only have 4 decimal values.

I made sure that if I did change the values in my column to only 4 decimal places that I the totals would remain the same.

As you can see below the SalesAmount_EUR has got the data type set to Decimal Number which is my starting point.

Next I changed my source query to only return the values with 4 Decimal Numbers as shown below. (You could also do this in Power Query if need be)

I did double check the values and they still matched.

Now when I saved my PBIX it went from 5.01MB to 4.18MB

That is a saving of roughly 16%, which I initially thought was pretty good.

I still felt that there was something more that I could do to get the dataset smaller and with some creative thinking there is, so read on.

Making the dataset really small and efficient by using Whole Number

I had an idea, which I got after remember a conversation I had with Phil Seamark, where he mentioned that under the hoods of SQL Server Analysis Services (SSAS) tabular when you have a decimal point it converts it to a whole number and stores where the decimal point was.

This got me thinking, what if I could now split my SalesAmount_EUR into separate columns?

That would then mean I could two columns of whole numbers. Not only that but the cardinality for the decimal values would be significantly less.

I immediately got cracking and by using TSQL (You could once again use Power Query) I split the column into 2 columns. My one challenge was to ensure that when I had a negative value that my cents column also was negative!

Once I imported the data, I made sure that the data type for my columns was set to Whole Number

At this point I was really hoping that this would allow for a significant saving on the dataset & memory size.

When I saved the file, it had dropped down to a tiny 1.75MB

Of course, I just HAD to create a Power BI visual to show how much I had saved in this particular exercise.

As you can see above, I got a 65% decrease in the size of the dataset.

The memory size also decreased from 19MB of memory down to 2.1MB of memory.

Now I know you might be thinking that I now need to add in some additional logic into my DAX measure to bring these two columns together to represent the same values?

I did test this and due to the dataset size being a lot smaller the effect was then a smaller dataset size, smaller dictionary size and smaller columns hierarchy size which means that the super-fast SSAS engine can process the data faster.

For reference this is how I put the values back together in one measure

Sales Amount EUR = 
SUM ( Data[SalesAmount_EUR] )
            + ( DIVIDE ( SUM ( 'Data'[SalesAmount_EUR_Cents] ), 10000 ) )

Conclusion

As I have shown there are at times very creative ways to reduce the size of your datasets.

I have read a fair amount of blog posts in my time and as I have previously read, I would suggest testing this with your own data. The mileage in terms of savings will vary and very often it will be worth the effort to get the savings. Not only that but you learn a lot on the way too.

Writing this blog post has been exciting for me and I hope that you enjoyed it too.

As always thanks for reading and if you got any comments or suggestions please leave them in the section below.

391 / 567

BI-NSIGHT – Power BI (Service Update, External Sharing, Admin Portal, Web & Video Widget, Zooming) – SQL Server 2016 (Mobile Publisher Preview 2)

BI-NSIGHT, Power BI, SQL Server 2016, Uncategorized 0

SSIS – Creating a process which will check a condition, if it is not ready wait and try again.

SQL Server Integration Services, Uncategorized 0

How you can incrementally refresh any Power BI data source (This example is a CSV File)

Power BI, Uncategorized 12

9 Comments

Marco Russo November 28, 2019 at 5:52 pm

You should get the same size by storing the decimal part in a fixed decimal number so you avoid the divide at runtime and you only have a sum. Did you try that?

Reply
1. Gilbert Quevauvilliers December 2, 2019 at 9:49 pm
  
  I did try that, the problem is because the exchange rates when converted are all almost unique I do not get the same compression because the values are almost all unique. There are very few rows which are repeatable in my current dataset.
  
  As I learnt from you, I would always suggest testing it out as if I could keep it all as a decimal number it would be a lot easier to manage and have the DAX measure being easier too.
  
  Reply
Shrinking Your Power BI Dataset Sizes – Curated SQL December 3, 2019 at 11:05 pm

[…] Gilbert Quevauvilliers wanted to reduce Power BI memory usage: […]

Reply
How to reduce your Power BI model size by 80%! – Data Mozart July 26, 2020 at 9:28 am

[…] A similar technique can bring significant savings when you have decimal values in the column. You can just simply split values before and after the decimal like explained in this article. […]

Reply
Power BI’s Data Compression: Large Data Imports in Power BI - IntelliTect March 4, 2021 at 6:50 am

[…] https://www.fourmoo.com/2019/11/27/how-i-saved-40-on-my-power-bi-dataset-size/ […]

Reply
C# Baby! - IntelliTect July 7, 2021 at 3:58 am

[…] https://www.fourmoo.com/2019/11/27/how-i-saved-40-on-my-power-bi-dataset-size/ […]

Reply
Comparar datos con granularidad diferente en Power BI: Exactitud del prorrateo - dataXbi March 31, 2022 at 5:03 am

[…] Este método se describe en esté artículo: https://www.fourmoo.com/2019/11/27/how-i-saved-40-on-my-power-bi-dataset-size/ […]

Reply
kaviyarasan August 25, 2022 at 10:57 pm

How can i split with negative symbol to decimal values
Ex: Column Split 1 Split 2
-345.25 -345 -25

Reply
1. Gilbert Quevauvilliers September 1, 2022 at 2:25 pm
  
  Hi Kaviyarasan
  
  What you would need to do is to find out if the entire number value is negative.
  
  If it is negative then once you have split the column then multiply it by -1
  
  Reply

How I saved 40%+ on my Power BI Dataset size

Challenge accepted!

Original dataset size

Reducing the dataset size by using Fixed Decimal

Making the dataset really small and efficient by using Whole Number

Conclusion

Related

9 Comments

Leave a Reply Cancel reply

How I saved 40%+ on my Power BI Dataset size

Challenge accepted!

Original dataset size

Reducing the dataset size by using Fixed Decimal

Making the dataset really small and efficient by using Whole Number

Conclusion

Share this:

Related

Related Posts

BI-NSIGHT – Power BI (Service Update, External Sharing, Admin Portal, Web & Video Widget, Zooming) – SQL Server 2016 (Mobile Publisher Preview 2)

SSIS – Creating a process which will check a condition, if it is not ready wait and try again.

How you can incrementally refresh any Power BI data source (This example is a CSV File)

9 Comments

Leave a Reply Cancel reply