How much data is being stored in my Fabric OneLake (Lakehouse files and tables)

I recently had a question from someone in the Fabric community asking how much storage was being used by the Lakehouse files and tables.

This got me thinking, because currently there is no easy way to find this out.

It is essential to know how much storage is being used, and which files or tables are the largest.

In this blog post, I am going to show you how I got this working.

There are two parts to this blog post. In the first part I get the data using a notebook, and in the second part I query the SQL endpoint to get the data into my Power BI semantic model, which I then create a report from.

Python Notebook

The first section in the notebook is where I get the size of all the files stored in my Lakehouse and store the results in a single table.

I have put comments into the notebook if you want more information.

As an overview of the notebook, I first get the Lakehouse tables using the CAT tools function get_lakehouse_tables.
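To give a rough idea of what that step looks like, here is a minimal sketch. It assumes the semantic-link-labs (sempy_labs) package is installed in the notebook, and that the extended option returns size detail; the exact parameters and column names may differ by version and from the notebook linked below.

```python
# Minimal sketch: get the Delta tables in the attached Lakehouse, including size detail.
# Assumes semantic-link-labs is installed, e.g. with: %pip install semantic-link-labs
# The module path, parameters and returned column names may vary between versions.
from sempy_labs.lakehouse import get_lakehouse_tables

# extended=True is intended to include additional detail such as table size
tables_df = get_lakehouse_tables(extended=True)

display(tables_df)
```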

Next, I get the Lakehouse files into a data frame.
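One simple way to do this (a sketch only, and not necessarily how the linked notebook does it) is to walk the mounted Files area of the default Lakehouse with standard Python. The column names here are illustrative.

```python
# Minimal sketch: recursively list every file under the Lakehouse "Files" area
# and capture its size in bytes. Assumes the default Lakehouse is attached to
# the notebook and is therefore mounted at /lakehouse/default/Files.
import os
import pandas as pd

rows = []
for root, _dirs, files in os.walk("/lakehouse/default/Files"):
    for file_name in files:
        full_path = os.path.join(root, file_name)
        rows.append(
            {
                "Name": full_path.replace("/lakehouse/default/", ""),
                "Type": "File",
                "SizeInBytes": os.path.getsize(full_path),
            }
        )

files_df = pd.DataFrame(rows)
display(files_df)
```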

Finally, I consolidate the two data frames and store them in a single Lakehouse table called “Onelake_Storage”.
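The final step could look something like the sketch below. It assumes a Spark notebook (so the built-in spark session is available) and that both data frames have already been aligned to the same illustrative column names.

```python
# Minimal sketch: combine the tables and files data frames and persist them as
# a single Delta table called "Onelake_Storage" in the Lakehouse.
# Assumes both data frames share the same columns (e.g. Name, Type, SizeInBytes).
import pandas as pd

combined_df = pd.concat([tables_df, files_df], ignore_index=True)

(
    spark.createDataFrame(combined_df)
    .write.mode("overwrite")
    .saveAsTable("Onelake_Storage")
)
```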

You can view the Notebook code in GitHub here: Fabric/Get_Files_Tables_StorageSizes.ipynb at main · GilbertQue/Fabric (github.com)

Power BI Report

As always, it is easiest to see how much storage is being used, and where, by using Power BI to visualise the data.

To do this I created a new semantic model using the Direct Lake feature. I could then create my measures and rename columns in the Power BI Service, which allowed me to create the Power BI report directly in the service.

Below is an example of the report I created.

[Screenshot: Power BI report showing OneLake storage usage]

Summary

Thanks for reading. I hope you found this useful, and that you can now see not only how much storage you are using in OneLake, but also where all that storage is being consumed.

Any comments or suggestions are most welcome.