Getting the lineage from a Power BI dataset to table to data source to original query
As often happens blog posts originate from a customer requesting something I have not figured out before.
In this example the requirement was to be able to determine which data sources were being used by which tables, which were then associated to one or many Power BI datsets.
While I was working through this I figured out I could take it one step further and also if required have the actual Power Query as part of the report.
Getting the source data
What I did to get the source data was to leverage the data that is captured from the Power BI Scanner data.
The good news is that my good friends at PowerBI.Tips have got a great blog post which I used to capture the Power BI Scanner data and store it in SharePoint.
I would highly suggest you complete these steps first with the link here: Using the Power BI Scanner API to Manage Tenant’s Entire Metadata –
I created a new PBIX file, went into Power Query
What I then did was to then connect to the data stored in SharePoint and selected the latest MetaData%.json file.
Then I did a whole lot of steps in which I first expanded the workspace list.
After that I then expanded all the records for the datasets, which then showed me all the details for each of the datasets.
Now one additional step that I added was to be able to know what type of data source was being used. To do this I got the details from the Power Query documentation which has got a list of all the current Power BI data sources.
https://learn.microsoft.com/en-us/powerquery-m/accessing-data-functions
I then used Excel to be able to quick create a conditional column to get the valid data source, below is a small excerpt from the code.
This allowed me to then create a column with the Data Source Names.
The last step I did was to create a URL which would allow me to go directly to the dataset in the Power BI Service. I though that this would be a great addition as very often there are a lot of app workspaces with a lot of datasets. And having the ability to go directly to the dataset is very convenient for the end user. This is what the column looked like.
To also get the data source descriptions I then also connected to the Power Query data sources and got the descriptions from the web page, here is the output in Power Query.
Now that I had both all the details for my datasets, tables, queries and data sources I then loaded the data to my model.
Creating the Power BI Report
The final step was to now use this data so that I could create the Power BI report where it would be easy to view the lineage from the dataset to the table, to the data source and if needed the Power Query query.
I created a relationship between Dataset with Tables and Data Source Type Descriptions.
I then created the Power BI Report as shown below, where I had put in the link all the details. I had also created a tooltip so when the user hovers over a row they can see the data source description.
I then uploaded this to the Power BI Service, configured the refresh and shared it with the relevant users.
I also created a drill through page where I can see the Power Query query for a particular table as shown below.
Summary
In this blog post I have demonstrated how to use the Power BI Scanner to show the data lineage from dataset, to table, to data source and even to the Power Query details.
I hope that you found this useful, and any comments or suggestions are most welcome.
Here is a link to the PBIT if you want to connect it to your SharePoint file: Blog – Dataset with Table Names and Data Source Types.pbit
[…] Gilbert Quevauvilliers reads the chain of custody documents: […]
Hi, Thank you for sharing this useful blog… really love it…
Nice blog article, thanks for sharing
Thanks & Regards
V.saibabu
Thanks