How to get the TopN rows using Python in Fabric Notebooks
When working with data, there are sometimes weird and wonderful requirements that must be met in order to get to the desired solution. In today’s blog post, I had a situation where I wanted to get the single row with the highest duration. This is how I did it…
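As a rough illustration of the idea (not necessarily the approach taken in the post), here is a minimal plain-Python sketch of picking the top N rows by a numeric field. The sample rows, the `task` and `duration` field names, and the `top_n` helper are all assumptions made up for this example.

```python
import heapq

# Hypothetical sample data; the field names are assumptions for illustration.
rows = [
    {"task": "load", "duration": 42},
    {"task": "transform", "duration": 97},
    {"task": "export", "duration": 15},
]


def top_n(records, n, key):
    """Return the n records with the largest value for `key`, largest first."""
    return heapq.nlargest(n, records, key=lambda r: r[key])


# The single row with the highest duration.
longest = top_n(rows, 1, "duration")
print(longest)
```

Passing a larger `n` returns more rows, still ordered from the highest duration down.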
Looping through data using PySpark notebook in Fabric
Continuing my blog series on what I’m learning with notebooks and PySpark, today I’m going to explain how I found a way to loop through data in a notebook. In this example, I’m going to show you how I loop through a range of dates, which can then…
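To give a feel for looping through a range of dates, here is a minimal sketch using only the Python standard library. The `date_range` helper and the example dates are assumptions for illustration, and the post itself may build the range differently.

```python
from datetime import date, timedelta


def date_range(start, end):
    """Yield each date from start to end, inclusive."""
    current = start
    while current <= end:
        yield current
        current += timedelta(days=1)


# Collect the dates as ISO strings; in a notebook, each date could instead
# be used to filter or load that day's data.
processed = []
for day in date_range(date(2024, 1, 30), date(2024, 2, 2)):
    processed.append(day.isoformat())

print(processed)
```

Using a generator keeps the loop simple and avoids building the whole list of dates up front.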
Using Sempy to Authenticate to Fabric/Power BI APIs using Service Principal and Azure Key Vault
I have been doing a fair amount of work lately with Fabric Notebooks. When I authenticate using a Service Principal, I always want to make sure it is as secure as possible. To do this, I have found that I can use Azure Key Vault and Azure Identity to authenticate successfully. By using…
How to add current DateTime to existing PySpark data frame in a Fabric Notebook
In the blog post below, I am going to describe how to add the current DateTime to your existing Spark data frame. This is really useful when I am inserting data into a Fabric Lakehouse table and I want to know when the data got…
How to get Sempy (Semantic-link) to run when being triggered from a data pipeline which runs a Notebook in Fabric
I had an error when trying to run a notebook via a data pipeline, and the run failed. Below are the steps to get this working. This was the error message I got: Notebook execution failed at Notebook service with http status code – ‘200’, please check the Run logs on Notebook, additional details –…
Renaming multiple Column Names in a single step using a PySpark Notebook
Following on from my previous blog post, in this blog post I’m going to demonstrate how to bulk rename column names in a single step instead of having to rename them individually. The reason this came about is that I had a set of data where the column names had square brackets, which I wanted to remove. As shown below…
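As a small sketch of the renaming idea, the snippet below strips square brackets from a list of column names in one pass. The `strip_brackets` helper and the sample column names are assumptions for illustration. The closing comment shows one way the cleaned names could be applied to a PySpark DataFrame (assumed here to be called `df`), which may differ from the method used in the post.

```python
def strip_brackets(columns):
    """Return the column names with any square brackets removed."""
    return [c.replace("[", "").replace("]", "") for c in columns]


# Hypothetical column names, similar to the bracketed names described above.
original = ["[Order ID]", "[Customer]", "Amount"]
cleaned = strip_brackets(original)
print(cleaned)

# In a PySpark notebook, with an existing DataFrame named df (an assumption),
# all columns could then be renamed in a single step:
# df = df.toDF(*strip_brackets(df.columns))
```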
Microsoft Fabric – Comparing Dataflow Gen2 vs Notebook on Costs and usability
In this blog post I am going to compare Dataflow Gen2 vs Notebook in terms of how much each costs for the workload. I will also compare usability, as Dataflow Gen2 currently has a lot of built-in features which make it easier to use. The goal of this blog post is to understand which in my opinion…
Microsoft Fabric – Notebook session usage explained (And how to save CU’s or billed time)
I was working on a blog post to determine which consumed fewer Fabric Capacity Units (CU’s), and when I was initially testing, I was getting some unexpected results. In a future blog post I will compare a Dataflow Gen2 and a Notebook to see which one consumes fewer CU’s. In this blog post I’m going to explain the lessons learned when…
An easy way to transform/clean your data using a Notebook in Microsoft Fabric
In this blog post I am going to show you an easy way to clean your data (which is often fixing data issues or misspelt data) using the new feature, Launch Data Wrangler using DataFrames. I had previously blogged about using Pandas data frames, but this required extra steps and details. If you are interested in that blog post you…