I would like to start off by saying that I am by no means a Python expert. I have been learning Python for about 2 years 😊

This is the first blog post in a series where I dive into how to use Python notebooks instead of Spark notebooks. For example, I will show you how to run a SQL query against a Lakehouse table and get it into a data frame, how to read and write to a Lakehouse table, and more.

NOTE: This is still in preview, but I personally think that this is worth investing time in learning.

The reason I am using the term Python is because the notebook can ONLY use Python and not any of the other languages available in a Spark notebook.

This is what the Python Notebook looks like below.

Below is a link to the Python experience from Microsoft.

Use Python experience on Notebook – Microsoft Fabric | Microsoft Learn

Cost effective

Because it is running on a Python node, by default it is using 2 vCores and 16 GB of memory.

What this means is that when running the Python notebook in its default configuration you are only consuming 1 CU (Capacity Unit) for every second it is running.

Using the reference below, each vCore will consume 0.5 CUs.

Because the default is 2 vCores, it will consume 1 CU per second (2 x 0.5 = 1).

Reference: Apache Spark compute for Data Engineering and Data Science – Microsoft Fabric | Microsoft Learn

This makes it really cost effective, because if your Python notebook ran for 60 seconds it would only consume 60 CUs.
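To make the arithmetic concrete, below is a tiny sketch of the calculation (my own helper function for illustration, not an official API):

# Rough CU consumption estimate for a Python notebook session
# (illustrative helper only, not an official Fabric API)
def estimate_cu_seconds(v_cores: int, duration_seconds: int, cu_per_vcore: float = 0.5) -> float:
    return v_cores * cu_per_vcore * duration_seconds

# Default Python notebook: 2 vCores running for 60 seconds
print(estimate_cu_seconds(2, 60))  # 60.0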

I have started changing some of my customers’ notebooks to use the Python notebook and I have seen the CU consumption drop by roughly 6x (about an 83% reduction).

For example, a notebook that used to consume roughly 320,200 CUs per day now consumes roughly 53,189 CUs per day after being converted to a Python notebook.

Runs faster

I have noticed that the start up time for the Python notebook is significantly quicker!

I am not 100% sure whether it is because it is only running Python or because I had to update the code, but it is taking less time to complete the same tasks.

And who doesn’t like something that runs faster and cheaper 😊

Python can achieve a lot of things that Spark can

I did not realise that a Python notebook can achieve a lot of the things that can be done with Spark notebooks.

Sometimes it is the same and other times it requires doing something different to get the same result.

Some examples where I could use the same Python libraries are listed below.

  • Semantic Link
  • Using the %pip and %conda commands for inline installations; the commands support both public libraries and customised libraries (see the example after this list).
  • It is also possible to use the built-in resources folder for libraries like .whl, .jar, .dll, .py, etc.
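As a quick illustration of the inline installation option mentioned above, a cell could look like this (duckdb is just an example of a public library; any pip-installable package would work the same way):

# Inline installation of a public library in a notebook cell
%pip install duckdb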

An example (which will be a future blog post, so stay tuned) is that I can use DuckDB to use SQL syntax, where in Spark I would just use a Spark SQL cell.
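As a small teaser, here is a minimal sketch of what that could look like, assuming a pandas data frame called df_sales that has already been loaded from a Lakehouse table (the data frame below is made up purely for the example):

import duckdb
import pandas as pd

# df_sales stands in for data already read from a Lakehouse table
df_sales = pd.DataFrame({"Region": ["North", "South", "North"], "Amount": [100, 250, 75]})

# DuckDB can query a pandas data frame by name using plain SQL syntax
result = duckdb.sql("SELECT Region, SUM(Amount) AS TotalAmount FROM df_sales GROUP BY Region").df()
print(result)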

Session configuration magic command

While I was reading through the documentation I realised that it is possible to change the Python notebook configuration.

What it essentially means is that I can run the Python notebook on a bigger or smaller node.

Using the syntax below at the start of my Python notebook, I could change the size of the notebook to use 4 vCores, which means I will also get 32 GB of RAM allocated to my notebook.

%%configure -f
{
  "vCores": 4
}

As mentioned above, due to the costing model, when I am running this notebook it will be consuming 2 CUs (4 vCores x 0.5 CUs = 2) for every second that it is running.

NOTE: It appears from what I have seen that each vCore will get 8GB of RAM allocated to it.

NOTE II: If you want to increase the size, it has to go up in powers of 2. This means that it goes 2, 4, 8, 16, etc.
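Putting the two notes together, scaling up to the next valid size after 4 vCores would look like the cell below, which (if the 8 GB per vCore observation holds) should give 64 GB of RAM and consume 4 CUs per second (8 vCores x 0.5 CUs):

%%configure -f
{
  "vCores": 8
}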

Browse code snippets

When using notebooks there is a code snippet library that can be used to assist you.

In the example below I am looking for a Python option to write to a Lakehouse delta table.

  1. I clicked on Edit
  2. Then I clicked on “Browse code snippets”
  3. I searched for write
  4. I made sure that I had selected “Python”

I could then see the code snippet options and could click on the relevant one.
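For context, writing a data frame to a Lakehouse Delta table from a Python notebook could look something like the sketch below. This is a hedged example using the deltalake library with a made-up data frame and table path, so the actual snippet from the library may differ:

import pandas as pd
from deltalake import write_deltalake

# Example data frame to write (made-up data)
df = pd.DataFrame({"Id": [1, 2], "Name": ["A", "B"]})

# Write to a Lakehouse table via the default Lakehouse mount point
# (the path below is an assumption for illustration)
write_deltalake("/lakehouse/default/Tables/my_table", df, mode="overwrite")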

Summary

In this blog post I have shown you some examples of why you should use a Python notebook.

In future blog posts I will go into detail on using SQL code, how to read and write to Lakehouse Delta tables, how to loop through a data frame, how to combine data from a for loop into a single data frame, and more…

Thanks for reading! If you have any comments or suggestions, please let me know!