How to get Sempy (Semantic-link) to run when being triggered from a data pipeline which runs a Notebook in Fabric
When I tried to run a notebook via a data pipeline, it failed with the error message shown below. In this post I walk through the steps to get this working.
Notebook execution failed at Notebook service with http status code – ‘200’, please check the Run logs on Notebook, additional details – ‘Error name – MagicUsageError, Error value – %pip magic command is disabled.’ :
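For context, the error comes from installing the library inline in the notebook. A cell like the one below works when you run the notebook interactively, but the %pip magic is disabled for pipeline-triggered runs (as the error message says), which is why the fix is to install the library via an Environment instead.

```python
# This interactive install works when running the notebook manually,
# but %pip is disabled when the notebook is triggered from a data
# pipeline, producing the MagicUsageError above.
%pip install semantic-link
```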
I first had to make sure that in Fabric (the Power BI service) the persona was set to Data Engineering.
I then clicked on Environment to create a new environment.
I then gave my new Environment a name and clicked Create.
I then needed to add the Semantic-Link (Sempy).
- I first clicked on Public Libraries.
- I then clicked on “Add from PyPI”
- Finally, in the library search box I typed “semantic-link”, which automatically selected the latest version.
I then clicked on Save and Publish.
It first saved and then showed the pending changes before publishing; I clicked on Publish all.
I then clicked on Publish in the next screen prompt.
I then clicked on View Progress to view the progress of the publish.
NOTE: This does take some time to complete so please be patient!
Once completed I could see my Environment in my Fabric Workspace
I then went into my Notebook and once it opened I clicked on Environment and changed it to my Environment “FourMoo_Sempy” as shown below.
I then got confirmation of the environment change.
Now, in the first part of the code, I needed to load the Sempy extension using the code below.
```python
# First need to load the Semantic Link extension
# (the library itself is installed by the Environment created earlier)
%load_ext sempy
```
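As a quick optional sanity check (my addition, not a step from the original setup), you can confirm that the Semantic Link library is available from the Environment by importing it and listing the workspaces you can access:

```python
import sempy.fabric as fabric

# If the Environment published correctly, this returns a pandas
# DataFrame of the workspaces the notebook identity can access.
fabric.list_workspaces()
```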
In my Notebook I am querying data from a semantic model and outputting it to a table called “Sales_Extract”.
```python
# Import the Semantic Link fabric module (needed for evaluate_measure)
import sempy.fabric as fabric

# Get the Power BI Workspace and Dataset
# Workspace Name
ws = "PPU Space Testing"
# Dataset Name
ds = "WWI Sales - Azure SQL Source - PPU - 4 Years - 2 Days"

# Reference: https://learn.microsoft.com/en-us/python/api/semantic-link-sempy/sempy.fabric?view=semantic-link-python#sempy-fabric-evaluate-measure
df = (
    fabric
    .evaluate_measure(
        workspace=ws,
        dataset=ds,
        groupby_columns=["'Date'[Yr-Mth]"],
        measure='Sales'
    )
)

# Convert to Spark DataFrame
sparkDF = spark.createDataFrame(df)
sparkDF.show()

# Table Name
table_name = "Sales_Extract"
# Write to Table
sparkDF.write.mode("append").format("delta").save("Tables/" + table_name)
```
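One note on the write: mode("append") adds rows on every run, so re-running the pipeline will keep growing the table. If you would rather have each run replace the extract, a minimal alternative (standard Spark Delta behaviour, not specific to Sempy) is:

```python
# Replace the table contents on each run instead of appending.
sparkDF.write.mode("overwrite").format("delta").save("Tables/" + table_name)
```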
Here is the table output when I tested to make sure that the notebook had run successfully.
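If you prefer to verify with code rather than the Lakehouse UI, here is a minimal sketch, assuming the notebook is attached to the same Lakehouse the table was written to:

```python
# Read the Delta table back and inspect a few rows and the row count.
check_df = spark.read.format("delta").load("Tables/Sales_Extract")
check_df.show(10)
print(f"Row count: {check_df.count()}")
```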
In my data pipeline I used the Notebook activity and configured it to use the notebook I created in the previous steps.
I then tested running my data pipeline and it ran successfully as shown below.
I then confirmed this in my Lakehouse table as shown below.
One additional item to show: if I wanted to make this Environment the default in my App Workspace, I would do it by going into my Workspace settings.
I then did the following to change the default Environment.
- I expanded “Data Engineering/Science”
- I then clicked on “Spark settings”
- Next, I clicked on “Environment”
- The next step was to enable the option to set the default environment.
- Finally, I selected my Environment “FourMoo_Sempy” as shown below.
Summary
In this blog post I have shown how I created an Environment that allows me to run the Sempy (Semantic-link) Python package when a notebook is run from a data pipeline.
I hope you found this useful and any comments or suggestions are most welcome.