In this blog post I am going to show you an easy way to clean your data (which often means fixing data issues or mis-spelt values) using the new feature to launch Data Wrangler on DataFrames.

I had previously blogged about using Pandas data frames, but that approach required extra steps and detail. If you are interested in that blog post, you can find it here: Did you know that there is an easy way to shape your data in Fabric Notebooks using Data Wrangler?

In this blog post I am going to show you how I cleaned up the data in my location column.

Loading the CSV data into a data frame

The first step was to load the CSV files into a data frame in my PySpark notebook, as shown below.

df_staging = spark.read.format("csv").option("header", "true").load("Files/Delay_Data/*.csv")

# Check that df_staging is bringing back data.

df_staging.show(10)

Once I had run this, I could see the output of the data from my CSV files.

Getting my data frame into the Data Wrangler (In my mind I think of it like Power Query)

Next, to transform my data, I need to get my data frame into the Data Wrangler with the steps below.

  1. I clicked on Data.
  2. Then I clicked on “Transform DataFrame in Data Wrangler”
  3. Finally, I selected the name of my data frame “df_staging”

Transforming my data using Data Wrangler

The steps below are where I transformed my data for the Location column.

I expanded Operations and then selected “Find and replace”

I then went through the following steps to complete the find and replace.

  1. I clicked on the drop down and selected the column called “Location”
  2. For the Old value, I entered the value I am searching for to replace.
    1. In my example I am looking for “Coast”
  3. For the New value, I wanted to replace the value with nothing, so I pressed the space bar once.
  4. Once I was happy with the results I clicked on Apply.

I could then see the changes in the green column before I applied them.

I then clicked Apply (Number 4 from above).

If I wanted to see the code, I could view it in the code section shown below, where Data Wrangler is writing the code for me.

I then applied the code by clicking on “Add code to notebook” as shown below.

I could then see the code in the notebook as shown below.
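For readers following along without the screenshots, the generated code is roughly along the lines of the sketch below. This is a minimal sketch, assuming the step is the find and replace of “Coast” in the Location column; the function and data frame names here are just placeholders, and the exact code Data Wrangler writes for you may differ.

from pyspark.sql.functions import regexp_replace

# Minimal sketch (assumed, not the exact generated code):
# replace "Coast" with a single space in the Location column.
def clean_data(df):
    return df.withColumn("Location", regexp_replace("Location", "Coast", " "))

df_staging_clean = clean_data(df_staging)
df_staging_clean.show(10)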

The final step was to take my transformed data and insert it into my Lakehouse table as shown below.
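For reference, here is a minimal sketch of how that insert can be done in PySpark, assuming the cleaned data frame is called df_staging_clean (from the sketch above) and the target is the “Delays” table mentioned below; your table name and write mode may differ.

# Minimal sketch (assumed names): write the cleaned data frame to a Lakehouse Delta table.
df_staging_clean.write.format("delta").mode("overwrite").saveAsTable("Delays")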

I could then go into my Lakehouse and view the “Delays” table using the SQL Endpoint.
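If you want a quick sanity check without leaving the notebook first, something like the line below also works, assuming the table was saved as “Delays”.

# Optional check from the notebook before switching to the SQL Endpoint.
spark.sql("SELECT * FROM Delays LIMIT 10").show()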

Summary

I hope that you found this useful and that you can now easily transform your data in Fabric using Notebooks and Data Wrangler.

Any comments or suggestions are most welcome!