Renaming multiple Column Names in a single step using a PySpark Notebook
Following on from my previous blog post this blog post I’m going to demonstrate how to bulk rename column names in a single step instead of having to rename them individually.
The reason this came about is because I had a set of data where the column names had the square brackets which I wanted to remove.
As shown below I have highlighted 2 column names with the square brackets.
I found that it was easier to replace the column names one at a time, instead of trying to do it all at once. This also allowed me to troubleshoot to make sure I was getting the expected results.
Below is the code I used to replace the square backet with a blank value.
# Bulk Rename column Names replacing "]" with "" for name in df.schema.names: df = df.withColumnRenamed(name, name.replace(']', '')) #See Results display(df)
If you want to replace your characters you just replace the ‘]’ with what you are looking to replace.
As shown below you can now see that the right square bracket has been removed.
I then replaced the left square bracket, but I also included the prefix in my data to also be replaced so that the final column name would be more meaningful.
Code for renaming the left square bracket
# Bulk Rename column Names replacing "Table[" with "" for name in df.schema.names: df = df.withColumnRenamed(name, name.replace('Table[', '')) #See Results display(df)
Here is the result with the column renamed where it does not have the left or right square brackets.
Summary
In this blog post, I have shown a quick and effective way to rename multiple columns in my fabric notebook using the PySpark.
I hope you found this useful, it sure did save me a lot of time and effort.
Thanks for reading!