
How to format SQL queries inside a PySpark code file

I would like to format my existing SQL queries inside the PySpark file.

This is what my existing source file looks like:

And this is what I want it to look like:

I have already tried black and other VS Code extensions to format my codebase, but with no luck, since the SQL code is treated as a Python string. Please suggest a workaround.

P.S.: I have an existing codebase of more than 700 such files.


Answer

One possible option is to use sql-formatter.

Let’s say we have a test.py file:

We can create a script that reads the file as a string, finds the queries by searching for """, extracts them, runs them through the formatter, and replaces the originals:
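The original snippet is not preserved here, so the following is only a minimal sketch of the approach just described, not the answerer's exact script. The names `reformat_queries` and `collapse_whitespace` are hypothetical, and `collapse_whitespace` is a stand-in formatter used so the example runs without third-party packages. The sketch assumes every triple-double-quoted string in the file is a SQL query, so it would also rewrite docstrings.

```python
import re
from typing import Callable

# Matches a triple-double-quoted string literal; group 1 is the text inside it.
# re.DOTALL lets "." span newlines, so multi-line queries are captured too.
TRIPLE_QUOTED = re.compile(r'"""(.*?)"""', re.DOTALL)


def reformat_queries(source: str, formatter: Callable[[str], str]) -> str:
    """Return `source` with the body of every triple-quoted string
    passed through `formatter` (hypothetical helper, for illustration)."""

    def _replace(match):
        formatted = formatter(match.group(1)).strip()
        # Put the formatted query on its own lines between the quotes.
        return f'"""\n{formatted}\n"""'

    return TRIPLE_QUOTED.sub(_replace, source)


def collapse_whitespace(sql: str) -> str:
    """Stand-in formatter for this demo only: squeezes runs of whitespace.
    In practice a real SQL formatter would be passed in instead."""
    return " ".join(sql.split())


if __name__ == "__main__":
    sample = 'df = spark.sql("""select a,   b from   t""")\n'
    print(reformat_queries(sample, collapse_whitespace))
```

To use a real formatter, pass its formatting function in place of `collapse_whitespace`. For the sql-formatter package this is believed to be `format_sql` from `sql_formatter.core`, but check that against the version you install. For the 700+ files, the same function could be applied in a loop, for example over `pathlib.Path(".").rglob("*.py")`, reading each file with `read_text()` and writing the result back with `write_text()`. It is worth running this on a copy or under version control first, since the regex cannot distinguish SQL from other triple-quoted strings.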

User contributions licensed under: CC BY-SA