Tag: pyspark

Translating PySpark into SQL

I’m experiencing an issue with the following function. I’m trying to translate it into a SQL statement to get a better idea of exactly what’s happening, so I can work on my actual issue more effectively. I know that this contains a join between valid_data and ri_data, a filter, and a s…
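
A minimal sketch of what such a translation typically looks like. The join key and filter column below (id, status) are assumptions for illustration, since the excerpt doesn’t show the actual function:

    from pyspark.sql import SparkSession
    import pyspark.sql.functions as F

    spark = SparkSession.builder.getOrCreate()

    # DataFrame API version: join valid_data to ri_data, then filter.
    valid_data = spark.table("valid_data")
    ri_data = spark.table("ri_data")
    result = (
        valid_data.join(ri_data, on="id", how="inner")
                  .filter(F.col("status") == "active")
    )

    # Equivalent SQL, which often makes the logic easier to reason about.
    result_sql = spark.sql("""
        SELECT *
        FROM valid_data v
        JOIN ri_data r ON v.id = r.id
        WHERE v.status = 'active'
    """)

Comparing result.explain() against result_sql.explain() is a useful way to confirm the two forms really produce the same plan.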

Spark SQL column doesn’t exist

I am using Spark in Databricks for this SQL command. In the input_data table, I have a string in the st column, and I want to do some calculations on the string length. However, after I assign the length_s alias to the first column, I cannot reference it in the following columns. The SQL engine reports Column …
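
This is the usual cause of that error: in Spark SQL an alias defined in the SELECT list is not visible to sibling expressions in the same SELECT (recent releases, roughly Spark 3.4+, relax this with lateral column aliases, but older runtimes do not). A common workaround is a CTE or subquery. A sketch, assuming a table input_data with a string column st as in the excerpt:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    # Fails on older Spark versions: length_s is not visible to
    # other expressions in the same SELECT list.
    # spark.sql("SELECT length(st) AS length_s, length_s * 2 FROM input_data")

    # Workaround: define the alias in a CTE, then reference it freely.
    df = spark.sql("""
        WITH lengths AS (
            SELECT st, length(st) AS length_s
            FROM input_data
        )
        SELECT st, length_s, length_s * 2 AS double_length
        FROM lengths
    """)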

How can I write an SQL query as a template in PySpark?

I want to write a function that takes a column, a dataframe containing that column, and a query template as arguments, and outputs the result of the query when run on that column. Something like: func_sql(df_tbl, 'age', 'select count(distinct {col}) from df_tbl') Here, {col} should get rep…
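
A minimal sketch of such a helper, assuming the placeholder is filled with str.format and the DataFrame is registered as a temp view named df_tbl so spark.sql can resolve the table name in the template (the view name follows the excerpt; everything else is an assumption):

    from pyspark.sql import SparkSession, DataFrame

    spark = SparkSession.builder.getOrCreate()

    def func_sql(df: DataFrame, col: str, query_template: str) -> DataFrame:
        # Register the DataFrame under the name the template refers to.
        df.createOrReplaceTempView("df_tbl")
        # Substitute the column name into the {col} placeholder.
        return spark.sql(query_template.format(col=col))

    # Usage, following the excerpt:
    # func_sql(df_tbl, 'age', 'select count(distinct {col}) from df_tbl').show()

Since this splices a string straight into SQL, it is only safe for trusted input; validating col against df.columns before formatting is a cheap guard.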