
Tag: pyspark

Spark.sql Filter rows by MAX

Below is part of a source file which you could imagine being much bigger. After the following code, I would like to obtain this result. The aim is to: select the dates on which each cityname has the MAX total (note: a city can appear twice if it has the MAX total for two different dates), then sort by total descending, then by date.
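A minimal sketch of one way to do this, assuming the data is registered as a temp view; the view name "cities" and the sample rows are placeholders, since the question's actual code is not shown. A window MAX per cityname keeps every row that ties the per-city maximum, which is what lets a city appear twice.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Hypothetical sample data standing in for the (much bigger) source file.
df = spark.createDataFrame(
    [("Paris", 100, "2021-01-01"), ("Paris", 100, "2021-02-01"),
     ("London", 80, "2021-01-01"), ("London", 50, "2021-02-01")],
    ["cityname", "total", "date"],
)
df.createOrReplaceTempView("cities")

# Keep every row whose total ties the per-city maximum, so a city can
# appear twice when two dates share its MAX total.
result = spark.sql("""
    SELECT cityname, total, date
    FROM (
        SELECT *, MAX(total) OVER (PARTITION BY cityname) AS max_total
        FROM cities
    ) t
    WHERE total = max_total
    ORDER BY total DESC, date
""")
result.show()
```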

How to format SQL queries inside a PySpark code file

I would like to format my existing SQL queries inside the PySpark file. This is what my existing source file looks like, and this is how I want it to look. I have already tried using black and other vscode extensions for formatting my code base, but no luck, since the SQL code is being treated as a Python string…
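One option, sketched below, is to format the embedded SQL yourself with the sqlparse library (not mentioned in the question; a general Python formatter like black treats the query as an opaque string, so it never touches the SQL inside).

```python
import sqlparse

# A query embedded in a PySpark file as one long string.
raw = "select id, cityname, total from cities where total > 100 order by total desc"

# reindent puts clauses on their own lines; keyword_case uppercases keywords.
formatted = sqlparse.format(raw, reindent=True, keyword_case="upper")
print(formatted)
```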

Spark: How to transpose and explode columns with dynamic nested arrays

I applied an algorithm from the question Spark: How to transpose and explode columns with nested arrays to transpose and explode a nested Spark dataframe with dynamic arrays. I have added the row """{"id":3,"c":[{"date":3,"val":3,"val_dynamic":3}]}""" to the dataframe, with a new column c, where the array has a new val_dynamic field which can appear on a random basis. I'm looking for required output 2 (Transpose and…
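A minimal sketch of the general idea, assuming rows like the ones in the question; the point is to explode the array column and then flatten whatever struct fields it happens to contain, so a dynamic field such as val_dynamic is discovered from the schema at runtime rather than hard-coded.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, explode

spark = SparkSession.builder.getOrCreate()

# JSON strings modeled on the question; val_dynamic appears only sometimes.
data = [
    '{"id": 1, "c": [{"date": 1, "val": 1}]}',
    '{"id": 3, "c": [{"date": 3, "val": 3, "val_dynamic": 3}]}',
]
df = spark.read.json(spark.sparkContext.parallelize(data))

# One output row per array element.
exploded = df.select("id", explode("c").alias("c"))

# Discover the struct's fields at runtime instead of hard-coding them.
struct_fields = [f.name for f in exploded.schema["c"].dataType.fields]
flat = exploded.select("id", *[col(f"c.{f}").alias(f) for f in struct_fields])
flat.show()
```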

Is there a method to connect to PostgreSQL (DBeaver) from PySpark?

Hello, I installed PySpark and I have a local Postgres database in DBeaver: how can I connect to Postgres from PySpark? I tried this but I get an error. Answer: You need to add the jars you want to use when creating the SparkSession. See this: https://spark.apache.org/docs/2.4.7/submitting-applications.html#advanced-dependency-management Either when you start pyspark or when you…
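A minimal sketch of the answer's approach; the database name, table, port, and credentials below are placeholders. The Postgres JDBC driver is pulled in via spark.jars.packages when the SparkSession is created, as the linked docs describe.

```python
from pyspark.sql import SparkSession

# Fetch the Postgres JDBC driver at session startup.
spark = (
    SparkSession.builder
    .appName("postgres-example")
    .config("spark.jars.packages", "org.postgresql:postgresql:42.6.0")
    .getOrCreate()
)

# Read a table over JDBC; url/dbtable/user/password are hypothetical.
df = (
    spark.read.format("jdbc")
    .option("url", "jdbc:postgresql://localhost:5432/mydb")
    .option("dbtable", "mytable")
    .option("user", "postgres")
    .option("password", "postgres")
    .option("driver", "org.postgresql.Driver")
    .load()
)
df.show()
```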
