Spark.sql Filter rows by MAX

Question

Below is part of a source file, which you can imagine being much bigger: After the following code: I would like to obtain this result: The aim is to: select the dates on which each cityname has its MAX total (note: a city can appear twice if it has the MAX total on 2 different dates), sort by total descending, …

Accepted Answer

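You can get your result by using a SQL window function in your query, as follows:

    SELECT
      cityname,
      postcode,
      date,
      total
    FROM (
      SELECT
        cityname,
        postcode,
        date,
        total,
        MAX(total) OVER (
          PARTITION BY cityname
          ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
        ) AS max_total
      FROM tablecases
    )
    WHERE max_total = total
    ORDER BY max_total DESC, date, cityname

The inner query attaches each city's maximum total to every one of its rows as max_total. The outer query then keeps only the rows whose total equals that maximum, so a city that reaches its maximum on two different dates appears twice.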
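To check the logic of the query without a Spark cluster, here is a minimal sketch that runs the same SQL against an in-memory SQLite database from Python. This swaps in SQLite for Spark purely for illustration (it assumes a SQLite build with window function support, 3.25 or later). The sample rows and the helper name `max_total_rows` are hypothetical and do not come from the question; only the table and column names are taken from the answer.

```python
import sqlite3

# Same query as in the answer; only the engine (SQLite instead of Spark) differs.
QUERY = """
SELECT cityname, postcode, date, total
FROM (
  SELECT cityname, postcode, date, total,
         MAX(total) OVER (
           PARTITION BY cityname
           ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
         ) AS max_total
  FROM tablecases
)
WHERE max_total = total
ORDER BY max_total DESC, date, cityname
"""

# Hypothetical sample data: Paris reaches its maximum on two different dates.
SAMPLE_ROWS = [
    ("Paris", "75001", "2021-01-01", 10),
    ("Paris", "75001", "2021-01-02", 25),
    ("Paris", "75001", "2021-01-03", 25),
    ("Lyon", "69001", "2021-01-01", 40),
    ("Lyon", "69001", "2021-01-02", 12),
    ("Nice", "06000", "2021-01-01", 7),
]


def max_total_rows(rows):
    """Load rows into an in-memory table and return the rows holding each city's max total."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE tablecases "
            "(cityname TEXT, postcode TEXT, date TEXT, total INTEGER)"
        )
        conn.executemany("INSERT INTO tablecases VALUES (?, ?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in max_total_rows(SAMPLE_ROWS):
        print(row)
```

With the sample data above, Lyon comes first (maximum 40), then both Paris rows (maximum 25, ordered by date), then Nice (maximum 7). In Spark itself, the query string would typically be passed to `spark.sql(...)` after registering the data as a table or temporary view named `tablecases`.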

