In PostgreSQL, if i have this kind of record :
|------------------|---------------------|------------------| | corona_country | corona_patient | corona_suspected | |------------------|---------------------|------------------| | Mexico | Teddy | 2020-03-01 | |------------------|---------------------|------------------| | Indonesia | Andy | 2020-03-01 | |------------------|---------------------|------------------| | Indonesia | Andy | 2020-03-05 | |------------------|---------------------|------------------| | China | Sarah | 2020-03-04 | |------------------|---------------------|------------------| | UK | Michael | 2020-03-05 | |------------------|---------------------|------------------| | Italy | Michael | 2020-03-15 | |------------------|---------------------|------------------| | Italy | Michael | 2020-03-15 | |------------------|---------------------|------------------|
I want to add extra column (by a Select query) that identifies if a combination of corona_country and corona_patient is having the latest date (of corona_suspected column) among its own records. So it’ll look like this :
|------------------|---------------------|------------------|-----------| | corona_country | corona_patient | corona_suspected | LATEST? | |------------------|---------------------|------------------|-----------| | Mexico | Teddy | 2020-03-01 | Y | |------------------|---------------------|------------------|-----------| | Indonesia | Andy | 2020-03-01 | N | |------------------|---------------------|------------------|-----------| | Indonesia | Andy | 2020-03-05 | Y | |------------------|---------------------|------------------|-----------| | China | Sarah | 2020-03-04 | Y | |------------------|---------------------|------------------|-----------| | UK | Michael | 2020-03-05 | Y | |------------------|---------------------|------------------|-----------| | Italy | Michael | 2020-03-13 | N | |------------------|---------------------|------------------|-----------| | Italy | Michael | 2020-03-15 | Y | |------------------|---------------------|------------------|-----------| | Italy | Michael | 2020-03-15 | Y | |------------------|---------------------|------------------|-----------|
How can i achieve this by sql query? Is there any method that joins another table that records latest dates of each combination of corona_country and corona_patient?
Advertisement
Answer
One option would be using MAX() OVER (PARTITION BY..)
analytic function
SELECT *, CASE WHEN( MAX( corona_suspected ) OVER ( PARTITION BY corona_country, corona_patient ) = corona_suspected ) THEN 'Y' ELSE 'N' END AS latest FROM t
OR alternatively
Use DENSE_RANK()
analytic function
CASE WHEN(DENSE_RANK() OVER (PARTITION BY corona_country, corona_patient ORDER BY corona_suspected DESC )=1) THEN 'Y' ELSE 'N' END AS latest
in order to return the result with ties(more than one latest values for corona_suspected
columns for each groups)