Postgresql – How to identify latest record of a combination of 2 columns (Example inside)

In PostgreSQL, if i have this kind of record :

|------------------|---------------------|------------------|
|  corona_country  |   corona_patient    | corona_suspected |
|------------------|---------------------|------------------|
|     Mexico       |        Teddy        |    2020-03-01    |
|------------------|---------------------|------------------|
|     Indonesia    |        Andy         |    2020-03-01    |
|------------------|---------------------|------------------|
|     Indonesia    |        Andy         |    2020-03-05    |
|------------------|---------------------|------------------|
|     China        |        Sarah        |    2020-03-04    |
|------------------|---------------------|------------------|
|     UK           |        Michael      |    2020-03-05    |
|------------------|---------------------|------------------|
|     Italy        |        Michael      |    2020-03-15    |
|------------------|---------------------|------------------|
|     Italy        |        Michael      |    2020-03-15    |
|------------------|---------------------|------------------|

​x
 
|------------------|---------------------|------------------||  corona_country  |   corona_patient    | corona_suspected ||------------------|---------------------|------------------||     Mexico       |        Teddy        |    2020-03-01    ||------------------|---------------------|------------------||     Indonesia    |        Andy         |    2020-03-01    ||------------------|---------------------|------------------||     Indonesia    |        Andy         |    2020-03-05    ||------------------|---------------------|------------------||     China        |        Sarah        |    2020-03-04    ||------------------|---------------------|------------------||     UK           |        Michael      |    2020-03-05    ||------------------|---------------------|------------------||     Italy        |        Michael      |    2020-03-15    ||------------------|---------------------|------------------||     Italy        |        Michael      |    2020-03-15    ||------------------|---------------------|------------------|​

I want to add extra column (by a Select query) that identifies if a combination of corona_country and corona_patient is having the latest date (of corona_suspected column) among its own records. So it’ll look like this :

|------------------|---------------------|------------------|-----------|
|  corona_country  |   corona_patient    | corona_suspected |  LATEST?  |
|------------------|---------------------|------------------|-----------|
|     Mexico       |        Teddy        |    2020-03-01    |     Y     |
|------------------|---------------------|------------------|-----------|
|     Indonesia    |        Andy         |    2020-03-01    |     N     |
|------------------|---------------------|------------------|-----------|
|     Indonesia    |        Andy         |    2020-03-05    |     Y     |
|------------------|---------------------|------------------|-----------|
|     China        |        Sarah        |    2020-03-04    |     Y     |
|------------------|---------------------|------------------|-----------|
|     UK           |        Michael      |    2020-03-05    |     Y     |
|------------------|---------------------|------------------|-----------|
|     Italy        |        Michael      |    2020-03-13    |     N     |
|------------------|---------------------|------------------|-----------|
|     Italy        |        Michael      |    2020-03-15    |     Y     |
|------------------|---------------------|------------------|-----------|
|     Italy        |        Michael      |    2020-03-15    |     Y     |
|------------------|---------------------|------------------|-----------|

 
|------------------|---------------------|------------------|-----------||  corona_country  |   corona_patient    | corona_suspected |  LATEST?  ||------------------|---------------------|------------------|-----------||     Mexico       |        Teddy        |    2020-03-01    |     Y     ||------------------|---------------------|------------------|-----------||     Indonesia    |        Andy         |    2020-03-01    |     N     ||------------------|---------------------|------------------|-----------||     Indonesia    |        Andy         |    2020-03-05    |     Y     ||------------------|---------------------|------------------|-----------||     China        |        Sarah        |    2020-03-04    |     Y     ||------------------|---------------------|------------------|-----------||     UK           |        Michael      |    2020-03-05    |     Y     ||------------------|---------------------|------------------|-----------||     Italy        |        Michael      |    2020-03-13    |     N     ||------------------|---------------------|------------------|-----------||     Italy        |        Michael      |    2020-03-15    |     Y     ||------------------|---------------------|------------------|-----------||     Italy        |        Michael      |    2020-03-15    |     Y     ||------------------|---------------------|------------------|-----------|​

How can i achieve this by sql query? Is there any method that joins another table that records latest dates of each combination of corona_country and corona_patient?

Answer

One option would be using MAX() OVER (PARTITION BY..) analytic function

SELECT *,
       CASE 
       WHEN( MAX( corona_suspected ) OVER 
                ( PARTITION BY corona_country, corona_patient ) = corona_suspected )
       THEN 
         'Y'
       ELSE
         'N'
       END AS latest
  FROM t

 
SELECT *,       CASE        WHEN( MAX( corona_suspected ) OVER                 ( PARTITION BY corona_country, corona_patient ) = corona_suspected )       THEN          'Y'       ELSE         'N'       END AS latest  FROM t​

OR alternatively

Use DENSE_RANK() analytic function

   CASE 
   WHEN(DENSE_RANK() OVER 
   (PARTITION BY corona_country, corona_patient ORDER BY corona_suspected DESC )=1)
   THEN 
     'Y'
   ELSE
     'N'
   END AS latest

 
   CASE    WHEN(DENSE_RANK() OVER    (PARTITION BY corona_country, corona_patient ORDER BY corona_suspected DESC )=1)   THEN      'Y'   ELSE     'N'   END AS latest​

in order to return the result with ties(more than one latest values for corona_suspected columns for each groups)

Demo

Advertisement

Answer