Skip to content
Advertisement

Postgresql – How to identify latest record of a combination of 2 columns (Example inside)

In PostgreSQL, if i have this kind of record :

|------------------|---------------------|------------------|
|  corona_country  |   corona_patient    | corona_suspected |
|------------------|---------------------|------------------|
|     Mexico       |        Teddy        |    2020-03-01    |
|------------------|---------------------|------------------|
|     Indonesia    |        Andy         |    2020-03-01    |
|------------------|---------------------|------------------|
|     Indonesia    |        Andy         |    2020-03-05    |
|------------------|---------------------|------------------|
|     China        |        Sarah        |    2020-03-04    |
|------------------|---------------------|------------------|
|     UK           |        Michael      |    2020-03-05    |
|------------------|---------------------|------------------|
|     Italy        |        Michael      |    2020-03-15    |
|------------------|---------------------|------------------|
|     Italy        |        Michael      |    2020-03-15    |
|------------------|---------------------|------------------|

I want to add extra column (by a Select query) that identifies if a combination of corona_country and corona_patient is having the latest date (of corona_suspected column) among its own records. So it’ll look like this :

|------------------|---------------------|------------------|-----------|
|  corona_country  |   corona_patient    | corona_suspected |  LATEST?  |
|------------------|---------------------|------------------|-----------|
|     Mexico       |        Teddy        |    2020-03-01    |     Y     |
|------------------|---------------------|------------------|-----------|
|     Indonesia    |        Andy         |    2020-03-01    |     N     |
|------------------|---------------------|------------------|-----------|
|     Indonesia    |        Andy         |    2020-03-05    |     Y     |
|------------------|---------------------|------------------|-----------|
|     China        |        Sarah        |    2020-03-04    |     Y     |
|------------------|---------------------|------------------|-----------|
|     UK           |        Michael      |    2020-03-05    |     Y     |
|------------------|---------------------|------------------|-----------|
|     Italy        |        Michael      |    2020-03-13    |     N     |
|------------------|---------------------|------------------|-----------|
|     Italy        |        Michael      |    2020-03-15    |     Y     |
|------------------|---------------------|------------------|-----------|
|     Italy        |        Michael      |    2020-03-15    |     Y     |
|------------------|---------------------|------------------|-----------|

How can i achieve this by sql query? Is there any method that joins another table that records latest dates of each combination of corona_country and corona_patient?

Advertisement

Answer

One option would be using MAX() OVER (PARTITION BY..) analytic function

SELECT *,
       CASE 
       WHEN( MAX( corona_suspected ) OVER 
                ( PARTITION BY corona_country, corona_patient ) = corona_suspected )
       THEN 
         'Y'
       ELSE
         'N'
       END AS latest
  FROM t

OR alternatively

Use DENSE_RANK() analytic function

   CASE 
   WHEN(DENSE_RANK() OVER 
   (PARTITION BY corona_country, corona_patient ORDER BY corona_suspected DESC )=1)
   THEN 
     'Y'
   ELSE
     'N'
   END AS latest

in order to return the result with ties(more than one latest values for corona_suspected columns for each groups)

Demo

User contributions licensed under: CC BY-SA
1 People found this is helpful
Advertisement