Skip to content
Advertisement

SQL/Presto: rank with multiple conditions (one condition is checking if rows is in the same tag)

I have a data (dt) like the following:

  group_id    customer_id  tag     score   phase
  1           a             l1     0       2020
  1           b             l2     0       2021
  2           a             l4     1       2019
  2           e             l3     1       2019
  2           d             l3     1       2018
  3           q                    1       2020
  3           w                    1       2019
  3           z             l5     1       2019
  3           x             l5     1       2019
  3           c             l6     1       2019

I want to

  1. rank them within a group, first by score (lower score is better)
  2. then concat customer_id if the 2 customers are in the same group with the same score, and same tag (NOT null)
  3. I need to rank them by phase (older phase is preferred) to produce a final list.

So, the desired output is:

 group_id    customer_id   tag     score     phase          rank
  1           a             l1     0         2020           1    
  1           b             l2     0         2021           2
  2           a             l4     1         2019           2
  2           e,d           l3     1        2019, 2018      1
  3           q                    1         2020           2
  3           w                    1         2019           1
  3           z,x           l5     1       2019, 2019       1
  3           c             l6     1       2019             1         
 

I have written the following query but I am not sure how to incorporate the condition of checking if 2 customers are in the tag and proceed further to add phase comparing condition.

  SELECT group_id, customer_id, tag, score, phase, 
  RANK() OVER (PARTITION BY group_id ORDER BY score) AS temp_rank
  FROM dt

Advertisement

Answer

You must group by group_id, tag, score and use array_join(array_agg()) to concatenate customer_ids and phases for the rows with non-NULL tag.
Then use UNION ALL for the rows with NULL tag.
Then use instead of RANK() use DENSE_RANK() window function:

SELECT group_id, customer_id, tag, score, phase,
       DENSE_RANK() OVER (PARTITION BY group_id ORDER BY score, min_phase) temp_rank
FROM (
  SELECT group_id, 
         array_join(array_agg(customer_id), ',') customer_id, 
         tag, 
         score, 
         array_join(array_agg(phase), ',') phase,
         MIN(phase) min_phase
  FROM dt
  WHERE tag IS NOT NULL
  GROUP BY group_id, tag, score
  UNION ALL
  SELECT group_id, customer_id, tag, score, phase, phase
  FROM dt
  WHERE tag IS NULL
) t

See the demo (for Postgresql).
Results:

group_id customer_id tag score phase temp_rank
1 a l1 0 2020 1
1 b l2 0 2021 2
2 e,d l3 1 2019,2018 1
2 a l4 1 2019 2
3 w null 1 2019 1
3 c l6 1 2019 1
3 z,x l5 1 2019,2019 1
3 q null 1 2020 2
User contributions licensed under: CC BY-SA
3 People found this is helpful
Advertisement