I have a dataset that features duplicate values of the the primary variable, something like the following:
col1 col2 counts 110 False 1 111 False 2 111 False 1 112 True 3 112 False 2 112 False 1 113 False 1 114 False 1 115 False 2 115 False 1 116 False 1 117 False 1 118 False 4 118 False 3 118 False 2 118 False 1
I have achived this by using the following code
SELECT DISTINCT ctm_nbr ,col1 ,col2 ,RANK () OVER (PARTITION BY col1 ORDER BY col2) AS counts FROM my_table GROUP BY 1,2,3 ORDER BY ctm_nbr, row_numb DESC
However, my desired output needs to be ordered such that counts
is descending yet col1
remains partitioned, so that I can see, for example, which value from col1 has the highest number of counts. Like this…
col1 col2 counts 118 False 4 118 False 3 118 False 2 118 False 1 112 True 3 112 False 2 112 False 1 115 False 2 115 False 1 111 False 2 111 False 1 110 False 1 113 False 1 114 False 1 116 False 1 117 False 1
I have tried various iterations of the final ORDER BY
clause but just can’t quite produce the output I need. Guidance appreciated.
Advertisement
Answer
You can use window functions in the order by
. I think you just want:
ORDER BY COUNT(*) OVER (PARTITION BY ctm_nbr) DESC, ctm_nbr, row_numb DESC
This assumes that the count is the maximum value of row_numb()
. So you can also express this as:
ORDER BY MAX(row_numb) OVER (PARTITION BY ctm_nbr) DESC, ctm_nbr, row_numb DESC