I have a dataset that features duplicate values of the the primary variable, something like the following:
x
col1 col2 counts
110 False 1
111 False 2
111 False 1
112 True 3
112 False 2
112 False 1
113 False 1
114 False 1
115 False 2
115 False 1
116 False 1
117 False 1
118 False 4
118 False 3
118 False 2
118 False 1
I have achived this by using the following code
SELECT DISTINCT ctm_nbr
,col1
,col2
,RANK () OVER (PARTITION BY col1 ORDER BY col2) AS counts
FROM my_table
GROUP BY 1,2,3
ORDER BY ctm_nbr, row_numb DESC
However, my desired output needs to be ordered such that counts
is descending yet col1
remains partitioned, so that I can see, for example, which value from col1 has the highest number of counts. Like this…
col1 col2 counts
118 False 4
118 False 3
118 False 2
118 False 1
112 True 3
112 False 2
112 False 1
115 False 2
115 False 1
111 False 2
111 False 1
110 False 1
113 False 1
114 False 1
116 False 1
117 False 1
I have tried various iterations of the final ORDER BY
clause but just can’t quite produce the output I need. Guidance appreciated.
Advertisement
Answer
You can use window functions in the order by
. I think you just want:
ORDER BY COUNT(*) OVER (PARTITION BY ctm_nbr) DESC,
ctm_nbr,
row_numb DESC
This assumes that the count is the maximum value of row_numb()
. So you can also express this as:
ORDER BY MAX(row_numb) OVER (PARTITION BY ctm_nbr) DESC,
ctm_nbr,
row_numb DESC