I have a query something like this:
SELECT product_id, site, category_id, session_time, sum(cast(coalesce("#clicks", 0) AS bigint)) AS clicks FROM df WHERE site IN ('com', 'co') AND session_time = DATE('2020-02-27') GROUP BY product_id, site, session_time, category_id ORDER BY clicks desc LIMIT 10
But now, I want to see the top 10 product_id for each site and category_id based on the clicks. When I write the LIMIT function, it only shows the top 10 products but it does not group it by category_id and shop_id.
How can I do this?
Advertisement
Answer
Use window functions. You can RANK()
records by descending clicks
within site
/category
partitions in a subquery, and then filter in the outer query:
SELECT * FROM ( SELECT product_id, site, category_id, session_time, SUM("#clicks") clicks, RANK() OVER(PARTITION BY site, category_id ORDER BY sum("#clicks") DESC) rn FROM df WHERE site IN ('com', 'co') AND session_time = DATE('2020-02-27') GROUP BY product_id, site, session_time, category_id ) t WHERE rn <= 10 ORDER BY site, category, clicks desc
I am unclear on why you need the coalesce()
/cast()
logic in the sum()
(just like other aggregate functions, sum()
ignore null
values, and it seems like #clicks
is a number already), so I removed it – you can add it back if you do need it, for some reason that I could not think of.