Skip to content
Advertisement

How to get top N rows with some conditions

I have a query something like this:

SELECT product_id,
         site,
         category_id,
         session_time,
         sum(cast(coalesce("#clicks",
         0) AS bigint)) AS clicks
FROM df
WHERE site IN ('com', 'co')
        AND session_time = DATE('2020-02-27')
GROUP BY  product_id, site, session_time, category_id
ORDER BY clicks desc
LIMIT 10

But now, I want to see the top 10 product_id for each site and category_id based on the clicks. When I write the LIMIT function, it only shows the top 10 products but it does not group it by category_id and shop_id.

How can I do this?

Advertisement

Answer

Use window functions. You can RANK() records by descending clicks within site/category partitions in a subquery, and then filter in the outer query:

SELECT *
FROM (
    SELECT 
        product_id,
        site,
        category_id,
        session_time,
        SUM("#clicks") clicks,
        RANK() OVER(PARTITION BY site, category_id ORDER BY sum("#clicks") DESC) rn
    FROM df
    WHERE 
        site IN ('com', 'co')
        AND session_time = DATE('2020-02-27')
    GROUP BY  product_id, site, session_time, category_id
) t
WHERE rn <= 10
ORDER BY site, category, clicks desc

I am unclear on why you need the coalesce()/cast() logic in the sum() (just like other aggregate functions, sum() ignore null values, and it seems like #clicks is a number already), so I removed it – you can add it back if you do need it, for some reason that I could not think of.

User contributions licensed under: CC BY-SA
10 People found this is helpful
Advertisement