Skip to content
Advertisement

Select first 10 rows for every 5 of a unique column

In a Postgres DB, given the following simplified dataset of orders:

ID    STORE_ID    UPDATED_AT
1     "store-1"   2021-01-01
2     "store-2"   2021-01-02
3     "store-3"   2021-01-03
4     "store-1"   2021-01-04
5     "store-2"   2021-01-05
6     "store-3"   2021-01-06
...
1000  "store-1"    2021-02-05

How can I write a query to select the the first 5 unique store id’s, and the first 10 orders per store ordered by oldest to newest, resulting in a maximum of 50 rows returned? For example:

ID    STORE_ID    UPDATED_AT
1     "store-1"   2021-01-01
4     "store-1"   2021-01-04
...
13    "store-1"   2021-01-12

2     "store-2"   2021-01-02
5     "store-2"   2021-01-05
...
18    "store-2"   2021-01-08

3     "store-3"   2021-01-03
8     "store-3"   2021-01-12
...
22    "store-3"   2021-01-22

My goal is to process orders from oldest to newest, but process the oldest 10 orders per store as I can batch them with the Shopify API which would be more efficient.

For example, in my code I will combine it to something like:

{
  "store-1": [{ /* order */ }, { /* order */ }, { /* order */ }, ...], // Array of 10 orders
  "store-2": [{ /* order */ }, { /* order */ }, { /* order */ }, ...], // Array of 10 orders
  "store-3": [{ /* order */ }, { /* order */ }, { /* order */ }, ...], // Array of 10 orders
  "store-4": [{ /* order */ }, { /* order */ }, { /* order */ }, ...], // Array of 10 orders
  "store-5": [{ /* order */ }, { /* order */ }, { /* order */ }, ...], // Array of 10 orders
}

So I can run 5 API calls in parallel for each store.

I’ve tried multiple queries, including the following:

SELECT a.store_id FROM orders a
    INNER JOIN
        (SELECT store_id FROM orders GROUP BY store_id ORDER BY MIN(updated_at) LIMIT 5) b
    ON a.store_id = b.store_id
    ORDER BY a.updated_at ASC;

But I cannot limit the rows to 10 per store ID

Advertisement

Answer

demo:db<>fiddle

SELECT
    o.*
FROM (
    SELECT
        store_id
    FROM (
        SELECT DISTINCT ON (store_id)    -- 1a
            store_id, updated_at
        FROM orders
        ORDER BY store_id, updated_at
    ) s
    ORDER BY updated_at                  -- 1b
    LIMIT 5
) s
CROSS JOIN LATERAL (                     -- 2
    SELECT 
        *
    FROM orders o
    WHERE o.store_id = s.store_id
    ORDER BY updated_at
    LIMIT 10
) o

1 a) Return only one record per store using DISTINCT ON b) Identify the top 5 stores ordering by the date. 2. These stores can be used in a lateral join to filter the orders per store, again using the updated_at order.

User contributions licensed under: CC BY-SA
6 People found this is helpful
Advertisement