In a Postgres DB, given the following simplified dataset of orders
:
ID STORE_ID UPDATED_AT 1 "store-1" 2021-01-01 2 "store-2" 2021-01-02 3 "store-3" 2021-01-03 4 "store-1" 2021-01-04 5 "store-2" 2021-01-05 6 "store-3" 2021-01-06 ... 1000 "store-1" 2021-02-05
How can I write a query to select the the first 5 unique store id’s, and the first 10 orders per store ordered by oldest to newest, resulting in a maximum of 50 rows returned? For example:
ID STORE_ID UPDATED_AT 1 "store-1" 2021-01-01 4 "store-1" 2021-01-04 ... 13 "store-1" 2021-01-12 2 "store-2" 2021-01-02 5 "store-2" 2021-01-05 ... 18 "store-2" 2021-01-08 3 "store-3" 2021-01-03 8 "store-3" 2021-01-12 ... 22 "store-3" 2021-01-22
My goal is to process orders from oldest to newest, but process the oldest 10 orders per store as I can batch them with the Shopify API which would be more efficient.
For example, in my code I will combine it to something like:
{ "store-1": [{ /* order */ }, { /* order */ }, { /* order */ }, ...], // Array of 10 orders "store-2": [{ /* order */ }, { /* order */ }, { /* order */ }, ...], // Array of 10 orders "store-3": [{ /* order */ }, { /* order */ }, { /* order */ }, ...], // Array of 10 orders "store-4": [{ /* order */ }, { /* order */ }, { /* order */ }, ...], // Array of 10 orders "store-5": [{ /* order */ }, { /* order */ }, { /* order */ }, ...], // Array of 10 orders }
So I can run 5 API calls in parallel for each store.
I’ve tried multiple queries, including the following:
SELECT a.store_id FROM orders a INNER JOIN (SELECT store_id FROM orders GROUP BY store_id ORDER BY MIN(updated_at) LIMIT 5) b ON a.store_id = b.store_id ORDER BY a.updated_at ASC;
But I cannot limit the rows to 10 per store ID
Advertisement
Answer
SELECT o.* FROM ( SELECT store_id FROM ( SELECT DISTINCT ON (store_id) -- 1a store_id, updated_at FROM orders ORDER BY store_id, updated_at ) s ORDER BY updated_at -- 1b LIMIT 5 ) s CROSS JOIN LATERAL ( -- 2 SELECT * FROM orders o WHERE o.store_id = s.store_id ORDER BY updated_at LIMIT 10 ) o
1 a) Return only one record per store using DISTINCT ON
b) Identify the top 5 stores ordering by the date.
2. These stores can be used in a lateral join to filter the orders per store, again using the updated_at
order.