Skip to content
Advertisement

Correct way to implement nested aggregation in PostgreSQL

I have a table that looks something like this:

| timestamp        | foo_id         | bar_id       |
| ---------------- | -------------- | ------------ |
| 1612344312       | foo_id_1       | bar_id_1     |
| 1612344311       | foo_id_1       | bar_id_1     |
| 1612344310       | foo_id_2       | bar_id_1     |
| 1612344309       | foo_id_2       | bar_id_1     |
| 1612344308       | foo_id_3       | bar_id_2     |
| 1612344307       | foo_id_3       | bar_id_2     |

Multiple foo_id can belong to bar_id, and multiple timestamp can belong to foo_id

What I want to do is to count the number of foo_id and the average length of foo_id (i.e. timestamp difference between the last and the first instance of the foo_id), belonging to the same bar_id.

I came up with this query, but it does not do the job:

SELECT t1.bar_id,
       COUNT(t1.foo_id) AS count,
       AVG((SELECT MAX(t2.timestamp) - MIN(t2.timestamp)
           FROM table t2
           WHERE t1.bar_id = t2.bar_id
           GROUP BY t2.foo_id) AS duration
FROM table t1
GROUP BY  t1.bar_id

I’ll be happy to hear any advise on how to do that correctly – I’m very new to SQL.

Cheers!

Advertisement

Answer

You can use sub-query as follows:

SELECT bar_id, count(distinct foo_id), avg(duration_foo) from
    (SELECT 
        bar_id,
        foo_id,
        MAX(ts) - MIN(ts) as duration_foo
    FROM mytable
    GROUP BY bar_id, foo_id) t
GROUP BY bar_id

Demo db<>fiddle

User contributions licensed under: CC BY-SA
8 People found this is helpful
Advertisement