Get the row where the sum of a value matches a condition

I have a table with the columns:

  • date (timestamp)
  • num (integer)

Looks like this in CSV:

"date","num"
"2018-02-07 00:00:00","1"
"2018-02-16 00:00:00","1"
"2018-03-02 00:00:00","4"
"2018-04-04 00:00:00","6"
"2018-06-07 00:00:00","1"

I want different queries to figure out the following:

  1. A: The earliest date on which the running sum of num is >= 1
  2. B: The earliest date on which the running sum of num is >= 2

In the sample data the output would be A: 2018-02-07 and B: 2018-02-16.

Note that if the first date in the data had a num higher than 1, then A and B would both be the same date.

Grouping and using MIN(date) would be good enough to satisfy A, but I can’t figure out how to get B to work if there are two days with num = 1 one right after another. Any ideas are appreciated.

Answer

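Use a cumulative sum, that is, a running total of num ordered by date. For a single threshold, keep the row where the running total first reaches it; the condition running - num < 1 checks that the total was still below the threshold before that row: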

select t.*
from (select t.*, sum(num) over (order by date) as running
      from t
     ) t
where running >= 1 and running - num < 1
order by date 
limit 1;
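
To see what the running column looks like on the sample data, here is a minimal sketch that runs the same window expression through Python's built-in sqlite3 module. SQLite is only a stand-in for the real database here, and it is assumed to be version 3.25 or newer (the first with window functions). The helper names running_totals and first_crossing are made up for this illustration.

```python
import sqlite3

# Sample rows from the question; the table name t matches the queries above.
ROWS = [
    ("2018-02-07 00:00:00", 1),
    ("2018-02-16 00:00:00", 1),
    ("2018-03-02 00:00:00", 4),
    ("2018-04-04 00:00:00", 6),
    ("2018-06-07 00:00:00", 1),
]

# Same window expression as in the answer: a running total ordered by date.
RUNNING_SQL = """
select date, num, sum(num) over (order by date) as running
from t
order by date
"""


def running_totals(rows):
    """Return (date, num, running) tuples for the given rows (hypothetical helper)."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("create table t (date text, num integer)")
        conn.executemany("insert into t values (?, ?)", rows)
        return conn.execute(RUNNING_SQL).fetchall()
    finally:
        conn.close()


def first_crossing(rows, threshold):
    """Earliest date whose running total reaches the threshold, or None."""
    for date, num, running in running_totals(rows):
        # Same test as the where clause: at or above the threshold now,
        # still below it before this row's num was added.
        if running >= threshold and running - num < threshold:
            return date
    return None


if __name__ == "__main__":
    for row in running_totals(ROWS):
        print(row)
    print(first_crossing(ROWS, 1), first_crossing(ROWS, 2))
```

On the sample data the running totals are 1, 2, 6, 12, 13, so threshold 1 is reached on 2018-02-07 and threshold 2 on 2018-02-16. One caveat, which depends on the data: with the default window frame, rows sharing the same date get the same running value, so the running - num check could miss them. The sample has one row per date, so this does not come up here.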

If you wanted multiple thresholds at the same time:

select min(date) filter (where running >= 1) as date_1,
       min(date) filter (where running >= 2) as date_2    
from (select t.*, sum(num) over (order by date) as running
      from t
     ) t;

Or, if you want them on separate rows:

select distinct on (threshold) v.threshold, t.*
from (select t.*, sum(num) over (order by date) as running
      from t
     ) t cross join
     (values (1), (2)) v(threshold)
where running >= threshold and running - num < threshold
order by threshold, date;
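
If you want to sanity-check the results outside the database, the same idea can be sketched in plain Python for any list of thresholds. This is only an illustration, not part of the SQL solution: the function name threshold_dates is made up, and it assumes num is never negative, so the running total never decreases.

```python
from bisect import bisect_left
from itertools import accumulate


def threshold_dates(rows, thresholds):
    """Map each threshold to the earliest date whose running total reaches it.

    rows is an iterable of (date, num) pairs. num is assumed to be
    non-negative, so the running totals are sorted and bisect is valid.
    """
    ordered = sorted(rows)  # order by date
    totals = list(accumulate(num for _, num in ordered))
    result = {}
    for threshold in thresholds:
        # Index of the first running total that is >= threshold.
        i = bisect_left(totals, threshold)
        result[threshold] = ordered[i][0] if i < len(ordered) else None
    return result


if __name__ == "__main__":
    sample = [
        ("2018-02-07", 1),
        ("2018-02-16", 1),
        ("2018-03-02", 4),
        ("2018-04-04", 6),
        ("2018-06-07", 1),
    ]
    print(threshold_dates(sample, [1, 2]))
```

For the sample this gives 2018-02-07 for threshold 1 and 2018-02-16 for threshold 2. If the first row had num = 3 instead, both thresholds would map to that first date, which is the case mentioned in the question.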
User contributions licensed under: CC BY-SA